Question:
I have a two-dimensional array. It holds some values together with the percentage probability of each value being drawn. How can I implement a function that picks a number from the array with the probability specified for it?
Answer:
Here is my solution. Instead of a two-dimensional array I use an array of pairs, which seems simpler and clearer. The input c_probs contains pairs in which the first value is the number to be drawn and the second is its probability. The probabilities need not be normalized, i.e. their sum does not have to be exactly 1. The idea is to compute the distribution array, or more simply the cumulative probabilities: c_distr[0] = 0 and c_distr[i + 1] = c_probs[i].second + c_distr[i]. Next, a point prob is drawn uniformly at random from the interval [0, c_distr[last]), and a binary search (std::upper_bound, for simplicity) finds the index i for which c_distr[i - 1] <= prob < c_distr[i]. The number drawn is then c_probs[i - 1].first.
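As a small illustration of the two steps described above, here is a minimal sketch that separates building the cumulative array from the lookup. The helper names make_distr and pick_value are hypothetical and are not part of the original answer; the point prob is passed in explicitly so that the lookup can be checked without a random generator.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: builds the cumulative array c_distr from
// (value, weight) pairs, with c_distr[0] = 0.
std::vector<double> make_distr(const std::vector<std::pair<double, double>>& c_probs) {
    std::vector<double> c_distr(c_probs.size() + 1, 0.0);
    for (std::size_t i = 0; i < c_probs.size(); ++i)
        c_distr[i + 1] = c_distr[i] + c_probs[i].second;
    return c_distr;
}

// Hypothetical helper: maps a point prob, assumed to lie in
// [0, c_distr.back()), to the value whose interval
// [c_distr[i - 1], c_distr[i]) contains it.
double pick_value(const std::vector<std::pair<double, double>>& c_probs,
                  const std::vector<double>& c_distr, double prob) {
    std::size_t i = static_cast<std::size_t>(
        std::upper_bound(c_distr.begin(), c_distr.end(), prob) - c_distr.begin());
    return c_probs[i - 1].first;
}
```

With the input {{5.5, 0.1}, {7.1, 0.3}, {1.3, 0.05}} the cumulative array is approximately {0, 0.1, 0.4, 0.45}, so a point of 0.25 maps to 7.1 and a point of 0.42 maps to 1.3.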
Here is the full C++ program (you can run it online):
#include <iostream>
#include <random>
#include <functional>
#include <vector>
#include <utility>
#include <algorithm>
using namespace std;
enum {
    c_num_tests = 50,
};
int main() {
    // Values and their (possibly unnormalized) probabilities.
    vector< pair<double, double> > const c_probs = {{5.5, 0.1}, {7.1, 0.3}, {1.3, 0.05},};
    // Array of cumulative probabilities, i.e. the distribution; c_distr[0] stays 0.
    vector<double> c_distr(c_probs.size() + 1);
    for (size_t i = 0; i < c_distr.size() - 1; ++i) c_distr[i + 1] = c_probs[i].second + c_distr[i];
    // Standard random number generator.
    std::random_device r_dev;
    std::default_random_engine engine(r_dev());
    std::uniform_real_distribution<double> distribution(0, c_distr.back());
    auto rng = std::bind(distribution, engine);
    // Draw random points and determine which interval of c_distr each one falls into.
    for (size_t i_test = 0; i_test < c_num_tests; ++i_test) {
        double prob = rng();
        // Find the index i such that c_distr[i - 1] <= prob < c_distr[i].
        size_t i = std::upper_bound(c_distr.begin(), c_distr.end(), prob) - c_distr.begin();
        // Print the number at index i - 1.
        cout << c_probs[i - 1].first << " ";
    }
    cout << endl;
    return 0;
}
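To check that unnormalized weights really behave as relative probabilities, one option is to draw many samples and compare the observed frequencies with weight / sum of weights. The sketch below is only an illustration: the helper name sample_frequencies, the fixed seed, the use of std::mt19937 instead of std::default_random_engine, and the defensive clamp of the index are all assumptions that are not in the original program.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: draws n samples with the cumulative-array method
// and returns the observed frequency of each entry of c_probs.
std::vector<double> sample_frequencies(
        const std::vector<std::pair<double, double>>& c_probs,
        std::size_t n, unsigned seed) {
    std::vector<double> c_distr(c_probs.size() + 1, 0.0);
    for (std::size_t i = 0; i < c_probs.size(); ++i)
        c_distr[i + 1] = c_distr[i] + c_probs[i].second;

    std::mt19937 engine(seed);  // assumption: fixed seed for repeatable runs
    std::uniform_real_distribution<double> distribution(0.0, c_distr.back());

    std::vector<double> freq(c_probs.size(), 0.0);
    for (std::size_t t = 0; t < n; ++t) {
        double prob = distribution(engine);
        std::size_t i = static_cast<std::size_t>(
            std::upper_bound(c_distr.begin(), c_distr.end(), prob) - c_distr.begin());
        // Defensive clamp in case rounding ever yields prob == c_distr.back().
        i = std::min(i, c_probs.size());
        freq[i - 1] += 1.0;
    }
    for (double& f : freq) f /= static_cast<double>(n);
    return freq;
}
```

For the weights 0.1, 0.3 and 0.05 the frequencies should approach 0.1/0.45, 0.3/0.45 and 0.05/0.45 (roughly 0.22, 0.67 and 0.11) as n grows.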
As Vladimir Gamalyan suggested, you can use the ready-made std::discrete_distribution class, which does what I did manually above. Here is a simpler solution based on it (you can run it online):
#include <random>
#include <vector>
#include <utility>
#include <iostream>
#include <functional>
using namespace std;
enum {
    c_num_tests = 50,
};
int main() {
    // Values and their (possibly unnormalized) probabilities.
    vector< pair<double, double> > const c_probs = {{5.5, 0.1}, {7.1, 0.3}, {1.3, 0.05},};
    // Keep only the weights.
    vector<double> c_weights(c_probs.size());
    for (size_t i = 0; i < c_probs.size(); ++i) c_weights[i] = c_probs[i].second;
    // Standard random number generator.
    std::random_device r_dev;
    std::default_random_engine engine(r_dev());
    std::discrete_distribution<size_t> distribution(c_weights.begin(), c_weights.end());
    auto rng = std::bind(distribution, engine);
    // Generate indices with the given weights and print the corresponding numbers.
    for (size_t i = 0; i < c_num_tests; ++i) {
        cout << c_probs[rng()].first << " ";
    }
    cout << endl;
    return 0;
}
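std::discrete_distribution normalizes the weights itself, and its probabilities() member returns the resulting values, which gives a quick way to inspect what the distribution will actually use. The following is a minimal sketch; the helper name normalized_probabilities is hypothetical and not part of the original answer.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: returns the normalized probabilities that
// std::discrete_distribution derives from the weights in c_probs.
std::vector<double> normalized_probabilities(
        const std::vector<std::pair<double, double>>& c_probs) {
    std::vector<double> c_weights;
    c_weights.reserve(c_probs.size());
    for (const auto& p : c_probs) c_weights.push_back(p.second);
    std::discrete_distribution<std::size_t> distribution(c_weights.begin(), c_weights.end());
    return distribution.probabilities();
}
```

For the weights 0.1, 0.3 and 0.05 the returned values are close to 2/9, 6/9 and 1/9, and they sum to 1 up to rounding.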