algorithms – Random numbers with different probabilities

Question:

My question is not about a specific language but rather how to solve a problem.

How can I generate random numbers but with different probability that they come up, for example,

5 -> 63%

10 -> 29.99%

25 -> 7%

50 -> 0.01%

80 -> 0.0001%

The percentages are taken out of a game as an example nothing more.

Answer:

A fairly generic method would be to have in a data structure (for example in two separate lists) the possible numbers to be generated and the possible "relative weights" of each one, for example the probability in percent that they will come out.

From the list of weights we obtain a list of probabilities. We must ensure that the probabilities add up to 1, so in the list that contains these weights we could divide each element by the sum of all of them.

From the list of probabilities we build another list of "accumulated" probabilities, which are, for each element, its probability plus the sum of all the previous probabilities.

For example, if the input lists are:

datos = [ 5, 10, 25, 50, 80 ]
pesos = [ 63, 29.99, 7, 0.01, 0.0001 ]

that of "readjusted" weights would be:

probabilidades = [0.6299994, 0.2998997, 0.0699999, 0.0001000, 0.0000010]

and the cumulative probability would be:

acumuladas = [0.62999937, 0.92989907, 0.999899  , 0.999999  , 1.]

So, once we have the latter, the algorithm would be:

  • Generate a random number between 0 and 1 (ex: suppose 0.8543 is output)
  • Go through the acumuladas array and find the index of the first element that exceeds the number obtained in the previous section (in the example, the first number that meets it is 0.92989907, whose index is 1)
  • Select the datos that has that index from the list (continuing with the example, the one selected would be datos[1] which is 10.

If the first step generates the random numbers uniformly distributed between 0 and 1, this algorithm will select the datos numbers with the specified probabilities.

Extension

In case it is useful to anyone, and to compare with other languages, here is a possible implementation in Python:

from itertools import accumulate
import random

def elige_con_probabilidad(datos, pesos):
  suma = sum(pesos)
  pesos = accumulate(peso/suma for peso in pesos)
  r = random.random()
  for i, p in enumerate(pesos):
    if p > r:
      break
  return datos[i]

To see how it goes, I generate 10,000 random numbers with a given distribution and count how many times each one is repeated:

from collections import Counter

datos = [25, 50, 75, 100]
pesos = [0.6, 0.3, 0.07, 0.03]
muchos = [ elige_con_probabilidad(datos, pesos) for _ in range(10000) ]
print(Counter(muchos))

Counter({25: 5977, 50: 3002, 75: 707, 100: 314})
Scroll to Top