It’s easy enough to choose an item at random from a collection, if they all have the same probability of being chosen. When each of them has its own probability though, things get a little more complicated.

For example, if you wanted to choose an item from the following list, with the odds given, what’s the best way to approach it?

Item | Odds |
---|---|

A | 5% |

B | 5% |

C | 10% |

D | 15% |

E | 30% |

F | 12% |

G | 23% |

This might seem a strange thing to want to do, but looking into Markov Chains (like these: Simulating The Future – Part 1: A Brief Introduction to Markov Chains - what a coincidence!) and other simulations, it’s actually quite a handy thing to be able to do. There are several ways to do it, but this method seems to be fairly straightforward.

First calculate the cumulative odds for the items in the collection:

Item | Odds | Cumulative |
---|---|---|

A | 5% | 0.05 |

B | 5% | 0.10 |

C | 10% | 0.20 |

D | 15% | 0.35 |

E | 30% | 0.65 |

F | 12% | 0.72 |

G | 23% | 1.00 |

Generate a random number between 0 and the total of all of the odds, search the collection to find the first item where the cumulative odds are greater than the random number.

## Examples

- If the random number is 0.351, the first cumulative value greater than this is 0.65, which gives us item E.
- If the random number is 0.0498, the first cumulative value greater than this is 0.05, which gives us item A.
- If the random number is 0.733, the first cumulative value greater than this is 1.00, which gives us item G.

We can work this out using a simple loop to iterate through the collection, and then break out of the loop when we find the number we’re looking for. However that’s not a very efficient way to do this, or anything else for that matter, as it’s an O(n) operation. There is a better way to find the position we want within the collection, and that’s to use the `bisect.bisect_left`

method in Python, and `BinarySearch`

methods in C#. Both of these are O(log n) operations, so they’re almost certainly the best way to go – using them isn’t going to cost you anything so there’s no disadvantage.

### Notes:

There’s no need to sort anything to make it work with the `bisect.bisect_left`

or `BinarySearch`

– because they’re working on the cumulative odds, and those are already in ascending order.

If you’re working with occurrences rather than explicit probabilities, that’s not a problem. Just use the cumulative number of occurrences, and then generate a random number between 0 and the total number of occurrences. For example:

Item | Occurrences | Cumulative |
---|---|---|

A | 6 | 6 |

B | 56 | 62 |

C | 17 | 79 |

D | 6 | 85 |

E | 23 | 108 |

F | 46 | 154 |

G | 6 | 161 |

In this case, the random number would be in the range 0 to 161. The rest of the method is the same, using `bisect.bisect_left`

or` BinarySearch`

to calculate the selected item from the collection.

## Code Examples

As a demonstration of this method here are two examples of its implementation.

### Python```
```

`from bisect import bisect_left`

from itertools import accumulate

from random import random

# Our original set of items, and the odds of each of them occurring

items_with_odds = {'a': 0.05, 'b': 0.05, 'c': 0.10, 'd': 0.15, 'e': 0.30, 'f': 0.12, 'g': 0.23}

# Calculate the cumulative odds of the items in the items_with_odds collection

cumulative_odds = list(accumulate([odds for odds in list(items_with_odds.values())]))

# The upper limit of the random number to choose an individual item

total_odds = cumulative_odds[-1]

# Initialize a dictionary where we can store the results. Quicker and faster to

# create it all now, rather than check each key's existence every time we update it.

results = dict.fromkeys(items_with_odds.keys(), 0)

# Carry out a statistically valid number of tests

for count in range(0, 1000000):

# Choose a value in [0, total_odds)

value = random() * total_odds

# Find the index of the smallest cumulative odds value >= the chosen value

i = bisect_left(cumulative_odds, value)

# Find the key at the selected index

selected_key = list(items_with_odds.keys())[i]

# Update the number of selections made for the item

results[selected_key] = results[selected_key] + 1

# Display the items, and the number of times each were selected

for key in sorted(results.keys()):

print('{}: {}'.format(key, results[key]))

### C# In .Net Core

Just in case you want a fast implementation...```
```

`using System;`

using System.Collections.Generic;

using System.Linq;

namespace ChoosingRandomItemsByOddsCS {

public static class Extensions {

public static IEnumerable<double> CumulativeSum(this IEnumerable<double> sequence) {

double total = 0.0;

foreach (var item in sequence) {

total += item;

yield return total;

}

}

}

class Program {

static void Main(string[] args) {

// Our original set of items, and the odds of each of them occurring

IReadOnlyDictionary<string, double> itemsWithOdds = new Dictionary<string, double> { { "a", 0.05 },

{ "b", 0.05 },

{ "c", 0.10 },

{ "d", 0.15 },

{ "e", 0.30 },

{ "f", 0.12 },

{ "g", 0.23 },

};

// Calculate the cumulative odds of the items in the itemsWithOdds collection

List<double> cumulativeOdds = new List<double>(itemsWithOdds.Values.CumulativeSum());

// The upper limit of the random number to choose an individual item

var totalOdds = cumulativeOdds.Last();

// Initialize a dictionary where we can store the results. Quicker and faster to

// create it all now, rather than check each key's existence every time we update it.

IDictionary<string, int> results = itemsWithOdds.Keys.ToDictionary(k => k, k => 0);

Random random = new Random();

// Carry out a statistically valid number of tests

foreach (int i in Enumerable.Range(0, 1000000)) {

// Choose a value in [0, total_odds)

var value = random.NextDouble() * totalOdds;

// Find the index of the smallest cumulative odds value >= the chosen value

var index = cumulativeOdds.BinarySearch(value);

index = index < 0 ? ~index : index;

// Find the key at the selected index

var selectedKey = itemsWithOdds.Keys.ToList() [index];

// Update the number of selections made for the item

results[selectedKey] += 1;

}

Console.WriteLine(string.Join(", ", results));

}

}

}

## This All Looks a Little Familiar Somehow...

Yes, it does. That's because this functionality is implemented in the Python method:

`random.choices()`

from Python 3.6 onwards. Alternatively the same functionality is available in NumPy using the

`numpy.random.choice()`

method.