Copyright 2017-2018 Jason Ross, All Rights Reserved

Star InactiveStar InactiveStar InactiveStar InactiveStar Inactive
 
Image courtesy of Jan Willem Geertsma, rgbstock.com

It’s easy enough to choose an item at random from a collection, if they all have the same probability of being chosen. When each of them has its own probability though, things get a little more complicated.

For example, if you wanted to choose an item from the following list, with the odds given, what’s the best way to approach it?

Item Odds
A 5%
B 5%
C 10%
D 15%
E 30%
F 12%
G 23%

This might seem a strange thing to want to do, but looking into Markov Chains (like these: Simulating The Future – Part 1: A Brief Introduction to Markov Chains - what a coincidence!) and other simulations, it’s actually quite a handy thing to be able to do. There are several ways to do it, but this method seems to be fairly straightforward.

First calculate the cumulative odds for the items in the collection:

Item Odds Cumulative
A 5% 0.05
B 5% 0.10
C 10% 0.20
D 15% 0.35
E 30% 0.65
F 12% 0.72
G 23% 1.00

Generate a random number between 0 and the total of all of the odds, search the collection to find the first item where the cumulative odds are greater than the random number.

Examples

  • If the random number is 0.351, the first cumulative value greater than this is 0.65, which gives us item E.
  • If the random number is 0.0498, the first cumulative value greater than this is 0.05, which gives us item A.
  • If the random number is 0.733, the first cumulative value greater than this is 1.00, which gives us item G.

We can work this out using a simple loop to iterate through the collection, and then break out of the loop when we find the number we’re looking for. However that’s not a very efficient way to do this, or anything else for that matter, as it’s an O(n) operation. There is a better way to find the position we want within the collection, and that’s to use the bisect.bisect_left method in Python, and BinarySearch methods in C#. Both of these are O(log n) operations, so they’re almost certainly the best way to go – using them isn’t going to cost you anything so there’s no disadvantage.

Notes:

There’s no need to sort anything to make it work with the bisect.bisect_left or BinarySearch – because they’re working on the cumulative odds, and those are already in ascending order.

If you’re working with occurrences rather than explicit probabilities, that’s not a problem. Just use the cumulative number of occurrences, and then generate a random number between 0 and the total number of occurrences. For example:

Item Occurrences Cumulative
A 6 6
B 56 62
C 17 79
D 6 85
E 23 108
F 46 154
G 6 161

In this case, the random number would be in the range 0 to 161. The rest of the method is the same, using bisect.bisect_left or BinarySearch to calculate the selected item from the collection.

Code Examples

As a demonstration of this method here are two examples of its implementation.

Python

from bisect import bisect_left
from itertools import accumulate
from random import random

# Our original set of items, and the odds of each of them occurring
items_with_odds = {'a': 0.05, 'b': 0.05, 'c': 0.10, 'd': 0.15, 'e': 0.30, 'f': 0.12, 'g': 0.23}

# Calculate the cumulative odds of the items in the  items_with_odds collection
cumulative_odds = list(accumulate([odds for odds in list(items_with_odds.values())]))

# The upper limit of the random number to choose an individual item
total_odds = cumulative_odds[-1]

# Initialize a dictionary where we can store the results. Quicker and faster to
# create it all now, rather than check each key's existence every time we update it.
results = dict.fromkeys(items_with_odds.keys(), 0)

# Carry out a statistically valid number of tests
for count in range(0, 1000000):
    # Choose a value in [0, total_odds)
    value = random() * total_odds

    # Find the index of the smallest cumulative odds value >= the chosen value
    i = bisect_left(cumulative_odds, value)

    # Find the key at the selected index
    selected_key = list(items_with_odds.keys())[i]

    # Update the number of selections made for the item
    results[selected_key] = results[selected_key] + 1

# Display the items, and the number of times each were selected
for key in sorted(results.keys()):
    print('{}: {}'.format(key, results[key]))

C# In .Net Core

Just in case you want a fast implementation...

using System;
using System.Collections.Generic;
using System.Linq;

namespace ChoosingRandomItemsByOddsCS {

    public static class Extensions {
        public static IEnumerable<double> CumulativeSum(this IEnumerable<double> sequence) {
            double total = 0.0;

            foreach (var item in sequence) {
                total += item;
                yield return total;
            }
        }
    }

    class Program {
        static void Main(string[] args) {
            // Our original set of items, and the odds of each of them occurring
            IReadOnlyDictionary<string, double> itemsWithOdds = new Dictionary<string, double> { { "a", 0.05 },
                { "b", 0.05 },
                { "c", 0.10 },
                { "d", 0.15 },
                { "e", 0.30 },
                { "f", 0.12 },
                { "g", 0.23 },
            };

            // Calculate the cumulative odds of the items in the itemsWithOdds collection
            List<double> cumulativeOdds = new List<double>(itemsWithOdds.Values.CumulativeSum());

            // The upper limit of the random number to choose an individual item
            var totalOdds = cumulativeOdds.Last();

            // Initialize a dictionary where we can store the results. Quicker and faster to
            // create it all now, rather than check each key's existence every time we update it.
            IDictionary<string, int> results = itemsWithOdds.Keys.ToDictionary(k => k, k => 0);

            Random random = new Random();

            // Carry out a statistically valid number of tests
            foreach (int i in Enumerable.Range(0, 1000000)) {
                // Choose a value in [0, total_odds)
                var value = random.NextDouble() * totalOdds;

                // Find the index of the smallest cumulative odds value >= the chosen value
                var index = cumulativeOdds.BinarySearch(value);
                index = index < 0 ? ~index : index;

                // Find the key at the selected index
                var selectedKey = itemsWithOdds.Keys.ToList() [index];

                // Update the number of selections made for the item
                results[selectedKey] += 1;
            }

            Console.WriteLine(string.Join(", ", results));
        }
    }
}