2012. március 25., vasárnap

Generating a Random Combination

Recently I've had my mind on lottery drawing, and I suddenly thought of a way to generate a random combination that is better than how I used to do it.

Here's the basic problem. Given the set of numbers (0, 1, ... n-1), we want to generate a random subset of k elements of this set of n. The random generator function that I can use is Random.nextInt(n), which returns a number between 0 and n-1. Now, if we have already generated some of the numbers, (c[0], c[1], ..., c[i-1]), then how do we generate the next one?

The way I traditionally do it is that I keep generating a random number between 0 and n-1 until I get one which is different from the ones I already have:

List<Integer> numbers=new ArrayList<Integer>();
while (numbers.size()<k) {
    int next=rnd.nextInt(n);
    if (!numbers.contains(next))
        numbers.add(next);
}
I've kept wondering if there if a way to do it that doesn't involve an unbounded loop, and there is. Think of how lottery numbers are drawn. Once a number is selected, it is discarded from the set of possible numbers, and the next one is then chosen uniformly from the rest. We can do something quite similar in code!

Once again, suppose that we have already generated i numbers, and we're looking for the (i+1)-th one. There are (n-i) candidates left to choose from, the only problem is that they're not the numbers from 0 to n-i-1 (which is the interval in which we can generate one of (n-i) numbers). But we can map these two sets! Let me demonstrate on an example. Consider the numbers between 0 and 9, and suppose we have already chosen 3, 4, and 7. Here's our problem:

The numbers we need to choose from: 0 1 2     5 6   8 9
The numbers we can generate from: 0 1 2 3 4 5 6

Notice that we can collapse the first row, giving us a mapping from the 7 numbers between 0 and 6 to the 7 numbers between 0 and 9 but excluding the ones we've already chosen. The question is: if we generate one in the former set, how can we get the actual number it corresponds to? Here's how:

List<Integer> numbers=new ArrayList<Integer>();
for (int i=0; i<k; i++) {
    int next=rnd.nextInt(n-i);
    int nextIndex;
    for (nextIndex=0;
nextIndex<i && numbers.get(nextIndex)<=next; nextIndex++)
        next++;
    numbers.add(nextIndex, next);
}
Note that now we maintain the list of selected numbers in ascending order. What we basically do is shift our generated number to the right (increment it) as many times as we need to account for the 'empty slots'. For example, suppose that in the scenario depicted above, we get the random number 3. Since the first number already selected (which is 3) is <= the new number, we increment it to get 4. The next number already selected (4) is once again <= our number, so we increment it again, and get 5. The next number in the list is 7, so we don't increment any more. We've found the number among the remaining ones which corresponds to the generated number 3 (or, to put it in another way, the fourth one among the remaining candidates): 5. We similarly need two increments if we start from the number 4. If the generator gives us 5, then after two increments, we find that the next already chosen number, 7, causes us to increment a third time, giving us 8. And so on.

As a nice side-effect, we can notice that the number of times we needed to increment gives us the index where we have to insert the new number to keep the list sorted (since it is equal to how many numbers we already have which are less than this new one).

I like this algorithm much more than the original one. We can give an upper limit to its execution time, so we don't run into problems even when k is close to n. I'm pretty sure that I'm not the first one to think of this way of generating combinations, but I wonder how come I've never seen this method before.

P.S. There's another, quite pretty approach on Wikipedia. That one is similarly easy to implement, has bound run-time, and I've also not heard of that one before.