Thursday, 14 August 2014

Jellybeans

My partner asked me to estimate the probability that a packet of jelly beans would be missing a particular flavour. I enjoy messing around with problems like these, so I thought I'd give it a shot. If there is anything wrong, or if there is a better method of solving the problem, I'd love to hear it. Also, I'm not really a mathematician at all, so I won't attempt to sketch out a formal proof. I'll describe my thinking, and the empirical observations I made along the way.

So, first off, we had to estimate the number of jelly beans in the packet. The packaging informed us that the total weight was 225g. We found that 24 jelly beans weighed 24.5g, so extrapolating from that we estimated there were 230 beans in the packet.

The packet also claimed it had 9 flavours of beans. We assumed that beans were sampled randomly from 9 independent sources (imagine a robot that selects from 230 beans from 9 jars, where each jar contains a different flavour, and each choice does not depend on any of the previous choices)

The problem could be interpreted in two ways: either that there are not 9 flavours of beans in the jar, or that there are exactly 8 flavours of beans. The following method allows us to solve either case.

From the assumptions outlined, any combination of counts of beans that sum up to 230 are equally likely, including those with the zero elements. As such, we could separate out the strings of numbers by the count k of non zero elements. By counting the possible strings that sum up to 230 for each k, we could work out the probability of each string with n non-zero elements occurring.

Due to combinatorial nature of this problem, I wasn't going to sit down and try to count each possible string. I noticed however, that the counts for each k were following a particular pattern. I would take a string with a starting point  of (230-k + 1) and (k-1) ones, e.g., 228 + 1 + 1 for k=3. Starting from the left, I would subtract 1 from the first column, add 1 to the next and leave all others the same. This would continue until the first column had reached 1, and the second column was (230 - k + 1). For k > 2, I would then add one to the third column, and subtract 1 from the second, then return to adding and subtracting from the first and second columns. 

I noticed this procedure would follow a pattern of polyhedral numbers, where the set size C for strings with k non-zero elements:

Ck = prod(i=[1:(k-1)]) {230 - k+ i}/(k-1)!, k < 9 (since there are 9 possible jelly bean flavours.

(Generalising this: replacing 230 with N for any number of possible jelly beans, and setting k < r for any number of possible jelly bean flavours- obviously r < N)

Now, considering the placement of zeros within the string, it is necessary to work out the number of combinations of non-zero elements in the string. These can be calculated as binomial coefficients of the form 

Bk = 9!/(k!(9-k)!)

So, to work out the probability of there not being 9 flavours of jelly beans we must compute

P(not 9 flavours of jelly beans)  = 1 - C9/sum_k=1:9(Bk*Ck) = 0.269

and to work out the probability of there being exactly 8 flavours of jelly beans we must compute

P(exactly 8 flavours of jelly beans from a possible 9) = 9*C8/sum_9(Ck) = 0.237

On first reflection, this seems unintuitively high. However, with 9 options to choose from, and no reason to move from one sack of bean flavours to the next, with a relatively small size of packet, it might be quite reasonable that there is a reasonably high chance it might never visit 1 of the sacks.

I guess that Jelly Bean Inc. don't use this particular method to sort their beans.

No comments:

Post a Comment