Introduction to Binomial Distributions.

Bernoulli Trials and Binomial Distributions.

Sometimes we encounter problems like this:

"Something has a probability $p$ of happening and a probability $1-p$ of not happening."

These types of problems are called Bernoulli trials; they are events which can either happen ("succeed") or not happen ("fail"), with nothing in between. There is usually a given probability that such an event succeeds, generally denoted $p$, and the probability that the event fails is $1 - p$, which is sometimes denoted by $q$.

Example 1. Suppose that today there is a $20\%$ chance it will rain. This forms a Bernoulli trial: it will either rain or not rain. In this case, $p = 0.2$ (or $20\%$) and $q = 1-p = 0.8$ (or $80\%$).
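A single Bernoulli trial is also easy to simulate on a computer. Here's a minimal sketch in Python; the helper name `bernoulli_trial` and the number of repetitions are my own choices for illustration, not anything from the text:

```python
import random

random.seed(0)  # fixed seed so the simulation is reproducible

def bernoulli_trial(p):
    """Run one Bernoulli trial: return True ("succeed") with probability p."""
    return random.random() < p

# Repeat the rain trial many times; the success frequency should land near p = 0.2.
n_trials = 100_000
rain_days = sum(bernoulli_trial(0.2) for _ in range(n_trials))
print(rain_days / n_trials)  # close to 0.2
```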

This is pretty boring, though. Let's generalize this a bit: let's say that there is some Bernoulli trial $X$ (like the one in the example above) and let's say that we run the trial $n$ times. What is the probability that it will succeed exactly $k$ times?

Well, if we want the trial to succeed exactly $k$ times, we could have the sequence: \[\underbrace{\mbox{Succeed},\mbox{Succeed},\dots, \mbox{Succeed}}_{k\mbox{ times.}}, \underbrace{\mbox{Fail},\mbox{Fail},\dots,\mbox{Fail}}_{n-k\mbox{ times.}}\] where the $n-k$ comes from the fact that if the trial succeeds exactly $k$ times, it must fail the other $n-k$ times.

What's the probability of this happening? The first succeeds, so that's $p$; the second succeeds, so that's $p$; and so on. This, altogether, gives us a total probability of \[\underbrace{p\cdot p \cdots p}_{k\mbox{ times.}}\cdot \underbrace{(1-p)\cdot (1-p)\cdots (1-p)}_{n-k\mbox{ times.}}.\] which is equivalent to \[p^{k}(1-p)^{n-k}\] which is a nice little number. On the other hand, it could also be the case that the trials went like this: \[\mbox{Succeed},\mbox{Fail},\mbox{Succeed},\dots\] instead of all of the succeeding trials coming first. Hm. Well, that's okay. This has the same probability of happening (since it's still $p^{k}(1-p)^{n-k}$) so let's just count all the possible orderings where exactly $k$ successes happen.

We can think of this as having a total of $n$ "boxes" and we need to put $k$ successes in some of these boxes, one per box. The number of different ways to place $k$ identical things (in this case, successes) into $n$ boxes is given by \[\binom{n}{k} = \frac{n!}{k!(n-k)!}.\] This is the number of possible orderings with exactly $k$ successes in $n$ trials. Neat. Since each of these orderings has probability $p^{k}(1-p)^{n-k}$, the total probability of the whole thing is given by:
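The counting step can be checked directly: Python's `math.comb` computes $\binom{n}{k}$, and we can compare it against the factorial formula. (The values $n = 5$, $k = 2$ are just an illustration.)

```python
from math import comb, factorial

# comb(n, k) counts the orderings with exactly k successes in n trials.
n, k = 5, 2
print(comb(n, k))  # 10 orderings

# Same number from the factorial formula n! / (k! (n-k)!).
print(factorial(n) // (factorial(k) * factorial(n - k)))  # 10
```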

Binomial Distribution. Let $X$ be the number of successes in $n$ independent Bernoulli trials, each with probability of success $p$. Then the probability of exactly $k$ successes is given by \[P(X = k) = \binom{n}{k}p^{k}(1-p)^{n-k}.\]
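This formula is easy to compute directly. Below is a minimal sketch; the function name `binomial_pmf` and the seven-day rain scenario are illustrative choices, not part of the text above.

```python
from math import comb

def binomial_pmf(n, k, p):
    """P(X = k): probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Example 1 revisited: probability of rain on exactly 2 of the next 7 days, with p = 0.2.
print(round(binomial_pmf(7, 2, 0.2), 4))  # ≈ 0.2753
```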

"At Most" and "At Least" Problems.

We don't need to talk about "exactly $k$ successes" all the time; we can generalize a bit and talk about "at least" and "at most" $k$ successes.

Suppose we have a trial $X$ which has a probability of success $p$ and we do it $n$ times. Suppose we want at least 3 successes; what does this mean? It means we'll accept 3 successes, 4 successes, 5 successes, and so on. But we know how to find the probability of exactly 3 successes, exactly 4 successes, and so on! In particular, we'd just add together \[P(X=3) + P(X=4) + P(X=5) + \cdots + P(X=n).\] Not so nice. There are a lot of terms to add up. But there's a simple way to fix this: we note that, in general, \[\begin{align*}P(X = 0) &+ P(X = 1) + P(X=2) + P(X=3) \\&+ \cdots + P(X=n) = 1,\end{align*}\] meaning that if we sum up the probabilities of all possible outcomes (the event can happen 0 times, 1 time, 2 times, etc.) then we get exactly 1 (or $100\%$). Now think about our "at least 3 successes": what if we were to calculate only $P(X=0) + P(X=1) + P(X=2)$ instead? Then $1$ minus that sum is the sum of all the remaining terms, which is exactly our "at least 3 successes"! Let's do an example of this real quick.
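We can sanity-check this complement trick numerically: sum the probabilities of all outcomes and confirm we get 1, then compute "at least 3" both ways. The values $n = 10$, $p = 0.3$ below are arbitrary choices for the check.

```python
from math import comb

def binomial_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3

# The probabilities of all possible outcomes sum to 1.
total = sum(binomial_pmf(n, k, p) for k in range(n + 1))
print(total)  # essentially 1.0 (up to floating-point rounding)

# "At least 3 successes" two ways: the long direct sum, and 1 minus the short complement.
direct = sum(binomial_pmf(n, k, p) for k in range(3, n + 1))
complement = 1 - sum(binomial_pmf(n, k, p) for k in range(3))
print(abs(direct - complement))  # essentially 0
```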

Example 2. Suppose there is a $20\%$ chance that the dachshund you're babysitting will go to the bathroom on your floor during the day (so $p = 0.2$). Suppose you babysit the dachshund for 30 days. What's the probability it will go to the bathroom on your floor at least twice?

Solution. We want \[\begin{align*}P(X\geq 2)&= P(X=2) + P(X=3) \\&+ \cdots + P(X=30)\end{align*}.\] That is a lot of terms. Instead, let's first compute "what is missing" from our sum above, $P(X=0) + P(X=1)$, then we'll subtract that from 1 to get $P(X\geq 2)$, which is what we want. We note \[\begin{align*}P(X=0) &= \binom{30}{0}(0.2)^{0}(0.8)^{30} \approx 0.0012\\ P(X=1) &= \binom{30}{1}(0.2)^{1}(0.8)^{29} \approx 0.0093\end{align*}\] and so $P(X = 0) + P(X=1) \approx 0.0012 + 0.0093 = 0.0105$. That means "the rest" (which is $P(X\geq 2)$) is equal to $1 - 0.0105 = 0.9895$. Hence, $P(X\geq 2) \approx 0.9895$. In other words, it's very likely that the dachshund will go to the bathroom on your floor at least twice. This makes sense.
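The arithmetic in this solution can be reproduced in a few lines of Python:

```python
from math import comb

n, p = 30, 0.2

# The two "missing" terms from the complement.
p0 = comb(n, 0) * p**0 * (1 - p)**30   # P(X = 0)
p1 = comb(n, 1) * p**1 * (1 - p)**29   # P(X = 1)
print(round(p0, 4), round(p1, 4))      # ≈ 0.0012 and ≈ 0.0093

# P(X >= 2) is 1 minus the missing terms.
print(round(1 - (p0 + p1), 4))         # ≈ 0.9895
```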

"At most" problems work the same way. If we wanted to say, "this thing happens at most two times" then this is the same as I saying, $P(X=0) + P(X=1) + P(X=2)$. We'll do one more example question, just for kicks.

Example 3. Suppose there is a $50\%$ chance of you passing each of your classes, and suppose you're taking 4 classes. What's the probability you pass at most three classes?

Solution. We have $p = 0.5$ for passing a class; let $X$ be the number of classes you pass. We want \[P(X\leq 3) = P(X = 0) + P(X=1) + P(X=2) + P(X=3).\] Since "the rest" is only $P(X = 4)$ (since you're only taking 4 classes), we'll use the trick above to find it and then subtract it from 1. So, \[P(X=4) = \binom{4}{4}(0.5)^{4}(1 - 0.5)^{0} = 0.0625\] which gives us that \[\begin{align*}P(X\leq 3) &= P(X=0) + P(X=1) + P(X=2) + P(X=3)\\ &= 1 - P(X=4) = 1 - 0.0625 = 0.9375.\end{align*}\] That is, there's a $93.75\%$ chance of you failing at least one class. Good job. Note that we could have also been not-so-lazy and just found and added up $P(X=0) + P(X=1) + P(X=2) + P(X=3)$.
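Both the lazy route and the not-so-lazy route from this solution are easy to check in Python:

```python
from math import comb

n, p = 4, 0.5

# Lazy route: 1 minus the single missing term P(X = 4).
p4 = comb(n, 4) * p**4 * (1 - p)**0
print(1 - p4)  # 0.9375

# Not-so-lazy route: add up P(X = 0) through P(X = 3) directly.
direct = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(4))
print(direct)  # also 0.9375
```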