# Why look back at basic probability?

Bayes' theorem lies at the heart of much of modern machine learning. Although it's relatively simple to understand, you do need some grounding in probability theory. This blog post is all about getting you up close and personal with probability theory so I can tell you all about Bayes in a later post.

(You can work out the probability aliens are on earth given that Elvis lives. Image source: Pixabay. Author: Pete Linforth. License: Pixabay.)

# The very basics

Think of some event that might occur in the future, say winning the lottery, buying a new car, or England winning the World Cup. We can estimate the probability of these events happening; we can call the event A and the probability of the event occurring P(A). If the event is certain to occur, then P(A) = 1; if it's certain not to occur, then P(A) = 0; and in all cases \(0 \leq P(A) \leq 1\).

We'll consider the probability of several events I'm going to call A, B, C, etc. These can be any events at all, including aliens landing, Elvis making a comeback, or getting a pay raise at the end of the year.

# The complementary rule

If the probability of an event A occurring is P(A), the probability of it *not* occurring is \(1 - P(A)\). This is called the complement and different authors use different notation for it:

\[P(\bar{A}) = P(A') = P(A^c) = P(\neg A) = 1 - P(A)\]

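Let me give you an example using one notation. Imagine 1% of the population has a disease and 99% don't. Writing D for having the disease and \(\bar{D}\) for not having it, then:

\[P(D) = 0.01\]

\[P(\bar{D}) = 1 - P(D) = 0.99\]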
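As a quick check of the complement rule, here's a minimal Python sketch using the 1% disease figure from the example. The variable names are my own, and I've used exact fractions to avoid floating-point rounding.

```python
from fractions import Fraction

# Figure taken from the example: 1% of the population has the disease.
p_disease = Fraction(1, 100)

# Complement rule: P(not A) = 1 - P(A)
p_no_disease = 1 - p_disease

print(p_disease, p_no_disease)  # 1/100 99/100
```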

# Independence

Independence is a huge issue in probability modeling and it can lead to big errors if not handled correctly. On the face of it, it's a simple idea, but there are subtleties.

Two events are independent if one does not affect or influence the other in any way (alternatively, one event does not give any information about the other). For example, the odds of Joe Biden winning the 2020 Presidential election do not depend on the odds of New Zealand opening its borders to international travelers. Looking at things the other way, the odds of me winning the lottery are dependent on my purchasing a ticket (I have to buy a ticket to stand any chance of winning) - these are dependent events. I'm sure you can think of many other examples.

Independent and dependent events are treated very differently mathematically, and the big mistake comes when events that are *not independent* are treated as *independent*. For example, an organization might run many opinion polls in an election. The errors in the polls will *not* be independent of one another because the organization may well have a systemic bias that affects *all* their polls. There are similar problems in epidemiology; if you and I live together, my probability of catching an infectious disease is not independent of your probability of catching it.

The most famous example of confusing independent and dependent events was the subprime mortgage scandals of 2008 onwards. The analysts who developed the subprime mortgage default models assumed that mortgage defaults were independent of one another. Unfortunately for all of us, that wasn't the case in 2008. Economic conditions led to many defaults, which in turn led to broader financial problems, which in turn led to more defaults. In 2008 and onwards, subprime mortgage defaults were dependent on one another.

# Disjoint (mutually exclusive) events

Two events are disjoint if they're mutually exclusive, in other words, if they can't both happen. For example, only one of Joe Biden or Donald Trump can win the election - they can't both be President. In notation I'll explain later: \(P(A \ and \ B) = P(A \cap B) = 0\).

# Probability A and B occurring (intersection) - the multiplication rule

What's the probability of A *and* B occurring (also known as their joint or conjoint probability)? Here's where we run into some notation issues. Some sources write 'and' and some use the symbol '\(\cap\)' - both mean the same thing.

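Here's the rule for *dependent* events:

\[P(A \ and \ B) = P(A \cap B) = P(A)P(B | A)\]

Here's the rule for *independent* events:

\[P(A \ and \ B) = P(A \cap B) = P(A)P(B)\]

Here's the rule for disjoint events:

\[P(A \ and \ B) = P(A \cap B) = 0\]

The 'and' relationship is commutative:

\[P(A \cap B) = P(B \cap A)\]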
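To make the multiplication rule concrete, here's a small Python sketch that checks it by brute-force counting. The examples are my own toy choices, not from any real dataset: two fair dice for the independent case, P(A and B) = P(A)P(B), and two cards drawn without replacement for the dependent case, P(A and B) = P(A)P(B | A).

```python
from fractions import Fraction
from itertools import permutations, product

def prob(event, outcomes):
    """Probability of an event over equally likely outcomes."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))

# Independent events: roll two fair dice.
dice = list(product(range(1, 7), repeat=2))
a = lambda o: o[0] == 6   # A: first die shows 6
b = lambda o: o[1] == 6   # B: second die shows 6
p_a_and_b = prob(lambda o: a(o) and b(o), dice)
assert p_a_and_b == prob(a, dice) * prob(b, dice)   # P(A)P(B)

# Dependent events: draw two cards without replacement.
# Cards are numbered 0..51; cards 0..3 stand in for the four aces.
draws = list(permutations(range(52), 2))
first_ace = lambda o: o[0] < 4
second_ace = lambda o: o[1] < 4
p_both_aces = prob(lambda o: first_ace(o) and second_ace(o), draws)
# P(B | A) = 3/51: after one ace is drawn, 3 aces remain among 51 cards.
assert p_both_aces == prob(first_ace, draws) * Fraction(3, 51)

print(p_a_and_b, p_both_aces)  # 1/36 1/221
```

Note that treating the two card draws as independent would give (4/52) x (4/52) = 1/169, which is the wrong answer.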

# Probability of A or B occurring (union) - the addition rule

What's the probability of A *or* B occurring? Some sources write 'or' and some write '\(\cup\)'. Here's the rule:

\[P(A \ or \ B) = P(A \cup B) = P(A) + P(B) - P(A \cap B) \]

\[= P(A) + P(B) - P(A)P(B | A)\]

The 'or' relationship is commutative:

\[P(A \cup B) = P(B \cup A)\]

For disjoint events, the addition rule simplifies to:

\[P(A \ or \ B) = P(A \cup B) = P(A) + P(B) \]

because from before we have:

\[P(A \cap B) = 0\]
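Here's a short Python sketch that checks the addition rule by counting outcomes for two fair dice. The events are my own illustrative choices: an overlapping pair (first die is 6, second die is 6) and a disjoint pair (total is 2, total is 12).

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event over the 36 equally likely rolls."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

# Overlapping events: subtract the intersection to avoid double counting.
a = lambda o: o[0] == 6   # A: first die shows 6
b = lambda o: o[1] == 6   # B: second die shows 6
p_union = prob(lambda o: a(o) or b(o))
assert p_union == prob(a) + prob(b) - prob(lambda o: a(o) and b(o))

# Disjoint events: the intersection is empty, so the probabilities just add.
c = lambda o: sum(o) == 2    # C: total is 2
d = lambda o: sum(o) == 12   # D: total is 12
p_disjoint_union = prob(lambda o: c(o) or d(o))
assert prob(lambda o: c(o) and d(o)) == 0
assert p_disjoint_union == prob(c) + prob(d)

print(p_union, p_disjoint_union)  # 11/36 1/18
```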

# Conditional probability - the conditional rule

What's the probability I have a disease given I've tested positive for the disease? We use the | symbol to mean "given that", so P(A | B) means the probability of A happening given that B has occurred. Here are some examples from everyday life:

- What's the probability I win the lottery given that I've bought a ticket?
- What's the probability I will get a degree if I go to college?
- What's the probability I will have an accident if I'm driving and if it's snowing and if it's dark?

The interesting thing about conditional probability is that it can be quite different from the 'raw' probability. For example, let's say you're from a poor family, you might only have a 10% chance of getting a degree, but if you get accepted to a college, the probability might shoot up to 50%, and if you actually go to college, the probability may get to 95%. The probability can change quite substantially depending on new information (as we'll see with Bayes' theorem).

The general rule is:

\[P(A | B) = \frac{P(A \cap B)}{P(B)}\]

If A and B are independent (A does not depend on B), then P(A | B) = P(A).
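To see how a conditional probability can differ from the 'raw' probability, here's a Python sketch using two fair dice. It computes P(A | B) as P(A and B) / P(B); the events and the helper names `prob` and `cond_prob` are my own illustrative choices.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event over the 36 equally likely rolls."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); assumes P(B) > 0."""
    return prob(lambda o: a(o) and b(o)) / prob(b)

total_is_8 = lambda o: sum(o) == 8
first_is_6 = lambda o: o[0] == 6
second_is_6 = lambda o: o[1] == 6

p_raw = prob(total_is_8)                      # 'raw' probability of a total of 8
p_given = cond_prob(total_is_8, first_is_6)   # same event, given the first die is 6
p_indep = cond_prob(second_is_6, first_is_6)  # independent events

print(p_raw, p_given, p_indep)  # 5/36 1/6 1/6
```

Knowing the first die is 6 moves the probability of a total of 8 from 5/36 to 1/6, but it tells us nothing about the second die, so P(second is 6 | first is 6) is still just P(second is 6) = 1/6.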

# The law of total probability

There's a general form of this law and a more specific form. Because the specific form will be useful for Bayesian work later, we'll start with that.

\[P(A+) = P(A+ \cap B+) + P(A+ \cap B-)\]

In words, the probability of an event A+ occurring is the probability of A+ occurring and B+ occurring *plus* the probability of A+ occurring and B+ not occurring (B-). This might be clearer if we remember \(1 = P(B+) + P(B-)\) and we think of probabilities using a Venn diagram.

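The more general form of this law is:

\[P(A) = \sum_{i} P(A \cap B_i)\]

where the events \(B_i\) are mutually exclusive and between them cover every possible outcome.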
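Both forms of the law of total probability can be checked by counting. Here's a sketch with two fair dice (my own toy example): the two-way split uses 'first die is even' and 'first die is odd' in place of B+ and B-, and the general split uses the six possible values of the first die.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event over the 36 equally likely rolls."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

total_is_8 = lambda o: sum(o) == 8      # the event A
first_even = lambda o: o[0] % 2 == 0    # plays the role of B+
first_odd = lambda o: o[0] % 2 == 1     # plays the role of B-

# Specific form: P(A) = P(A and B+) + P(A and B-)
p_specific = (prob(lambda o: total_is_8(o) and first_even(o))
              + prob(lambda o: total_is_8(o) and first_odd(o)))

# General form: sum over a partition, here the six values of the first die.
p_general = sum(prob(lambda o, i=i: total_is_8(o) and o[0] == i)
                for i in range(1, 7))

assert p_specific == p_general == prob(total_is_8)
print(p_specific)  # 5/36
```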

# The law of total probability and conditional probabilities

One of the most useful forms of Bayes' theorem relies on the combination of the law of total probability and conditional probability. Here's the key relationship:

\[P(A | B) + P(\bar{A} | B) = 1\]

where \(\bar{A}\) means 'not A'.

Let me put this into words. If event B happens, then either A or not A happens; there are no other options, so the two probabilities must sum to 1.
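Here's a quick numerical check that P(A | B) and P(not A | B) sum to 1, again as a sketch with two fair dice. The events are my own choice: A is 'the total is 8' and B is 'the first die is 6'.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event over the 36 equally likely rolls."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B); assumes P(B) > 0."""
    return prob(lambda o: a(o) and b(o)) / prob(b)

total_is_8 = lambda o: sum(o) == 8   # the event A
first_is_6 = lambda o: o[0] == 6     # the event B

p_a_given_b = cond_prob(total_is_8, first_is_6)
p_not_a_given_b = cond_prob(lambda o: not total_is_8(o), first_is_6)

assert p_a_given_b + p_not_a_given_b == 1
print(p_a_given_b, p_not_a_given_b)  # 1/6 5/6
```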

# What use is probability theory?

I grew up hearing about the value of 'common sense', but probability theory often gives results that seem very counterintuitive and 'common sense' can lead you wildly astray. A fun example is the Monty Hall problem, but there are lots of other examples in the real world where the probability of something happening is not what it appears to be at first - and they're not so fun. The counterintuitive example you find most often on the internet is the probability that you have a disease given a positive test result; it's mostly not what you think.

Bayes' theorem takes us into the world of the counterintuitive and I'll talk about Bayes in a future blog post.