Thursday, August 28, 2025

The sisters "paradox" - counter-intuitive probability

It seems simple, but it isn't

There are a couple of famous counter-intuitive problems in probability theory and the sisters "paradox" is one of them. I'll tell you the problem, let you guess the solution, and then give you some of the background.

Here's the problem: a family has two children. You're told that at least one of them is a girl. What's the probability both are girls?

(Image credit: International Film Service / American Releasing Co., Public domain, via Wikimedia Commons)

Assume that the probability of having a girl or a boy is 50% each and that birth order has no effect on the probability. Assume the family is selected at random from all two-child families that have at least one girl.

What do you think the probability is that both children are girls?

A simpler question

Let's imagine you're asked a simpler question.

A family has two children. What's the probability both are girls?

We can work this out using a simple probability tree:

           Boy (0.5)                                Girl (0.5)
          /         \                              /          \
Boy-Boy (0.25)   Boy-Girl (0.25)      Girl-Boy (0.25)   Girl-Girl (0.25)

So the probability of two girls is 0.25.

Note there are two ways of having a boy and a girl, so the total probability of having a boy and a girl (in any order) is 0.5.

The wrong answer

Let's go back to the original problem and see the logic behind the most commonly given wrong answer.

The chance of each birth is 0.5 boy and 0.5 girl. We don't know the gender of one of the children, but there must be a 0.5 probability that it's a girl. Given that we already know one of the children is a girl, the probability of there being two girls must therefore be 0.5.

It sounds right because it sounds logical, but it isn't right, for reasons I'll explain next.

The correct answer

The correct answer is 1/3. Let's see why.

In the probability tree above, we can see four equally likely combinations: {Boy-Boy} (0.25), {Boy-Girl} (0.25), {Girl-Boy} (0.25), and {Girl-Girl} (0.25). We're told in the problem that the {Boy-Boy} combination is ruled out, which leaves us with three remaining combinations. Each of these three remaining combinations is equally likely and it has to be one of them, which means the probability of two girls is 1/3.

There are two ways of having a boy and a girl, {Boy-Girl} and {Girl-Boy}, which means there's a 2/3 probability of having a boy and a girl (in any order). The mistake is to consider that a 0.5 probability.
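The same result follows from the definition of conditional probability, using the four equally likely outcomes from the tree. Here A is the event "both children are girls" and B is the event "at least one child is a girl". The symbols A and B are labels introduced for this sketch, not notation from the problem statement.

```latex
\[
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
            = \frac{P(\text{Girl-Girl})}{1 - P(\text{Boy-Boy})}
            = \frac{1/4}{3/4}
            = \frac{1}{3}
\]
```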

Sample space

The underlying method for solving this problem uses something called the 'sample space', which is the set of all possible outcomes of a trial. In our case, once {Boy-Boy} is ruled out, the set of possible outcomes is {{Boy-Girl}, {Girl-Boy}, {Girl-Girl}}. We can associate a probability with each element of the sample space. In our case, they're all 1/3.

The sample space idea helps us solve various versions of the problem. Here's an example: if we're told the eldest child is a girl, does this change anything? Actually, it does. Writing the eldest child last in each pair, the sample space becomes {{Boy-Girl}, {Girl-Girl}}, so the probability is now 1/2. Why? Because the {Girl-Boy} combination, where the eldest child is a boy, isn't possible.
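A sample space argument like this can be checked by listing the outcomes and counting. The following is a minimal sketch, assuming all four outcomes are equally likely and writing each family as a (younger, elder) pair so the eldest child is last. The function names here are hypothetical helpers made up for illustration.

```python
from fractions import Fraction
from itertools import product

# All four equally likely families, written as (younger, elder).
OUTCOMES = list(product(["Boy", "Girl"], repeat=2))


def conditional_probability(event, condition):
    """P(event | condition), assuming every outcome is equally likely."""
    allowed = [o for o in OUTCOMES if condition(o)]   # reduced sample space
    favourable = [o for o in allowed if event(o)]     # outcomes we care about
    return Fraction(len(favourable), len(allowed))


def both_girls(outcome):
    return outcome == ("Girl", "Girl")


def at_least_one_girl(outcome):
    return "Girl" in outcome


def eldest_is_girl(outcome):
    return outcome[1] == "Girl"  # the elder child is the second element


p_at_least_one = conditional_probability(both_girls, at_least_one_girl)
p_eldest = conditional_probability(both_girls, eldest_is_girl)
print(p_at_least_one)  # 1/3
print(p_eldest)        # 1/2
```

Using exact fractions rather than floats keeps the answers recognisable as 1/3 and 1/2.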

How might you test this?

With problems like this that seem counter-intuitive, a good way forward is to actually test the theory. Plainly, it would be expensive to survey real families, but we can run a computer simulation. Here are the steps.

  1. Randomly create a large number of two-children families with the sample space {Boy-Boy}, {Boy-Girl}, {Girl-Boy}, {Girl-Girl} and probabilities 1/4, 1/4, 1/4, and 1/4.
  2. Select only the families that have at least one girl.
  3. Now figure out the fraction of all the selected families that are {Girl-Girl}.

Interestingly, if you think about ways of testing a solution, it often helps you define the problem a bit better. I found just writing the test process down helped me confirm the correct answer.
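The three simulation steps can be sketched in a few lines of Python. This is one possible implementation, not a definitive one: the function name, the number of families, and the fixed seed are all assumptions chosen for illustration.

```python
import random


def simulate(n_families=200_000, seed=0):
    """Estimate P(both girls | at least one girl) by simulation."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Step 1: create families; each child is independently "B" or "G" with probability 1/2.
    families = [(rng.choice("BG"), rng.choice("BG")) for _ in range(n_families)]
    # Step 2: keep only the families that have at least one girl.
    selected = [family for family in families if "G" in family]
    # Step 3: fraction of the selected families that are Girl-Girl.
    two_girls = sum(1 for family in selected if family == ("G", "G"))
    return two_girls / len(selected)


estimate = simulate()
print(f"Estimated P(two girls | at least one girl): {estimate:.3f}")
```

With a few hundred thousand families the estimate should land close to 1/3 rather than 1/2.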

Controversy, complexity, and meaning

I've presented a simple analysis here, but you should be aware that things can get a lot more complex. The Wikipedia article on the Boy or girl paradox goes into some painful detail about the problem and the controversy around it. In short, the exact wording of the problem matters: how the family was selected, and how you came to know that at least one child is a girl, can change the answer.
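To illustrate why the wording matters, here is a sketch of one alternative reading of the problem. Instead of selecting families because they have at least one girl, you pick a two-child family at random, meet one of its children at random, and that child happens to be a girl. The function name, the number of families, and the seed are assumptions made up for this example. Under this reading the estimate comes out close to 1/2, not 1/3.

```python
import random


def simulate_random_child(n_families=200_000, seed=0):
    """Estimate P(both girls | a randomly met child is a girl)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    met_a_girl = 0
    met_a_girl_and_two_girls = 0
    for _ in range(n_families):
        family = (rng.choice("BG"), rng.choice("BG"))
        seen = rng.choice(family)  # the one child we happen to meet
        if seen == "G":
            met_a_girl += 1
            if family == ("G", "G"):
                met_a_girl_and_two_girls += 1
    return met_a_girl_and_two_girls / met_a_girl


estimate_random_child = simulate_random_child()
print(f"Estimated P(two girls | met a girl): {estimate_random_child:.3f}")
```

The difference from the 1/3 answer comes entirely from the selection step: a Girl-Girl family is twice as likely as a mixed family to show you a girl.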

This might seem abstract, but I've seen variations of this problem pop up in business and I've had difficult conversations with non-technical people as a result. It's especially hard when the "common sense" error gives a more optimistic answer than the correct answer. Realistically, the only way forward is prior education and the use of sample space arguments.

Probability theory, and conditional probability in particular, can give some very counter-intuitive results. Here's my advice if you're working with probabilities:

  • Be as precise as you can be and list all your assumptions.
  • Figure out how you might run a computer simulation to test your theory. Go back and look at the problem definition once you've defined your simulation.
  • Don't rely on "common sense".

3 comments:

  1. It's just the Monty Hall paradox framed differently.

    1. Excellent point! I can see where you're coming from. The solution follows the same logic as the Monty Hall problem. In my view, though, these are different problems that happen to share the same solution logic.
      Perhaps we should pose another problem: what's the probability these are different problems?

  2. It's a neat riddle, here's some R code if anybody wants to prove it to themselves
    ```
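    library(tidyverse)

    samples <- 10^6 # total number of children, i.e. 500,000 two-child families
    # each child is independently a boy or a girl with probability 1/2
    children <- sample(c("Boy", "Girl"), size = samples, replace = TRUE)
    child_number <- rep(c(1, 2), samples / 2) # alternate first and second child
    families <- rep(1:(samples / 2), each = 2) # family id shared by each pair of children

    tibble(gender = children, child_number = child_number, family = families) |>
      # reshape to one row per family, with columns child_1 and child_2
      pivot_wider(values_from = gender, names_from = child_number, names_prefix = "child_", id_cols = family) |>
      filter(child_1 == "Girl" | child_2 == "Girl") |> # has at least one girl
      summarise(both_girls = mean(child_1 == "Girl" & child_2 == "Girl")) # proportion where both are girls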
    ```
