This May Be The Most Counterintuitive Probability Paradox I've Ever Seen | Can you spot the error?


What is the chance that a parent has two daughters, given the fact that they have two children, at least one of which is a girl named Julie?

See pinned comment for additional information.

►References (note, even some of the sources don't fully agree)


Comments
Author

"I have 2 children, one of which is a girl"
"So is the other one a boy?"
"No, the other one's a girl too."
"Then why didn't you just say that?"
"Alright, I have 2 girls, one of which is a girl."

Oscaragious
Author

The change of probability from 1/3 to 50% is, in fact, based off the fact that the focus of the analysis has moved, with the introduction of new information (the name) which pertains to a population (daughters) which differs from the starting one (families).


To exemplify this, I'd draw your attention to 2:15. This question can be easily rephrased to give 50% as a correct answer. Let's look at the information you provide in terms of children, rather than families. With 1,000 families with 2 children each, we have a total of 2,000 children.


These can be represented as follows:
[notation: (X|Y) means persons of gender X given their sibling is of gender Y]


500(B|B) [or 250(B, B) pairs];
500(B|G); 500(G|B) [or 500(B/G, G/B) pairs];
500(G|G) [or 250(G, G) pairs].


If I now asked "what is the probability of this girl's sibling being a sister?" the answer would be 50%. In actual fact, we'd have 500 girls with sisters, over a population of 500+500 = 1,000 girls. Both answers are correct: 1/3 of families with a daughter have two, and 50% of girls have a sister.
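As a rough check of the two counts above, the sketch below enumerates the four equally likely birth orders and computes both fractions. It assumes boys and girls are equally likely and that births are independent; the function names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# The four equally likely (older, younger) outcomes for a two-child family,
# assuming each child is a boy or a girl with probability 1/2 (an assumption).
FAMILIES = list(product("BG", repeat=2))


def share_of_families_with_two_girls():
    """Among families with at least one girl, the fraction with two girls."""
    with_girl = [f for f in FAMILIES if "G" in f]
    both_girls = [f for f in with_girl if f == ("G", "G")]
    return Fraction(len(both_girls), len(with_girl))


def share_of_girls_with_a_sister():
    """Among individual girls, the fraction whose sibling is also a girl."""
    girls = 0
    girls_with_sister = 0
    for family in FAMILIES:
        for i, child in enumerate(family):
            if child == "G":
                girls += 1
                if family[1 - i] == "G":  # the other child in the pair
                    girls_with_sister += 1
    return Fraction(girls_with_sister, girls)


print(share_of_families_with_two_girls())  # 1/3
print(share_of_girls_with_a_sister())      # 1/2
```

The only difference between the two functions is what gets counted: families in the first, individual girls in the second.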


Looking at the problem at hand, the issue is that the information "1% of people named 'Julie'" (note: the issue is not the name, but its assumed distribution) is disproportionately applied to the population of girls, i.e., 1% of (B, G/G, B) and 2% of (G, G) are taken. What you are therefore answering is "what is the probability of this girl named 'Julie' having a sister?". The answer remains unsurprisingly 50% as above.


It is therefore incorrect, in my opinion, to disproportionately apply this to the sibling pairs, in that you are moving your focus from the families to the girls. To see this, we can change the problem slightly: if the assumption drawn from the name was that "1% of *all families with a daughter* name their daughter 'Julie'", you'd end up with your probability of 1/3 - as before.


We take our 10,000 families. 5,000 are (B, G/G, B) pairs; 2,500 are (G, G) pairs. 1% of all families name their children 'Julie', so 50 (B, G/G, B) pairs would have a person named Julie in them and 25 (G, G). 25/75 = 1/3 as before.
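A small calculation can make the contrast between the two name assumptions concrete. This is only a sketch: the 1% rate, the independence of names between sisters, and the helper name `p_two_girls` are assumptions for illustration. When the rate applies per girl, a (G, G) family contains at least one Julie with probability 1 - 0.99^2 = 1.99%, slightly under the rounded 2%, so the exact answer lands just below 50%.

```python
from fractions import Fraction

MIXED = 5_000       # (B, G) or (G, B) families out of 10,000
TWO_GIRLS = 2_500   # (G, G) families out of 10,000
RATE = Fraction(1, 100)  # assumed share named Julie


def p_two_girls(julie_mixed, julie_two_girls):
    """Share of Julie-containing families that are (G, G)."""
    return julie_two_girls / (julie_mixed + julie_two_girls)


# Assumption 1: each GIRL is independently a Julie with probability RATE.
# A (G, G) family then has at least one Julie with probability 1 - (1 - RATE)^2.
per_girl = p_two_girls(MIXED * RATE, TWO_GIRLS * (1 - (1 - RATE) ** 2))

# Assumption 2: each FAMILY with a daughter has a Julie with probability RATE,
# regardless of how many daughters it has.
per_family = p_two_girls(MIXED * RATE, TWO_GIRLS * RATE)

print(per_girl, float(per_girl))  # 199/399, just under 0.5
print(per_family)                 # 1/3
```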

davidebcc
Author

Rando: "I have two kids. One's a girl, born on a..."
Me: "13/27ths!"
Rando: "... boat."

Gigawolf
Author

If someone tells you they "have two children, one of whom is a girl named Julie", you can't really make the assumption that they're not weird enough to name both their daughters Julie.

TheSwiftFalcon
Author

here's a way to rephrase the questions that makes intuitive sense:
the first question is "given that I have 2 children, at least one of which is a girl, what is the probability that they are both girls?"
the second question is "I have a daughter. What is the probability that her sibling is a girl?"
the first question is conditional probability
in the second question, despite the fact that you can mince words to make the questions virtually identical, you are only asking about the probability of one specific person's gender

MalcolmCooks
Author

Professional statistician here: I can explain this. The issue is that the math in this video is subtly wrong. Here is why:

You have to be really careful about probability, and particularly about conditional probability when you suddenly introduce new information: What does this do to your probability distribution and what does it do to your sample space?

There are actually two situations here, and they are actually different.

1) You ask a guy: "Do you have a daughter?" He says "yes". You then ask "What is her name?" and he says "Julie".

2) You ask a guy: "Do you have a daughter?" He says "yes." You then ask "Do you have a daughter named Julie?" And he says "ZOMG you are a wizard! Yes I do!"

In the first case, the probability of him having a second daughter is still 33% as before, and in the second it is 50%. This can be thought of as follows: In the first case, if the guy has one daughter named Julie he will name her, but if he has two daughters he might name Julie, but he might also NOT name Julie and may name the other one instead. This means that if he has a daughter named Julie, he might not always tell you this. In fact, he will only tell you so 50% of the time, which reduces exactly to the case when you don't know her name: Most of the time he will only have one daughter.

In the second case, the probability of him having a second daughter is 50%, and this is also somewhat intuitive. You just did a pretty shocking bit of guesswork with her name, but your guess would have had a better chance of success if he had two daughters than only one, right?
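The two questioning scenarios can be compared with a short Bayes calculation. This is a sketch under stated assumptions: each girl is independently named Julie with probability `r`, a father of two girls who is asked "What is her name?" names one of them at random, and the function name `p_two_daughters` is hypothetical.

```python
from fractions import Fraction


def p_two_daughters(r, asked_by_name):
    """P(two daughters | 'Julie' comes up) for a father of two with a daughter.

    r: assumed chance that any given girl is named Julie.
    asked_by_name=False: "What is her name?" and a father of two girls
        names one of them at random (scenario 1).
    asked_by_name=True: "Do you have a daughter named Julie?" "Yes."
        (scenario 2).
    """
    # Priors over all two-child families: mixed = BG or GB, gg = two girls.
    prior_mixed, prior_gg = Fraction(1, 2), Fraction(1, 4)
    like_mixed = r  # the only daughter is Julie
    if asked_by_name:
        like_gg = 1 - (1 - r) ** 2  # at least one of two girls is Julie
    else:
        like_gg = r  # the one randomly named daughter is Julie
    numerator = prior_gg * like_gg
    return numerator / (prior_mixed * like_mixed + numerator)


r = Fraction(1, 100)
print(p_two_daughters(r, asked_by_name=False))  # 1/3
print(p_two_daughters(r, asked_by_name=True))   # 199/399
```

Under these assumptions scenario 1 gives exactly 1/3 for any `r`, while scenario 2 gives (2 - r)/(4 - r), which approaches 1/2 as the name gets rarer.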

The same thing works for Tuesday, pretty much exactly the same way as in the name situation. For physical presence it is slightly different, but the reason is still somewhat similar. The "there she is" might be enough to get the 50% answer from the video. We have to understand the distribution of people who are there. For instance, imagine you are sitting there and there are two kids, a boy and a girl, and he points to the girl and says "there she is". Do you still think there is a 50% chance the other is a girl? How about if there are two girls there and he points to one of them? How about if there is only one other person in the room and it happens to be his daughter? Note that these situations are all different from each other.

EDIT: OK I just saw that this was addressed in another video by the same guy. I still want to leave the answer up though, just in case someone does not find the other video.

tuerda
Author

Man: I have two children, at least one of which is a girl.
Me: Yeah, these are confusing times when it comes to gender.

mscottveach
Author

Coworker: It's my daughter's birthday.

Me: HOLD ON I GOT THIS

Jonathanbass
Author

I feel this is approached from a strange angle - the idea that the probability is based on whether the older sibling is the girl mentioned or not. If the question is simply "What is the probability of the other child being a boy or girl?" then it truly is 50/50, since there are only two equally probable outcomes. The paradox in the video only comes into question because they're assuming that
A) In the pool of outcomes the probability is drawn from, there is an equal number of boy/boy, boy/girl and girl/girl options, but there are also additional options for which one is the older/younger sibling.
B) The older sibling/younger sibling pairing counts as a separate pool to draw probability numbers from, even though that factor is entirely irrelevant to the initial question.

If the question was "What is the probability of the other child being a girl and the younger/older sibling" then the probability would shift.

Nero
Author

In your first point at 1:20 when claiming that there is a 33% chance of the other sibling being a girl, there is a really big problem with how you are presenting the case. When comparing the known girl to the potential boy, you state the possibilities as *girl born first - boy born second* or *boy born first - girl born second.* But when comparing the known girl to the potential girl, you just say "and two girls." But in the case of a comparison of the known girl with the potential girl, one of them still must be born first and the other must be born second, and there are two equally likely orders in which this can happen, giving this scenario two outcomes to compare against the two outcomes of the boy-girl/girl-boy pairs.


The problem arises because you are selectively identifying the known girl. When comparing her to the potential boy, you give her the identity of "older/first" or "younger/second," aka order-of-birth. However, you fail to preserve that order-of-birth identity when comparing the known girl to the potential girl. If you were consistent in the way you analyze the outcomes, you would see that the probability is *always* a 50% chance of the other child being a girl because the known girl would still have to be "older/first" or "younger/second". You can see this in your second argument when you use a name as an identifier rather than order-of-birth. By not remaining consistent in the first argument, you're ruining the integrity of the comparison.

joshrolfs
Author

Great it's 11pm here and I'm watching this. Not gonna sleep tonight I guess...

jlhjlh
Author



Okay if you haven't watched the video yet (and have never seen this paradox), then don't read the following paragraphs just yet. Look at the math I show and see what conclusions you can come to. After A LOT of thought I see some issues but they are very subtle and not obvious....well they weren't to me, but I want to try and clear everything up below.

TLDR: The fact some GG parents who have a daughter named Julie won't tell you about Julie but rather the other daughter matters here. So if we only know that the people in some room have 2 kids, then when one tells us they have a daughter, the probability is 50%, when they mention the name, it stays 50%. If instead we know everyone has 2 kids and at least one daughter (this is known before talking to anyone) then the probabilities all are 33.3%.


Longer Version: So first I think all the math in this video holds and is perfectly accurate IF we treat this more like a typical probability problem and phrase it like this. "If we have a room full of people who have two children then ask those with at least one daughter to step forward, what is the probability a randomly selected one has 2 daughters?" This answer is 33.3% as stated in the video. If you then ask all of those people who have a daughter named Julie to step forward then 50% of those families will have 2 daughters, also as stated in the video (because you have twice as many potential Julies in each (girl, girl) family). And lastly if instead of the name we ask those with a daughter who was born on a Tuesday to step forward, then 13/27 of those will have two daughters. All of that is correct and although it may seem kind of weird, I wouldn't call those paradoxes.
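The 13/27 figure for the "step forward if you have a daughter born on a Tuesday" version can be checked by brute-force enumeration. The sketch below assumes sex and weekday of birth are uniform and independent for each child; the function name and the numeric label chosen for Tuesday are arbitrary.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # arbitrary label for one of the 7 weekdays

# 14 equally likely (sex, weekday) combinations per child (an assumption).
CHILDREN = list(product("BG", range(7)))
# 196 equally likely (older, younger) families.
FAMILIES = list(product(CHILDREN, repeat=2))


def p_two_girls_given_tuesday_girl():
    """Among families with at least one Tuesday-born girl, share with two girls."""
    stepped_forward = [
        fam for fam in FAMILIES
        if any(child == ("G", TUESDAY) for child in fam)
    ]
    two_girls = [
        fam for fam in stepped_forward
        if all(sex == "G" for sex, _ in fam)
    ]
    return Fraction(len(two_girls), len(stepped_forward))


print(p_two_girls_given_tuesday_girl())  # 13/27
```

Of the 196 families, 27 contain a Tuesday-born girl, and 13 of those have two girls.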


However, in any stats class when we are GIVEN something, we rarely think about HOW we are given that information and in this problem we need to. Imagine a room of 75,000 fathers who have 2 children, one of which is a girl (that means 50,000 will have a son and daughter, while 25,000 of them have 2 daughters). Then let's say you talk to every single father in that room and at some point in conversation they all end up randomly mentioning the name of one of their daughters (or their only daughter if it's a GB/BG family). If 1/100 girls are named Julie, then 500 fathers with a son and daughter will have told you Julie (which is every BG/GB father in that room who has a daughter named Julie as there's no other daughter for them to name). Then of the GG families there WILL be 500 fathers with a daughter named Julie as I stated in the video, assuming no overlap. However, only half of them will tell you of Julie, the other half will tell you of their other daughter. That means you've heard the name Julie from 250 GG fathers that night and 500 GB/BG families. If every single time you heard the name Julie you bet $1 they have 2 daughters, you'd win $250 but lose $500 and thus end at a loss. You'd only win 33.3% of the time which keeps the same probability as BEFORE we heard the name. This here kind of resolves the paradox I mentioned in the video with the whole "I have a daughter....whose name is Julie" part. The thing to note is that some of those GG fathers with a daughter named Julie will not tell you about her but rather the other daughter. So you're not HEARING about Julie from the GG families enough for it to be 50% and this reflects reality more of talking to someone in a bar who states the name of their daughter.
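The betting argument above can be tried out with a quick simulation. This is a rough sketch, not a reproduction of the video's numbers: the function name, the family count, the seed, and the inflated 5% Julie rate (chosen only to reduce sampling noise) are all assumptions, and both daughters in a family are allowed to be named Julie.

```python
import random


def julie_bet_win_rate(n_families=200_000, julie_rate=0.05, seed=0):
    """Fraction of 'Julie' mentions that come from two-daughter fathers."""
    rng = random.Random(seed)
    wins = losses = 0
    for _ in range(n_families):
        kids = [rng.choice("BG") for _ in range(2)]
        daughters = kids.count("G")
        if daughters == 0:
            continue  # this father is not in the room
        # Give every daughter a name independently; True means "Julie".
        is_julie = [rng.random() < julie_rate for _ in range(daughters)]
        # The father mentions one of his daughters, chosen at random.
        if not rng.choice(is_julie):
            continue  # a different name came up, so no bet is placed
        if daughters == 2:
            wins += 1    # the $1 bet on "two daughters" pays off
        else:
            losses += 1
    return wins / (wins + losses)


print(round(julie_bet_win_rate(), 3))  # should land close to 1/3
```

Because a father of two girls mentions his Julie only when she happens to be the daughter he picks, the win rate stays near one third rather than rising to one half.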


And actually you could argue there's a flaw in the previous paragraph because of HOW we obtain the fact that the father has at least one daughter. Imagine we are in a room full of 1000 fathers that have 2 kids but we DON'T know they all have at least one daughter (250BB, 500GB/BG, and 250GG). We then talk to all of them and throughout the conversation they all mention the gender of one of their children at random. That leaves 250 fathers with a boy and girl (half of the total) that told you they have a daughter and 250 fathers with two daughters (all of them) who said the same thing (since that was the only option for them). Because of this you have a 50% chance of guessing correctly that they have 2 daughters. If those fathers then tell you the name of one of their daughters and you analyze all who say Julie (or any name for that matter) you retain that 50% probability.
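The head count in this room can be written out as expected values. The sketch assumes each father mentions one of his two children uniformly at random; the `ROOM` dictionary and the function name are illustrative only.

```python
from fractions import Fraction

# 1000 fathers split evenly over the four (older, younger) combinations,
# i.e. 250 BB, 500 GB/BG and 250 GG as in the paragraph above.
ROOM = {("B", "B"): 250, ("B", "G"): 250, ("G", "B"): 250, ("G", "G"): 250}


def expected_daughter_mentions(room):
    """Expected number of fathers per family type who mention a daughter,
    assuming each father mentions one of his two children at random."""
    mentions = {}
    for kids, fathers in room.items():
        chance_of_mentioning_girl = Fraction(kids.count("G"), 2)
        mentions[kids] = fathers * chance_of_mentioning_girl
    return mentions


mentions = expected_daughter_mentions(ROOM)
total = sum(mentions.values())
p_two_daughters = mentions[("G", "G")] / total

print(total)            # 500
print(p_two_daughters)  # 1/2
```

Half of the mixed fathers drop out because they happened to mention their son, which is what pushes the answer from 1/3 up to 1/2.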


To summarize, it's all about whether you know something about all the parents in that room vs what they tell you at random during conversation. When you go to a bar full of strangers, since you don't know anything about the people there in terms of how many kids they have, you could say the 50% stays consistent in this video. But if we know the parents in that room have 2 kids, one is a girl, you get the 33.3% in every situation. And if you know everyone in that room also has a daughter named Julie, then you go to the 50%.

zachstar
Author

Okay, I checked and this was published 7 April. But maybe it was *filmed* on the first?

trelligan
Author

I think the intuitiveness comes more readily when you just think of it as "the math changes when you are looking at it from the perspective of the identified daughter", whether it be that she's identified by her name, day she was born, etc.

NeverCloud
Author

So what I'm getting from this is that if I want to increase the likelihood of having two daughters I just have to give the first one a name? Sick

nickhawdon
Author

The ambiguity here is easier to spot if you reformulate this as questions and answers.

Q: Do you have at least one daughter?
A: Yes, and incidentally the name of the daughter I am thinking of is Julie

Then the probability is 1/3

Q: Do you have a daughter named Julie?
A: Yes

Then the probability is 1/2

BradleyWhistance
Author

this reminds me of how in quantum physics, once you make a measurement it collapses the wave function, just like labeling a girl a certain way changes the probability

lukastefanovic
Author

Doing maths on language will only reveal the vagaries of language :D

Stampianirrationalism
Author

The problem with this is cherry picking the distinction between the girls. He's deciding that, when you don't have an identifier, (g1, g2) is the same as (g2, g1), which in statistics is already incredibly wrong. Even without given identifiers you must always count both versions of two girls as separate, distinct, and plausible outcomes.

aWildKITsune
Author

Man: I have 2 children one of which is a girl born on a Tuesday, her name is Julie, she's standing next to me, is left handed and wears glasses.

What is the probability of that family having a dog?

kenneth