Alice and Bob compete. Bob wins convincingly both times. But overall, Alice is better. How come?
This happens for real in medical trials, and in other cases where you don't have much control over the sizes of the groups you test on:
| | Alice | Bob |
|---|---|---|
| Trial 1 score | 80 / 100 | 10 / 10 |
| Percentage | 80% | 100% |
| Trial 2 score | 2 / 10 | 40 / 100 |
| Percentage | 20% | 40% |
| Overall score | 82 / 110 | 50 / 110 |
| Percentage | 75% | 45% |
The points to notice:
• In each trial, they got different-sized groups assigned to them. There can be good reasons for this. One procedure may be known (or thought) to be better in specific circumstances, so it would be unethical to assign procedures based on “we want to simplify our statistical analysis” rather than on the best possible outcome. Or you may simply be combining statistics about things not in your control.
• Alice's score for her largest group (80% of 100) is better than Bob's score for his largest group (40% of 100), but those ‘largest groups’ were in different trials, so they only ever get compared in the overall figure, where each person's largest group dominates their total.
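A minimal Python sketch of the arithmetic, assuming the group sizes (100 and 10) implied by the table's percentages. It shows that pooling the trials is the same as averaging each trial's rate weighted by group size, which is why each person's largest group dominates the overall figure. The function names are illustrative only.

```python
# Each entry is (successes, group size); the group sizes are inferred
# from the percentages in the table.
alice = [(80, 100), (2, 10)]
bob = [(10, 10), (40, 100)]


def rate(successes: int, size: int) -> float:
    """Success rate within a single trial."""
    return successes / size


def overall_rate(trials: list[tuple[int, int]]) -> float:
    """Pool all trials: total successes divided by total group size."""
    total_successes = sum(s for s, _ in trials)
    total_size = sum(n for _, n in trials)
    return total_successes / total_size


def weighted_average(trials: list[tuple[int, int]]) -> float:
    """Same value as overall_rate: per-trial rates weighted by group size."""
    total_size = sum(n for _, n in trials)
    return sum((n / total_size) * rate(s, n) for s, n in trials)


# Bob has the higher rate in every individual trial...
bob_wins_each_trial = all(rate(*b) > rate(*a) for a, b in zip(alice, bob))
# ...yet Alice has the higher pooled rate.
alice_wins_overall = overall_rate(alice) > overall_rate(bob)

for name, trials in (("Alice", alice), ("Bob", bob)):
    per_trial = ", ".join(f"{rate(s, n):.0%}" for s, n in trials)
    print(f"{name}: per trial {per_trial}; overall {overall_rate(trials):.0%}")

print("Bob wins each trial:", bob_wins_each_trial)
print("Alice wins overall:", alice_wins_overall)
```

Running it prints `Alice: per trial 80%, 20%; overall 75%` and `Bob: per trial 100%, 40%; overall 45%`, followed by `True` for both checks. Alice's 80% carries a weight of 100/110 in her total, while Bob's 100% carries only 10/110 in his.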
This is known as Simpson's paradox. More on Wikipedia: https://en.wikipedia.org/wiki/Simpson%27s_paradox.