Canadian Bianca Andreescu and Rafael Nadal won the women’s and men’s U.S. Open singles titles this weekend. I spent most of that time thinking about what it takes to win the U.S. Open.
Players must win 7 consecutive matches to win this tournament.
To win an individual match, a male player must be the first to win 3 sets, while a female player must be the first to win 2 sets.
To win a set, a player must be the first to win 6 games and must win by 2. At 6-5, the leader must win the next game to take the set 7-5. At 6-6, the players play a tie-breaking game (different from a regular game) to determine the winner of the set.
To win a game, players must get at least 4 points and must win by 2 points. To win a tie-breaking game, players must get at least 7 points and must win by 2 points. These could go on and on, theoretically.
Finally, describing how to win a point in tennis may actually take too long. Someone serves, there is some hitting back and forth, and somebody either bests their opponent with a great shot or makes a mistake.
To get a feel for what it takes to win the U.S. Open from a probabilistic view, let’s simplify the game so that every single point is completely independent of every other point and you win each one with the same probability. This is not realistic: a player is more likely to win a point when serving than when returning, and more likely to win points against lower-ranked opponents than against higher-ranked ones, but we ignore such things. Let’s suppose you have somewhere between a 50% and 55% chance of winning each individual point (these numbers graph nicely).
Without going into a lot of mathematical detail, I’ll share some of the things you have to know to calculate the probability you would win a game, set, or match, and finally the entire tournament, knowing only this value. For those who desire a more mathematically rigorous explanation including simulations and a lot of R coding, see the link at the bottom of the post.
The Game
To model a game, recognize there is a fixed number of points we must achieve (in this case 4) and a variable number of failures that can occur before we reach those 4 successes (in this case 0, 1, or 2). If we also know the probability of winning each individual point (say it is 55%), then this fits what is called a Negative Binomial distribution.
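As a rough illustration of the Negative Binomial piece, here is a minimal Python sketch (the function name win_before_deuce is my own, not from the linked write-up). It adds up the chances of winning the game 4-0, 4-1, or 4-2 and deliberately ignores anything that reaches 3-3.

```python
from math import comb

def win_before_deuce(p, needed=4):
    """Chance of winning the game while losing at most needed-2 points.

    Hypothetical helper for illustration; p is the chance of winning one point.
    """
    q = 1 - p
    # Lose k points in any order, then win the final (needed-th) point:
    # C(needed-1+k, k) * p^needed * q^k, for k = 0, 1, ..., needed-2
    return sum(comb(needed - 1 + k, k) * p**needed * q**k
               for k in range(needed - 1))

print(f"{win_before_deuce(0.55):.4f}")  # 0.4415
```

So with a 55% chance per point, roughly 44% of the time you win the game without ever seeing deuce; the rest of the winning probability has to come from games that reach 3-3.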
It is trickier to deal with the probability of winning once a 3-3 tie occurs (called deuce). This involves a stochastic process and a gambler’s ruin situation. The details are complicated enough that I won’t include them here.
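For the curious, one way to sketch the deuce step is this: from deuce you either win the next two points (and the game), lose both (and the game), or split them and land back at deuce. Writing D for the chance of winning from deuce gives D = p² + 2p(1-p)D, which solves to D = p² / (1 - 2p(1-p)). The Python below is a hedged sketch under the independence assumption; win_from_deuce and win_game are names I made up for illustration.

```python
from math import comb

def win_from_deuce(p):
    """Chance of winning from a tie where you must get 2 points ahead."""
    q = 1 - p
    # D = p*p + 2*p*q*D  (win both points, or split them and return to deuce)
    return p * p / (1 - 2 * p * q)

def win_game(p, needed=4):
    """Chance of winning a first-to-`needed`, win-by-2 game (hypothetical helper)."""
    q = 1 - p
    # Wins that never reach the (needed-1)-(needed-1) tie
    before = sum(comb(needed - 1 + k, k) * p**needed * q**k
                 for k in range(needed - 1))
    # Chance of reaching that tie, e.g. 3-3 in a regular game
    reach_tie = comb(2 * (needed - 1), needed - 1) * (p * q) ** (needed - 1)
    return before + reach_tie * win_from_deuce(p)

print(f"{win_game(0.55):.3f}")            # 0.623
print(f"{win_game(0.55, needed=7):.3f}")  # 0.654
```

Passing needed=7 reuses the same logic for a tie-breaking game, since it is also a "first to a target, win by 2" race when every point is treated alike.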
Tennis is very meta. Using the above setup with a 55% chance of winning each point, you can calculate the probability of winning an individual game to be about 62.3% (winning a tie-breaking game has probability 65.4%). We can now use this to model the set.
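If you would rather check a number like 62.3% by brute force than by formula, a small simulation works too. This is only a sketch in Python under the same independence assumption; simulate_game, the seed, and the trial count are arbitrary choices of mine.

```python
import random

def simulate_game(p, rng, needed=4):
    """Play one game point by point; return True if 'you' win it."""
    you = opp = 0
    while True:
        if rng.random() < p:
            you += 1
        else:
            opp += 1
        # A game ends once someone has at least `needed` points and leads by 2
        if you >= needed and you - opp >= 2:
            return True
        if opp >= needed and opp - you >= 2:
            return False

rng = random.Random(2019)  # fixed seed (arbitrary) so runs are repeatable
trials = 200_000
wins = sum(simulate_game(0.55, rng) for _ in range(trials))
print(wins / trials)  # should land near 0.623
```

With this many trials the estimate typically wobbles only in the third decimal place, which is enough to see that it agrees with the calculation.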
The Set
To model a set, recognize there is a fixed number of games we must achieve (in this case 6) and a variable number of failures that can occur before we reach those 6 successes (in this case 0, 1, 2, 3, or 4). If we also know the probability of winning each individual game (we did that above: 62.3%), then this fits a Negative Binomial distribution again! Again, some care is needed in dealing with what happens at 5-5. Since this could also lead to 6-6, it involves the probability of winning a tie-breaking game.
Using this setup, you can calculate that the probability of winning a set in tennis is about 81.5%. Did I say tennis was meta?
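Here is one way the set calculation might be sketched in Python, plugging in the rounded game and tie-break probabilities from above. The function name win_set is hypothetical, and the 5-5 handling is my reading of the rules described earlier: from 5-5 you either win two straight games (7-5) or split the next two and win the tie-break (7-6).

```python
from math import comb

def win_set(g, t):
    """g: chance of winning a regular game, t: chance of winning a tie-break."""
    h = 1 - g
    # Sets won 6-0 through 6-4: lose k games in any order, then win the 6th
    direct = sum(comb(5 + k, k) * g**6 * h**k for k in range(5))
    # Chance of reaching 5-5
    five_all = comb(10, 5) * (g * h) ** 5
    # From 5-5: win both games, or split them (2 orders) and win the tie-break
    return direct + five_all * (g * g + 2 * g * h * t)

print(f"{win_set(0.623, 0.654):.3f}")  # 0.815
```

Unlike deuce, there is no infinite loop to untangle here, because the tie-break always settles the set at 6-6.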
The Match
To model a match, recognize there is a fixed number of sets we must achieve (in this case 3 or 2 for males and females, respectively) and a variable number of failures that can occur before we reach those 3 or 2 successes (in this case 0, 1, or 2, or 0 or 1, respectively). If we also know the probability of winning each individual set (we did that above: 81.5%), then this fits a Negative Binomial distribution one last time, but with no more complicating side calculations.
For men, the probability of winning a match is about 95.3% and for women it is about 91.0%. If you refer to the featured photo of this post, this is modeled by the two highest curves on the furthest right part of the graph.
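The match step is the cleanest of the Negative Binomial sums, so it makes a nice short example. This is a minimal Python sketch using the rounded 81.5% set probability; win_match and sets_needed are names I chose for illustration.

```python
from math import comb

def win_match(s, sets_needed):
    """s: chance of winning a set; sets_needed: 3 for men, 2 for women."""
    # Lose k sets (k = 0 .. sets_needed-1) in any order, then win the last set
    return sum(comb(sets_needed - 1 + k, k) * s**sets_needed * (1 - s) ** k
               for k in range(sets_needed))

print(f"{win_match(0.815, 3):.3f}")  # 0.953
print(f"{win_match(0.815, 2):.3f}")  # 0.910
```

The longer format helps the stronger player: needing 3 sets instead of 2 gives the favorite more chances to recover from a bad set.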
Once these probabilities are obtained, it turns out finding the probability of winning the entire tournament is the easiest calculation of them all.
The Tournament
To win the tournament, you must win 7 matches in a row. You cannot lose a single match. To calculate the probability of this, you multiply together 7 copies of the probability of winning an individual match (or just raise it to the 7th power).
With a 55% chance of winning each and every individual point, a male player has about a 71.4% chance of winning the tournament while a female player has about a 51.7% chance of winning the tournament.
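In code the last step is a one-liner. A quick Python sketch, using the rounded match probabilities from the previous section (win_tournament is just an illustrative name):

```python
def win_tournament(match_prob, rounds=7):
    """Chance of winning `rounds` independent matches in a row."""
    return match_prob ** rounds

print(f"{win_tournament(0.953):.3f}")  # 0.714
print(f"{win_tournament(0.910):.3f}")  # 0.517
```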
On the featured image of this post, the bottom two curves represent the probability of winning the entire tournament if the point probability varies from 0.5 to 0.55.
Notice that if you only have a 50% chance of winning each and every point, then every match is a coin flip and your chance of winning the tournament is (1/2)^7, or about 0.8%: near zero!
If you’re hungry for more calculations and code, please refer to my write-up on U.S. Open Chances.
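To see how quickly the chances climb as the point probability moves from 0.50 to 0.55, the whole chain can be strung together. The Python below is a self-contained sketch under the same simplifying assumptions as the rest of this post; all function names are my own, and it is not the R code from the linked write-up.

```python
from math import comb

def race(p, n):
    """Chance of winning a first-to-n, win-by-2 race (game: n=4, tie-break: n=7)."""
    q = 1 - p
    direct = sum(comb(n - 1 + k, k) * p**n * q**k for k in range(n - 1))
    tie = comb(2 * (n - 1), n - 1) * (p * q) ** (n - 1)
    # From the tie, win two points in a row before the opponent does
    return direct + tie * p * p / (1 - 2 * p * q)

def set_prob(p):
    g, t = race(p, 4), race(p, 7)
    h = 1 - g
    direct = sum(comb(5 + k, k) * g**6 * h**k for k in range(5))
    # From 5-5: win 7-5, or split the next two games and win the tie-break
    return direct + comb(10, 5) * (g * h) ** 5 * (g * g + 2 * g * h * t)

def match_prob(p, sets_needed):
    s = set_prob(p)
    return sum(comb(sets_needed - 1 + k, k) * s**sets_needed * (1 - s) ** k
               for k in range(sets_needed))

def tournament_prob(p, sets_needed, rounds=7):
    return match_prob(p, sets_needed) ** rounds

# One row per point probability: 0.50, 0.51, ..., 0.55
for pct in range(50, 56):
    p = pct / 100
    men = tournament_prob(p, 3)
    women = tournament_prob(p, 2)
    print(f"{p:.2f}  men {men:.3f}  women {women:.3f}")
```

The first row should come out near 0.008 for both, and the last row near the 71.4% and 51.7% quoted above (up to rounding in the intermediate steps).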