Series7
parent 1479eb5ac7
commit d58780286b
ipynb/Series7.ipynb (normal file, 308 lines added)
@@ -0,0 +1,308 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"text-align: right\"><i>Peter Norvig<br>May 2025</i></div>\n",
    "\n",
    "# Seven-Game Series?\n",
    "\n",
    "This time of year the basketball playoffs are in full swing. I have a pet peeve: analysts who say *\"These are two evenly matched teams. I expect the series will go seven games.\"* Is that really true? If each game is a 50/50 tossup, how often will this result in a seven-game series? How does the home-court advantage come into play? What if one team is slightly better? This notebook examines these questions.\n",
    "\n",
    "## Vocabulary of Concepts\n",
    "\n",
    "Here are the concepts I want to model, and the way I will represent them:\n",
    "\n",
    "- **Series**: Two teams play games until one wins four games. A game cannot be a tie. So a series can take anywhere from 4 to 7 games.\n",
    "- **Seeded team**: The team that is seeded higher (due to regular season wins) gets to host 4 of the potential 7 games (games 1, 2, 5, and 7).\n",
    "- **Series outcome**: A pair of integers, for example `(4, 3)` means that the seeded team has 4 wins and the other team has 3.\n",
    "- **Outcome distribution**: The class `Dist` will represent a probability distribution over possible series outcomes.\n",
    "- **Game win probability**: We assume there is a given fixed probability, *p*, that the seeded team will win a single game on a neutral court.\n",
    "- **Home-court advantage**: Playing at home shifts the single-game win probability by *h*. So the seeded team has probability *p* + *h* of winning each of its home games (games 1, 2, 5, and 7), and probability *p* - *h* of winning each of the other games.\n",
    "\n",
    "\n",
    "Below is the class `Dist`. For example, `Dist({(4, 3): 0.5, (3, 4): 0.5})` represents a series that goes seven games, and each team has an equal chance to win. *Note*: `Dist` inherits from `Counter`, so you can add two dists together, and I define a `__mul__` method so you can multiply a dist by a probability."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "\n",
    "class Dist(Counter):\n",
    "    \"\"\"A Probability Distribution of {outcome: probability} results.\"\"\"\n",
    "    def __mul__(self, weight: float) -> 'Dist':\n",
    "        \"\"\"You can multiply a Dist by a scalar weight.\"\"\"\n",
    "        return Dist({outcome: p * weight for outcome, p in self.items()})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The function `series_results` returns a distribution of possible series outcomes. The parameters are:\n",
    "- *p*, the seeded team's single-game win probability,\n",
    "- *h*, the home-court advantage probability,\n",
    "- *home*, a 7-character string saying which games are home (`H`) or away (`A`) for the seeded team,\n",
    "- *W* and *L*, the number of wins and losses for the seeded team so far in the series (default 0 of each)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "def series_results(p=0.50, h=0.10, home='HHAAHAH', W=0, L=0) -> Dist:\n",
    "    \"\"\"Return {(win, loss): probability, ...} for all possible outcomes of the series, given\n",
    "    the single-game win probability for the seeded team, `p`, and the home-court advantage, `h`.\"\"\"\n",
    "    def results(W: int, L: int) -> Dist:\n",
    "        if W == 4 or L == 4:\n",
    "            return Dist({(W, L): 1.0})\n",
    "        else:\n",
    "            p1 = p + (h if home[W + L] == 'H' else -h) # Probability of winning this one game\n",
    "            return Dist(results(W + 1, L) * p1 +\n",
    "                        results(W, L + 1) * (1 - p1))\n",
    "    return results(W, L)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at the results for a truly even series, where each game is 50/50, and there is no home-court advantage:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Dist({(4, 2): 0.15625,\n",
       "      (4, 3): 0.15625,\n",
       "      (3, 4): 0.15625,\n",
       "      (2, 4): 0.15625,\n",
       "      (4, 1): 0.125,\n",
       "      (1, 4): 0.125,\n",
       "      (4, 0): 0.0625,\n",
       "      (0, 4): 0.0625})"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "series_results(p=0.50, h=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The four most common outcomes are equally likely: either 6 or 7 games, with either team winning. It is easy to see why 6 and 7 games are equally likely: after 5 games, if the series isn't over, it must be 3-2 or 2-3. Half the time the \"3\" team will win game 6 (resulting in a 6-game series) and half the time they will lose (resulting in a 7-game series).\n",
    "\n",
    "Now I would like to make a **table** of results, for various winning percentages *p*. For each *p* I want to see:\n",
    "- The probability for each possible series outcome (4-3, 4-2, etc.).\n",
    "- The probability for each possible series length (4, 5, 6, or 7 games).\n",
    "- The probability that the seeded team wins the series.\n",
    "\n",
    "*Note*: I will consider cases where the higher-seeded team has a game win probability less than 50% (as sometimes happens when a player is injured)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "from numpy import arange\n",
    "from typing import Sequence\n",
    "\n",
    "def series_results_table(h=0.0, pcts=arange(0.48, 0.70, 0.02)):\n",
    "    \"\"\"What happens in the series for various values of `p` and a given value of `h`?\"\"\"\n",
    "    outcomes = [(4, 3), (4, 2), (4, 1), (4, 0), (0, 4), (1, 4), (2, 4), (3, 4)]\n",
    "    bar = '----+' + '-' * 5 * len(pcts)\n",
    "    print(f'    | Game Win Percentage (± home-court advantage = {h:.0%})')\n",
    "    print(row('', pcts))\n",
    "    print(bar)\n",
    "    for (W, L) in outcomes:\n",
    "        results = [series_results(p, h)[W, L] for p in pcts]\n",
    "        print(row(f'{W}-{L}', results))\n",
    "    print(bar)\n",
    "    for N in [7, 6, 5, 4]:\n",
    "        results = [series_results(p, h)[4, N-4] + series_results(p, h)[N-4, 4] for p in pcts]\n",
    "        print(row(N, results))\n",
    "    print(bar)\n",
    "    print(row('Win', [sum(series_results(p, h)[W, L] for W, L in outcomes if W == 4) for p in pcts]))\n",
    "\n",
    "def row(name, pcts: Sequence[float]) -> str:\n",
    "    \"\"\"Create a string representing a row in the table.\"\"\"\n",
    "    return f'{name:^3} | ' + ' '.join(f'{p*100:4.1f}' for p in pcts)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here is the table when there is no home-court advantage:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    | Game Win Percentage (± home-court advantage = 0%)\n",
      "    | 48.0 50.0 52.0 54.0 56.0 58.0 60.0 62.0 64.0 66.0 68.0\n",
      "----+-------------------------------------------------------\n",
      "4-3 | 14.9 15.6 16.2 16.6 16.8 16.8 16.6 16.2 15.7 14.9 14.0\n",
      "4-2 | 14.4 15.6 16.8 18.0 19.0 20.0 20.7 21.3 21.7 21.9 21.9\n",
      "4-1 | 11.0 12.5 14.0 15.6 17.3 19.0 20.7 22.5 24.2 25.8 27.4\n",
      "4-0 |  5.3  6.2  7.3  8.5  9.8 11.3 13.0 14.8 16.8 19.0 21.4\n",
      "0-4 |  7.3  6.2  5.3  4.5  3.7  3.1  2.6  2.1  1.7  1.3  1.0\n",
      "1-4 | 14.0 12.5 11.0  9.7  8.4  7.2  6.1  5.2  4.3  3.5  2.9\n",
      "2-4 | 16.8 15.6 14.4 13.1 11.8 10.5  9.2  8.0  6.9  5.8  4.8\n",
      "3-4 | 16.2 15.6 14.9 14.1 13.2 12.1 11.1  9.9  8.8  7.7  6.6\n",
      "----+-------------------------------------------------------\n",
      " 7  | 31.1 31.2 31.1 30.7 29.9 28.9 27.6 26.2 24.5 22.6 20.6\n",
      " 6  | 31.2 31.2 31.2 31.0 30.8 30.4 30.0 29.4 28.6 27.8 26.7\n",
      " 5  | 25.1 25.0 25.1 25.3 25.7 26.2 26.9 27.6 28.5 29.3 30.2\n",
      " 4  | 12.6 12.5 12.6 13.0 13.6 14.4 15.5 16.9 18.5 20.3 22.4\n",
      "----+-------------------------------------------------------\n",
      "Win | 45.6 50.0 54.4 58.7 62.9 67.1 71.0 74.8 78.3 81.6 84.7\n"
     ]
    }
   ],
   "source": [
    "series_results_table(h=0.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that a 6-game series is most likely, except when *p* is exactly 50% (in which case 6- and 7-game series are equally likely), or when *p* is 66% or more (in which case a 5-game series is more likely).\n",
    "\n",
    "In recent years the home-court advantage has been about 5%:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    | Game Win Percentage (± home-court advantage = 5%)\n",
      "    | 48.0 50.0 52.0 54.0 56.0 58.0 60.0 62.0 64.0 66.0 68.0\n",
      "----+-------------------------------------------------------\n",
      "4-3 | 16.6 17.3 17.8 18.2 18.4 18.3 18.1 17.6 17.0 16.1 15.1\n",
      "4-2 | 13.2 14.4 15.5 16.6 17.6 18.4 19.1 19.6 20.0 20.1 20.0\n",
      "4-1 | 12.2 13.7 15.4 17.1 18.9 20.7 22.5 24.4 26.2 27.9 29.6\n",
      "4-0 |  5.2  6.1  7.2  8.4  9.7 11.1 12.8 14.6 16.6 18.8 21.2\n",
      "0-4 |  7.2  6.1  5.2  4.4  3.7  3.0  2.5  2.0  1.6  1.3  1.0\n",
      "1-4 | 12.7 11.2  9.9  8.6  7.4  6.3  5.3  4.5  3.7  3.0  2.4\n",
      "2-4 | 18.2 16.9 15.5 14.1 12.7 11.3  9.9  8.6  7.4  6.3  5.2\n",
      "3-4 | 14.7 14.1 13.5 12.6 11.7 10.8  9.7  8.7  7.6  6.6  5.6\n",
      "----+-------------------------------------------------------\n",
      " 7  | 31.3 31.4 31.3 30.8 30.1 29.1 27.8 26.3 24.6 22.7 20.7\n",
      " 6  | 31.5 31.3 31.1 30.7 30.3 29.7 29.1 28.3 27.4 26.4 25.3\n",
      " 5  | 24.9 25.0 25.3 25.7 26.3 27.0 27.9 28.8 29.8 30.9 31.9\n",
      " 4  | 12.4 12.3 12.4 12.7 13.3 14.2 15.3 16.6 18.2 20.0 22.1\n",
      "----+-------------------------------------------------------\n",
      "Win | 47.2 51.6 56.0 60.3 64.5 68.6 72.5 76.2 79.7 82.9 85.8\n"
     ]
    }
   ],
   "source": [
    "series_results_table(h=0.05)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, when the seeded team has a game win probability in the range 50% to 54%, the other team is favored to win game 6 (played on its home court, where the seeded team's probability is *p* - *h* < 50%). Since the seeded team is the more likely one to be leading 3-2 at that point, this makes a 7-game series more likely than a 6-game series. Overall, a home-court advantage of 5% raises the seeded team's chance of winning the series by about 1.6 percentage points when the teams are closely matched.\n",
    "\n",
    "Some people think a home-court advantage of as much as 13% is reasonable:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "    | Game Win Percentage (± home court advantage = 13%)\n",
      "    | 48.0 50.0 52.0 54.0 56.0 58.0 60.0 62.0 64.0 66.0\n",
      "----+--------------------------------------------------\n",
      "4-3 | 19.8 20.5 21.1 21.4 21.5 21.4 21.0 20.3 19.4 18.3\n",
      "4-2 | 11.5 12.6 13.6 14.6 15.5 16.3 16.9 17.4 17.6 17.7\n",
      "4-1 | 13.9 15.7 17.6 19.5 21.6 23.6 25.7 27.8 29.9 31.9\n",
      "4-0 |  4.6  5.4  6.4  7.5  8.8 10.2 11.8 13.5 15.4 17.5\n",
      "0-4 |  6.4  5.4  4.6  3.8  3.1  2.5  2.0  1.6  1.3  1.0\n",
      "1-4 | 10.5  9.2  8.0  6.8  5.8  4.8  4.0  3.2  2.6  2.0\n",
      "2-4 | 20.7 19.1 17.4 15.7 14.1 12.4 10.9  9.4  8.0  6.7\n",
      "3-4 | 12.7 12.1 11.4 10.5  9.7  8.7  7.7  6.8  5.8  4.9\n",
      "----+--------------------------------------------------\n",
      " 7  | 32.5 32.6 32.5 32.0 31.2 30.1 28.7 27.1 25.2 23.2\n",
      " 6  | 32.1 31.6 31.0 30.3 29.6 28.7 27.8 26.7 25.6 24.4\n",
      " 5  | 24.4 24.9 25.5 26.3 27.3 28.5 29.7 31.1 32.5 33.9\n",
      " 4  | 11.0 10.9 11.0 11.3 11.9 12.8 13.8 15.1 16.7 18.5\n",
      "----+--------------------------------------------------\n",
      "Win | 49.7 54.2 58.7 63.1 67.4 71.5 75.4 79.0 82.4 85.5\n"
     ]
    }
   ],
   "source": [
    "series_results_table(h=0.13)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This means that a 50/50 team with a 13% home-court advantage wins a series about as often as a 52% team with no home-court advantage."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
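
The recursion in the notebook's `series_results` can be cross-checked independently. The sketch below is not part of the commit; `closed_form` and `brute_force` are hypothetical helper names introduced only for illustration. `closed_form` assumes no home-court advantage and relies on the fact that the series winner must win the final game plus 3 of the earlier ones. `brute_force` enumerates every win/loss sequence for 7 games, assuming the notebook's default schedule string `'HHAAHAH'` and its *p* ± *h* model.

```python
from itertools import product
from math import comb


def closed_form(p: float) -> dict:
    """Outcome probabilities with no home-court advantage (assumes h = 0)."""
    q = 1 - p
    dist = {}
    for losses in range(4):
        n = 4 + losses  # total games played in the series
        # The winner takes game n and exactly 3 of the first n - 1 games.
        ways = comb(n - 1, 3)
        dist[4, losses] = ways * p**4 * q**losses  # seeded team wins 4-losses
        dist[losses, 4] = ways * q**4 * p**losses  # seeded team loses losses-4
    return dist


def brute_force(p: float, h: float, home: str = 'HHAAHAH') -> dict:
    """Outcome probabilities by enumerating all 2**7 win/loss sequences.

    Each sequence is weighted by all 7 per-game probabilities; the recorded
    outcome is the score when one side first reaches 4 wins. The unplayed
    games sum out to 1, so the totals are the true series probabilities.
    """
    dist = {}
    for seq in product((True, False), repeat=len(home)):
        prob, W, L, outcome = 1.0, 0, 0, None
        for venue, won in zip(home, seq):
            p1 = p + (h if venue == 'H' else -h)  # seeded team's chance this game
            prob *= p1 if won else 1 - p1
            if outcome is None:
                W, L = W + won, L + (not won)
                if W == 4 or L == 4:
                    outcome = (W, L)
        dist[outcome] = dist.get(outcome, 0.0) + prob
    return dist


if __name__ == '__main__':
    print(closed_form(0.5)[4, 3])
    even = brute_force(0.5, 0.05)
    print(sum(prob for (W, L), prob in even.items() if W == 4))
```

With `p=0.5` and `h=0` both helpers give 0.15625 for a 4-3 result, matching the notebook output, and with `h=0.05` the brute-force series win probability comes out near 51.6%, in line with the table.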