{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "
Peter Norvig
pytudes
March 2019
\n", "\n", "# Dice Baseball\n", "\n", "The [538 Riddler for March 22, 2019](https://fivethirtyeight.com/features/can-you-turn-americas-pastime-into-a-game-of-yahtzee/) asks us to simulate baseball using probabilities from a 19th-century dice game called *Our National Ball Game*:\n", "\n", "    1,1: double         2,2: strike    3,3: out at 1st  4,4: fly out\n", "    1,2: single         2,3: strike    3,4: out at 1st  4,5: fly out\n", "    1,3: single         2,4: strike    3,5: out at 1st  4,6: fly out\n", "    1,4: single         2,5: strike    3,6: out at 1st  5,5: double play\n", "    1,5: base on error  2,6: foul out                   5,6: triple\n", "    1,6: base on balls                                  6,6: home run\n", "\n", "\n", "The rules left some things unspecified; the following are my current choices (in an early version I made different choices that resulted in slightly more runs):\n", "\n", "* On a *b*-base hit, runners advance *b* bases, except that a runner on second scores on a 1-base hit.\n", "* On an \"out at first\", all runners advance one base.\n", "* A double play only applies if there is a runner on first; in that case other runners advance.\n", "* On a fly out, a runner on third scores; other runners do not advance.\n", "* On an error, all runners advance one base.\n", "* On a base on balls, only forced runners advance.\n", "\n", "I also made some choices about the implementation:\n", "\n", "- Exactly one outcome happens to each batter. We call that an *event*.\n", "- I'll represent events with the following one-letter codes:\n", "    - `K`, `O`, `o`, `f`, `D`: strikeout, foul out, out at first, fly out, double play\n", "    - `1`, `2`, `3`, `4`: single, double, triple, home run\n", "    - `E`, `B`: error, base on balls\n", "- Note the \"strike\" dice roll is not an event; it is only part of an event. From the probability of a \"strike\" dice roll, I compute the probability of three strikes in a row, and call that a strikeout event. 
Since 7 of the 36 equally likely dice rolls give \"strike\", the probability of a strike is 7/36, and the probability of a strikeout is (7/36)**3.\n", "- Note that a dice roll such as `1,1` is a 1/36 event, whereas `1,2` is a 2/36 event, because it also represents (2, 1).\n", "- I'll keep track of runners with a list of occupied bases; `runners = [1, 2]` means runners on first and second.\n", "- A runner who advances to base 4 or higher has scored a run (unless there are already 3 outs).\n", "- The function `inning` simulates a half inning and returns the number of runs scored.\n", "- I want to be able to test `inning` by feeding it specific events, and I also want to generate random innings. So the interface is that I pass in an *iterable* of events. The function `event_stream` generates an endless stream of randomly sampled events.\n", "- Note that it is considered good Pythonic style to let Booleans act as integers (`True` counts as 1, `False` as 0), so for a runner on second (`r = 2`) when the event is a single (`e = '1'`), the expression `r + int(e) + (r == 2)` evaluates to `2 + 1 + 1` or `4`, meaning the runner on second scores.\n", "- I'll play 1 million innings and store the resulting scores in `innings`.\n", "- To simulate a game I just sample 9 elements of `innings` and sum them.\n", "\n", "# The Code" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "import random" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def event_stream(events='2111111EEBBOOooooooofffffD334', strike=7/36):\n", "    \"An iterator of random events. 
Defaults from `Our National Ball Game`.\"\n", "    while True:\n", "        yield 'K' if (random.random() < strike ** 3) else random.choice(events)\n", "\n", "def inning(events=event_stream(), verbose=False) -> int:\n", "    \"Simulate a half inning based on events, and return number of runs scored.\"\n", "    outs = runs = 0  # Inning starts with no outs and no runs,\n", "    runners = []     # ... and with nobody on base\n", "    for e in events:\n", "        if verbose: print(f'{outs} outs, {runs} runs, event: {e}, runners: {runners}')\n", "        # What happens to the batter?\n", "        if   e in 'KOofD':  outs += 1          # Batter is out\n", "        elif e in '1234EB': runners.append(0)  # Batter becomes a runner\n", "        # What happens to the runners?\n", "        if e == 'D' and 1 in runners:  # double play: runner on 1st out, others advance\n", "            outs += 1\n", "            runners = [r + 1 for r in runners if r != 1]\n", "        elif e in 'oE':                # out at first or error: runners advance\n", "            runners = [r + 1 for r in runners]\n", "        elif e == 'f' and 3 in runners and outs < 3:  # fly out: runner on 3rd scores\n", "            runners.remove(3)\n", "            runs += 1\n", "        elif e in '1234':              # single, double, triple, homer\n", "            runners = [r + int(e) + (r == 2) for r in runners]\n", "        elif e == 'B':                 # base on balls: forced runners advance\n", "            runners = [r + forced(runners, r) for r in runners]\n", "        # See if inning is over, and if not, whether anyone scored\n", "        if outs >= 3:\n", "            return runs\n", "        runs += sum(r >= 4 for r in runners)\n", "        runners = [r for r in runners if r < 4]\n", "\n", "def forced(runners, r) -> bool:\n", "    \"Is the runner on base `r` forced to advance? True if every base behind `r` (including the batter, at 0) is occupied.\"\n", "    return all(b in runners for b in range(r))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Testing\n", "\n", "Let's peek at some random innings:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 outs, 0 runs, event: E, runners: []\n", "0 outs, 0 runs, event: 4, runners: [1]\n", "0 outs, 2 runs, event: E, runners: []\n", "0 outs, 2 runs, 
event: 1, runners: [1]\n", "0 outs, 2 runs, event: f, runners: [2, 1]\n", "1 outs, 2 runs, event: B, runners: [2, 1]\n", "1 outs, 2 runs, event: 1, runners: [3, 2, 1]\n", "1 outs, 4 runs, event: E, runners: [2, 1]\n", "1 outs, 4 runs, event: o, runners: [3, 2, 1]\n", "2 outs, 5 runs, event: o, runners: [3, 2]\n" ] }, { "data": { "text/plain": [ "5" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inning(verbose=True)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 outs, 0 runs, event: 1, runners: []\n", "0 outs, 0 runs, event: B, runners: [1]\n", "0 outs, 0 runs, event: O, runners: [2, 1]\n", "1 outs, 0 runs, event: 1, runners: [2, 1]\n", "1 outs, 1 runs, event: 3, runners: [2, 1]\n", "1 outs, 3 runs, event: 1, runners: [3]\n", "1 outs, 4 runs, event: f, runners: [1]\n", "2 outs, 4 runs, event: o, runners: [1]\n" ] }, { "data": { "text/plain": [ "4" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inning(verbose=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can feed in any events we want to test the code:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0 outs, 0 runs, event: 2, runners: []\n", "0 outs, 0 runs, event: E, runners: [2]\n", "0 outs, 0 runs, event: B, runners: [3, 1]\n", "0 outs, 0 runs, event: B, runners: [3, 2, 1]\n", "0 outs, 1 runs, event: 1, runners: [3, 2, 1]\n", "0 outs, 3 runs, event: D, runners: [2, 1]\n", "2 outs, 3 runs, event: B, runners: [3]\n", "2 outs, 3 runs, event: 1, runners: [3, 1]\n", "2 outs, 4 runs, event: 2, runners: [2, 1]\n", "2 outs, 5 runs, event: f, runners: [3, 2]\n" ] }, { "data": { "text/plain": [ "5" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "inning('2EBB1DB12f', verbose=True)" ] }, 
{ "cell_type": "markdown", "metadata": {}, "source": [ "That looks good.\n", "\n", "# Simulating\n", "\n", "Now, simulate a million innings, and then sample from them to simulate a million nine-inning games (for one team):" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "N = 1000000\n", "innings = [inning() for _ in range(N)]\n", "games = [sum(random.sample(innings, 9)) for _ in range(N)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see histograms:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "def hist(nums, title):\n", "    \"Plot a histogram.\"\n", "    plt.hist(nums, ec='black', bins=max(nums)-min(nums)+1, align='left')\n", "    plt.title(f'{title} Mean: {sum(nums)/len(nums):.3f}, Min: {min(nums)}, Max: {max(nums)}')\n", "\n", "hist(innings, 'Runs per inning:')" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "hist(games, 'Runs per game:')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" } }, "nbformat": 4, "nbformat_minor": 2 }