These are the files for the book Learn to Code with Baseball.
If you're not familiar with Git or GitHub, no problem. Just click the Source code
link under the latest release to download the files. This will download
a file called ltcwbb-files-vX.X.X.zip, where X.X.X is the latest version.
When you unzip it (note in the book I've dropped the version number and
renamed the directory to just ltcwbb-files, which you can do too) you'll see
four sub-directories: code, data, anki, and solutions-to-excercises.
You don't have to do anything with these right now except know where you put them. For example, on my Mac, I have them in my home directory:
/Users/nathanbraun/ltcwbb-files
If I were using Windows, it might look like this:
C:\Users\nathanbraun\ltcwbb-files
Set these aside for now and we'll pick them up in chapter 2.
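If you want to confirm the files landed where you think they did, here is a minimal sketch (not from the book) that checks for the four sub-directories. It assumes you unzipped into your home directory and renamed the folder to ltcwbb-files, as in the example paths above; the FILES_DIR variable and the missing_subdirs helper are names made up for this illustration.

```python
from pathlib import Path

# Assumption: the files were unzipped into your home directory and the folder
# was renamed to ltcwbb-files. Change this if you put them somewhere else.
FILES_DIR = Path.home() / "ltcwbb-files"

# The four sub-directories listed above, spelled as they appear in the release.
EXPECTED = ["code", "data", "anki", "solutions-to-excercises"]


def missing_subdirs(base, expected=EXPECTED):
    """Return the expected sub-directory names that are not found under base."""
    base = Path(base)
    return [name for name in expected if not (base / name).is_dir()]


# An empty list means all four sub-directories were found.
print(missing_subdirs(FILES_DIR))
```

Using pathlib means the same code works with both the Mac and Windows style paths shown above.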
Update code in seaborn chapter to avoid misc warnings.
Fixed Pandas mean example. More: Pandas changed its defaults to throw an
error if you try to call this on string columns. Fixed the example to explicitly
call it only on numeric data.
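For reference, here is a minimal sketch of the kind of change involved. It is not the book's actual example: the DataFrame and its column names (name, hr, avg) are made up for illustration. It shows two common ways to restrict mean to numeric data.

```python
import pandas as pd

# Made-up rows purely for illustration; not data from the book.
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "hr": [10, 20, 30],
    "avg": [0.250, 0.300, 0.275],
})

# Option 1: tell pandas to skip non-numeric columns such as "name".
means = df.mean(numeric_only=True)

# Option 2: pick out the numeric columns yourself before calling mean.
means_selected = df[["hr", "avg"]].mean()

print(means)
```

Both options return a Series with one mean per numeric column and leave the string column out.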
Fixed answers for exercise 2.1 (thanks for the pull request Nic!).
Minor typos.
Minor typos (thanks Paul!).
Minor typos (thanks John!).
Fixed some typos (thanks Matthew!).
Fixed some formatting and footnote typos (thanks Paul!).
Rewrote and expanded API chapter since old API stopped working (see issue #3 - thanks @lozdog245 for bringing it to my attention). Also added section on using the excellent MLB-StatsAPI package.
Minor rewording.
Updated Anki cards, tweaked some Python language and fixed a typo (thanks Aaron!)
Added a note explaining granularity in the main text, before asking any end of chapter exercises on it (see issue #2)
Pretty big update to scraping section to make things clearer.
Also added note on 403 response issue people were running into while scraping Baseball Almanac. Thanks to everyone who emailed me about this.
Updated file path in random forest section (see issue #1)
Add end of chapter exercises for scraping and visualization (more coming soon). Thanks Chris!
Expanded visualization section to include scatter and line plots.
Fixed some scraping in book to match what was in the code (thanks Greg, Nick!)
Fixed typo in exercise 3.3.2 and changed LAA -> ANA in teams.csv (thanks Lennart and Tim!)
Fixed a few typos + stray football references (thanks Brooks, Mark!)
Updated visualization section + associated homework problems to use Seaborn
0.11.x (September 2020), which added a new displot function. This means
changing our distribution plots from, say:
g = (sns.FacetGrid(df)
     .map(sns.kdeplot, 'mph', shade=True))
To:
g = sns.displot(df, x='mph', kind='kde', fill=True)
It also opens up some new possibilities (e.g. with plotting empirical CDFs) that I might discuss in a future update.
Added this changelog; bundled files in a GitHub release instead of including them with SendOwl.