Using scouting reports to predict if players will make the MLB.

jacobdanovitch/Trouble-With-The-Curve

Trouble with the Curve

This repository contains the data, models, and web app for my paper Trouble with the Curve: Predicting Future MLB Players Using Scouting Reports.

To the best of my knowledge, this is the only existing dataset of its kind for baseball prospect profiles. Almost 10,000 profiles were acquired from MLB.com and FanGraphs containing players' scouting reports and 20-80 scale grades, as well as select metadata.

With the above data, an obvious question arises: can we predict whether a player will make the major leagues? We use a variety of deep learning methods to attempt to answer this question, and achieve a strong "maybe". We also present an analysis of how the language of the reports varies with player success, as well as between positions.

| Model | Accuracy | F1 |
|---|---|---|
| Bag-Of-Embeddings | 64.65% | 53.78% |
| TextCNN | 69.02% | 56.42% |
| LSTM+SelfAttn | 68.64% | 54.65% |
| BCN | 73.52% | 43.33% |
| HAN | 66.00% | 54.07% |
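
Accuracy and F1 can diverge, as the BCN row shows (highest accuracy, lowest F1). The sketch below shows how both metrics can be computed from confusion-matrix counts. It assumes F1 is taken for the positive "made the majors" class, which this README does not state, and the counts are invented for illustration rather than taken from the paper.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1 of the positive class) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, f1


# Hypothetical counts (not from the paper): a model that mostly predicts
# "will not make the majors" on an imbalanced test set.
acc, f1 = accuracy_and_f1(tp=10, fp=5, fn=20, tn=65)
print(f"accuracy={acc:.2%} f1={f1:.2%}")
```

With these made-up counts, the many true negatives lift accuracy while low recall on the positive class pulls F1 down. That is one plausible reading of a high-accuracy, low-F1 row.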

A Hierarchical Attention Network (HAN) is trained as one of the models for this task. It serves both as a demonstration of the research problem and, through its attention weights, as an interpretable visualization of each prediction.
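
As a rough illustration of how attention weights can support this kind of visualization, the sketch below softmax-normalizes a set of raw attention scores and ranks sentences by the resulting weight. It is not the repository's implementation: the function names, example sentences, and scores are all hypothetical, and a real HAN learns its scores at both the word and sentence level.

```python
import math


def attention_weights(scores):
    """Softmax-normalize raw attention scores so the weights sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_by_attention(items, scores):
    """Pair each item with its attention weight, highest weight first."""
    weights = attention_weights(scores)
    return sorted(zip(items, weights), key=lambda pair: pair[1], reverse=True)


# Hypothetical report sentences and raw scores, invented for illustration.
sentences = [
    "He has a smooth left-handed swing.",
    "His fastball sits in the mid-90s with late life.",
    "He will need to refine his command.",
]
scores = [0.2, 1.5, -0.3]

ranked = rank_by_attention(sentences, scores)
for sentence, weight in ranked:
    print(f"{weight:.2f}  {sentence}")
```

Sentences with larger weights contributed more to the pooled representation, so highlighting them in proportion to weight gives a simple per-prediction visualization.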
