EPA wrong on safety scored by posteam #186
The code below lists all the safeties that the current code treats as being against the posteam but that were actually against the other team. If the safety had been against the posteam, 2 points would have been added to the other team's score, but the actual score gave the 2 points to the posteam.
library(tidyverse)
pbp <- readRDS(url("https://raw.githubusercontent.com/guga31bb/nflfastR-data/master/data/play_by_play_2017.rds"))
# Safeties where the posteam's score differential went up on the play
pbp %>%
  filter(safety == TRUE, (score_differential_post - score_differential) > 0) %>%
  select(posteam, desc, ep, epa, score_differential, score_differential_post) %>%
  View()
Hmm, when running the code above I get a completely different set of plays.
Checking the first play, the score seems to be correct.
So I think the EPA is wrong but the score is not?
Very sorry, I just realized that this is actually what I had run:
library(tidyverse)
seasons <- 2011:2020
pbp <- purrr::map_df(seasons, function(x) {
  readRDS(
    url(
      glue::glue("https://raw.githubusercontent.com/guga31bb/nflfastR-data/master/data/play_by_play_{x}.rds")
    )
  )
})
pbp %>%
  filter(safety == TRUE, (score_differential_post - score_differential) > 0) %>%
  select(posteam, desc, ep, epa, score_differential, score_differential_post) %>%
  View()

So it looks at more seasons, not just 2017. Yes, the score is not wrong, but the EPA is. I think this might also mean that the labeled score that the xgboost model targets for EPA is wrong.

For example, the description of the safety from the LAC-NE game above is:

(9:19) 6-R.Allen punts 54 yards to LAC 11, Center-49-J.Cardona. 12-T.Benjamin MUFFS catch, and recovers at LAC 8. 12-T.Benjamin tackled in End Zone for -8 yards, SAFETY (36-B.King). Penalty on LAC-51-K.Emanuel, Offensive Holding, declined.

I think it might be labeled as a safety scored by LAC, since safeties are usually scored by the team that does not have possession. But because it is a punt, the possession team (NE) kicked to the receiving team (LAC), who gained possession of the ball and was then tackled in their own end zone, so the safety was scored by NE. Since the EPA acts as if LAC got the safety instead of NE, I assume the play is also labeled wrong for the EPA xgboost model. That might be a significant issue, since there have only been 186 safeties since 2011 and 6 of them have this issue.
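One possible way to tell which side scored a safety, sketched here under assumptions, is to read it off the score change instead of inferring it from possession. The data frame below is made-up toy data (placeholder team codes and scores, not real games), and `posteam_points` and `safety_by_posteam` are hypothetical column names that do not exist in nflfastR. It uses base R only so it runs without downloading anything.

```r
# Toy data: invented values, NOT real plays. Column names mirror the
# play-by-play columns used in the queries in this thread.
plays <- data.frame(
  posteam = c("AAA", "BBB", "CCC"),
  safety = c(1, 1, 0),
  score_differential = c(0, -3, 7),
  score_differential_post = c(2, -5, 7)
)

# Points the possession team gained (+) or lost (-) on the play,
# taken from the score change rather than assumed from possession.
plays$posteam_points <- plays$score_differential_post - plays$score_differential

# A safety scored BY the posteam shows up as a positive change
# (hypothetical helper column for illustration only).
plays$safety_by_posteam <- plays$safety == 1 & plays$posteam_points > 0

print(plays[, c("posteam", "posteam_points", "safety_by_posteam")])
```

In this toy example only the first row is flagged, which is the same condition the `filter()` call above applies to the real data.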
I found that when the team that is in possession gets a safety on the other team, the code assumes that the other team got the safety. For example, JAX punted to TEN who muffed the catch and then JAX got a safety. The code assumes that TEN got the safety because JAX had possession at the beginning of the play.
The epa is (-2) - (-0.366) = (-1.634) instead of (2) - (-0.366) = (2.366)
desc: (:59) (Punt formation) 9-L.Cooke punts 47 yards to TEN 7, Center-45-M.Overton. 17-C.Batson MUFFS catch, touched at TEN 7, and recovers at TEN 1. 17-C.Batson tackled in End Zone for -1 yards, SAFETY (48-L.Jacobs).
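The arithmetic above can be written as a small sketch. It assumes the simplification used in this issue, that EPA on a safety is the points credited to the posteam minus the EP before the play. `safety_epa` is a hypothetical helper for illustration, not an nflfastR function, and the package's actual EPA calculation may involve more steps.

```r
# EPA for a safety under the simplification used in this issue:
# points credited to the posteam minus EP before the play.
# `safety_epa` is a hypothetical helper, not part of nflfastR.
safety_epa <- function(ep_before, scored_by_posteam) {
  points <- if (scored_by_posteam) 2 else -2
  points - ep_before
}

ep_before <- -0.366  # EP before the JAX punt, as quoted above

safety_epa(ep_before, scored_by_posteam = FALSE)  # current result: -1.634
safety_epa(ep_before, scored_by_posteam = TRUE)   # expected result: 2.366
```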
library(tidyverse)
pbp <- readRDS(url("https://raw.githubusercontent.com/guga31bb/nflfastR-data/master/data/play_by_play_2018.rds"))
pbp %>%
  filter(game_id == '2018_14_JAX_TEN', between(play_id, 740, 840)) %>%
  select(posteam, down, ydstogo, desc, ep, epa) %>%
  View()