EPA wrong on safety scored by posteam #186

Closed
ConnerEvans opened this issue Feb 14, 2021 · 5 comments · Fixed by #201
Labels
bug Something isn't working

Comments

@ConnerEvans

I found that when the team in possession at the start of a play scores a safety against the other team, the code credits the safety to the other team. For example, JAX punted to TEN, who muffed the catch and were then tackled in their own end zone, so JAX scored the safety. The code credits TEN with the safety because JAX had possession at the beginning of the play.
The epa is (-2) - (-0.366) = (-1.634) instead of (2) - (-0.366) = (2.366).

desc: (:59) (Punt formation) 9-L.Cooke punts 47 yards to TEN 7, Center-45-M.Overton. 17-C.Batson MUFFS catch, touched at TEN 7, and recovers at TEN 1. 17-C.Batson tackled in End Zone for -1 yards, SAFETY (48-L.Jacobs).
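The arithmetic above can be written out as a small sketch. It assumes, as the numbers in this issue imply, that EPA on a safety is the points scored from the possession team's perspective minus the expected points before the play. `safety_epa` is a hypothetical helper introduced here for illustration, not an nflfastR function.

```r
# Hypothetical helper: EPA on a safety from the posteam's point of view,
# assuming epa = (points for posteam) - (ep before the play).
safety_epa <- function(ep, scored_by_posteam) {
  points <- ifelse(scored_by_posteam, 2, -2)
  points - ep
}

ep_before <- -0.366  # ep on the JAX punt, taken from this issue

safety_epa(ep_before, scored_by_posteam = FALSE)  # -1.634, what the data shows
safety_epa(ep_before, scored_by_posteam = TRUE)   # 2.366, expected for JAX
```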

library(tidyverse)
pbp <- readRDS(url(glue::glue("https://raw.githubusercontent.com/guga31bb/nflfastR-data/master/data/play_by_play_2018.rds")))
pbp %>%
  filter(game_id == '2018_14_JAX_TEN', between(play_id, 740, 840)) %>%
  select(posteam, down, ydstogo, desc, ep, epa) %>%
  View()

Screen Shot 2021-02-14 at 9 07 25 AM

@ConnerEvans
Author

library(tidyverse)
pbp <- readRDS(url(glue::glue("https://raw.githubusercontent.com/guga31bb/nflfastR-data/master/data/play_by_play_2018.rds")))
pbp %>%
  filter(game_id == '2018_14_JAX_TEN', between(play_id, 740, 840)) %>%
  select(posteam, down, ydstogo, desc, ep, epa) %>%
  View()

Screen Shot 2021-02-14 at 9 07 25 AM

@ConnerEvans
Author

This is the code to view all the safeties that the current code treats as scored against the posteam but that were actually scored by the posteam. If the safety had been against the posteam, 2 points would have gone to the other team, but on these plays the actual score gave 2 points to the posteam.

```r
library(tidyverse)
pbp <- readRDS(url(glue::glue("https://raw.githubusercontent.com/guga31bb/nflfastR-data/master/data/play_by_play_2017.rds")))
pbp %>%
  filter(safety == TRUE, (score_differential_post - score_differential) > 0) %>%
  select(posteam, desc, ep, epa, score_differential, score_differential_post) %>%
  View()
```

Screen Shot 2021-02-25 at 11 25 14 PM

@ConnerEvans
Author

library(tidyverse)
pbp <- readRDS(url(glue::glue("https://raw.githubusercontent.com/guga31bb/nflfastR-data/master/data/play_by_play_2017.rds")))
pbp %>%
  filter(safety == TRUE, (score_differential_post - score_differential) > 0) %>%
  select(posteam, desc, ep, epa, score_differential, score_differential_post) %>%
  View()

@guga31bb
Member

Hmm, when running the code above I get a completely different set of plays:

pbp <- readRDS(url(glue::glue("https://raw.githubusercontent.com/guga31bb/nflfastR-data/master/data/play_by_play_2017.rds")))
pbp %>% filter(safety==TRUE, (score_differential_post - score_differential) > 0 ) %>% 
    select(posteam, desc, ep, epa, score_differential, score_differential_post)

# A tibble: 2 x 6
  posteam desc                                          ep   epa score_differenti~ score_differentia~
  <chr>   <chr>                                      <dbl> <dbl>             <dbl>              <dbl>
1 NE      (9:19) 6-R.Allen punts 54 yards to LAC 1~ -0.760 -1.24                 0                  2
2 MIA     (9:03) (Punt formation) 16-M.Haack punts~ -0.290 -1.71                24                 26

Checking the first play seems to be correct:

> pbp %>%
+   filter(game_id == "2017_08_LAC_NE") %>%
+     dplyr::slice(50:55) %>%
+   select(qtr, posteam, home_team, desc, total_home_score, total_away_score)
# A tibble: 6 x 6
    qtr posteam home_team desc                                      total_home_score total_away_score
  <dbl> <chr>   <chr>     <chr>                                                <dbl>            <dbl>
1     2 NE      NE        (10:24) (No Huddle) 34-R.Burkhead left t~                7                7
2     2 NE      NE        (9:51) 12-T.Brady sacked at NE 35 for -9~                7                7
3     2 NA      NE        Timeout #1 by LAC at 09:28.                              7                7
4     2 NE      NE        (9:28) (Shotgun) 12-T.Brady pass incompl~                7                7
5     2 NE      NE        (9:19) 6-R.Allen punts 54 yards to LAC 1~                9                7
6     2 NE      NE        8-D.Kaser kicks 53 yards from LAC 20 to ~                9                7

So I think EPA is wrong but the score is not?
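One way to make that mismatch explicit, assuming EPA on a scoring play is points minus `ep`, is to recover the points the EPA calculation implies (`epa + ep`) and compare them with the actual change in `score_differential`. The sketch below uses the two 2017 rows printed above, typed in by hand. `implied_points`, `score_change`, and `mismatch` are column names introduced here for illustration.

```r
# The two 2017 plays from the tibble above, typed in by hand.
plays <- data.frame(
  posteam = c("NE", "MIA"),
  ep = c(-0.760, -0.290),
  epa = c(-1.24, -1.71),
  score_differential = c(0, 24),
  score_differential_post = c(2, 26)
)

# Points the EPA calculation appears to have used (assumes epa = points - ep).
plays$implied_points <- round(plays$epa + plays$ep)
# Points the scoreboard actually credited to the posteam.
plays$score_change <- plays$score_differential_post - plays$score_differential

# TRUE where EPA and the scoreboard disagree about who scored.
plays$mismatch <- sign(plays$implied_points) != sign(plays$score_change)
plays[, c("posteam", "implied_points", "score_change", "mismatch")]
```

Both rows come out with `implied_points` of -2 against a `score_change` of 2, so both are flagged.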

@ConnerEvans
Author

ConnerEvans commented Feb 26, 2021

Very sorry, I just realized that this is actually what I had run:

library(tidyverse)
seasons <- 2011:2020
pbp <- purrr::map_df(seasons, function(x) {
  readRDS(
    url(
      glue::glue("https://raw.githubusercontent.com/guga31bb/nflfastR-data/master/data/play_by_play_{x}.rds")
    )
  )
})
pbp %>%
  filter(safety == TRUE, (score_differential_post - score_differential) > 0) %>%
  select(posteam, desc, ep, epa, score_differential, score_differential_post) %>%
  View()

Screen Shot 2021-02-26 at 7 01 19 AM

so it looks at more seasons, not just 2017.

Yes, the score is not wrong, but the epa is. I think this might also mean that the scoring label the xgboost model behind epa is trained to predict is wrong for these plays.

So for example, the description of the safety from the LAC NE game above is:

(9:19) 6-R.Allen punts 54 yards to LAC 11, Center-49-J.Cardona. 12-T.Benjamin MUFFS catch, and recovers at LAC 8. 12-T.Benjamin tackled in End Zone for -8 yards, SAFETY (36-B.King). Penalty on LAC-51-K.Emanuel, Offensive Holding, declined.

I think it might be labeled as a safety scored by LAC, since safeties are usually scored by the team that does not have possession. But because this is a punt, the possession team (NE) kicked to the receiving team (LAC), who gained possession of the ball and were then tackled in their own end zone. That means the safety was scored by NE.

Since the epa acts as if LAC got the safety instead of NE, I assume the play is also labeled wrong for the epa xgboost model. That might be a significant issue, since there have only been 186 safeties since 2011 and 6 of them have this problem.
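For scale, and as a rough sketch of what a corrected label could look like: the counts below are the ones quoted in this comment, and the relabeling rule (take the posteam's points from the change in `score_differential`) is an assumption made here for illustration, not necessarily how the linked pull request fixes it.

```r
# Counts quoted in this comment.
n_safeties <- 186
n_affected <- 6
round(100 * n_affected / n_safeties, 1)  # share of safeties affected, in percent

# Assumed relabeling rule: points for the posteam on a safety are the
# change in score_differential, so the sign follows the scoreboard.
ep <- c(NE = -0.760, MIA = -0.290)            # ep values from the 2017 output above
score_change <- c(NE = 2 - 0, MIA = 26 - 24)  # score_differential_post - score_differential
corrected_epa <- score_change - ep
corrected_epa
```

Under that assumption the two 2017 plays would carry an epa of 2.76 and 2.29 rather than -1.24 and -1.71.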

@guga31bb changed the title from "Wrong team scoring on special teams safeties" to "EPA wrong on safety scored by posteam" Feb 26, 2021
@guga31bb added the bug label Feb 26, 2021
@guga31bb guga31bb linked a pull request Feb 27, 2021 that will close this issue