Many wp variables with different values when running different number of games #183

Closed
mrcaseb opened this issue Feb 13, 2021 · 9 comments · Fixed by #230

@mrcaseb
Member

mrcaseb commented Feb 13, 2021

library(tidyverse)
library(arsenal)
library(nflfastR)

ids <- fast_scraper_schedules(2020) %>% head(10) %>% pull(game_id)
pbp <- fast_scraper(ids) %>% decode_player_ids()
#> ✓ Download finished. Adding variables...
#> ✓ added game variables
#> ✓ added nflscrapR variables
#> ✓ added ep variables
#> ✓ added air_yac_ep variables
#> ✓ added wp variables
#> ✓ added air_yac_wp variables
#> ✓ added cp and cpoe
#> ✓ added fixed drive variables
#> ✓ added series variables
#> ✓ Procedure completed.
#> ✓ Decoding of player ids completed
data_repo <- load_pbp(2020) %>% 
  filter(game_id %in% ids) %>% 
  select_at(vars(names(pbp)))

s <- summary(comparedf(data_repo, pbp))
s$diffs.byvar.table %>% filter(n>0)
#>                      var.x                   var.y    n NAs
#> 1                       wp                      wp   53   0
#> 2                   def_wp                  def_wp   53   0
#> 3                  home_wp                 home_wp   53   0
#> 4                  away_wp                 away_wp   53   0
#> 5                      wpa                     wpa  104   0
#> 6                vegas_wpa               vegas_wpa  108   1
#> 7           vegas_home_wpa          vegas_home_wpa  108   1
#> 8             home_wp_post            home_wp_post   53   0
#> 9             away_wp_post            away_wp_post   53   0
#> 10                vegas_wp                vegas_wp   57   1
#> 11           vegas_home_wp           vegas_home_wp   53   0
#> 12     total_home_rush_wpa     total_home_rush_wpa 1330   0
#> 13     total_away_rush_wpa     total_away_rush_wpa 1330   0
#> 14     total_home_pass_wpa     total_home_pass_wpa 1199   0
#> 15     total_away_pass_wpa     total_away_pass_wpa 1199   0
#> 16                 air_wpa                 air_wpa   18   0
#> 17                 yac_wpa                 yac_wpa   12   0
#> 18            comp_air_wpa            comp_air_wpa   18   0
#> 19            comp_yac_wpa            comp_yac_wpa   12   0
#> 20 total_home_comp_air_wpa total_home_comp_air_wpa 1103   0
#> 21 total_away_comp_air_wpa total_away_comp_air_wpa 1103   0
#> 22 total_home_comp_yac_wpa total_home_comp_yac_wpa  751   0
#> 23 total_away_comp_yac_wpa total_away_comp_yac_wpa  751   0
#> 24  total_home_raw_air_wpa  total_home_raw_air_wpa 1103   0
#> 25  total_away_raw_air_wpa  total_away_raw_air_wpa 1103   0
#> 26  total_home_raw_yac_wpa  total_home_raw_yac_wpa  751   0
#> 27  total_away_raw_yac_wpa  total_away_raw_yac_wpa  751   0
d <- diffs(s)
d %>% head(30)
#>    var.x var.y ..row.names..     values.x     values.y row.x row.y
#> 1     wp    wp            94 0.792873.... 0.793337....    94    94
#> 2     wp    wp           107 0.910384.... 0.910609....   107   107
#> 3     wp    wp           139 0.097398.... 0.097866....   139   139
#> 4     wp    wp           169 0.213206.... 0.213757....   169   169
#> 5     wp    wp           180 0.718435.... 0.719560....   180   180
#> 6     wp    wp           212 0.752501.... 0.752887....   212   212
#> 7     wp    wp           230 0.285728.... 0.285877....   230   230
#> 8     wp    wp           263 0.847960.... 0.848386....   263   263
#> 9     wp    wp           283 0.929228.... 0.929174....   283   283
#> 10    wp    wp           309 0.992384.... 0.992403....   309   309
#> 11    wp    wp           392 0.682618.... 0.683063....   392   392
#> 12    wp    wp           440 0.730897.... 0.732254....   440   440
#> 13    wp    wp           446 0.866797.... 0.867168....   446   446
#> 14    wp    wp           503 0.963208.... 0.963421....   503   503
#> 15    wp    wp           523 0.997528.... 0.997546....   523   523
#> 16    wp    wp           566 0.613061.... 0.613371....   566   566
#> 17    wp    wp           580 0.556122.... 0.557336....   580   580
#> 18    wp    wp           604 0.763171.... 0.763544....   604   604
#> 19    wp    wp           635 0.934121.... 0.934154....   635   635
#> 20    wp    wp           668 0.997358.... 0.997368....   668   668
#> 21    wp    wp           694 0.004272.... 0.004209....   694   694
#> 22    wp    wp           720 0.612394.... 0.612816....   720   720
#> 23    wp    wp           756 0.556122.... 0.557336....   756   756
#> 24    wp    wp           769 0.642990.... 0.643484....   769   769
#> 25    wp    wp           783 0.557104.... 0.558263....   783   783
#> 26    wp    wp           850 0.727532.... 0.728479....   850   850
#> 27    wp    wp           895 0.529801.... 0.530521....   895   895
#> 28    wp    wp           907 0.654755.... 0.655375....   907   907
#> 29    wp    wp           945 0.598712.... 0.599226....   945   945
#> 30    wp    wp           987 0.850935.... 0.851262....   987   987

Created on 2021-02-13 by the reprex package (v1.0.0)

@mrcaseb
Member Author

mrcaseb commented Feb 13, 2021

For comparison, here is the same thing for the first 30 games instead of the first 10:

library(tidyverse)
library(arsenal)
library(nflfastR)

ids <- fast_scraper_schedules(2020) %>% head(30) %>% pull(game_id)
pbp <- fast_scraper(ids) %>% decode_player_ids()
#> ✓ Download finished. Adding variables...
#> ✓ added game variables
#> ✓ added nflscrapR variables
#> ✓ added ep variables
#> ✓ added air_yac_ep variables
#> ✓ added wp variables
#> ✓ added air_yac_wp variables
#> ✓ added cp and cpoe
#> ✓ added fixed drive variables
#> ✓ added series variables
#> ✓ Procedure completed.
#> ✓ Decoding of player ids completed
data_repo <- load_pbp(2020) %>% 
  filter(game_id %in% ids) %>% 
  select_at(vars(names(pbp)))

s <- summary(comparedf(data_repo, pbp))
s$diffs.byvar.table %>% filter(n>0)
#>            var.x          var.y n NAs
#> 1           desc           desc 1   0
#> 2      vegas_wpa      vegas_wpa 2   1
#> 3 vegas_home_wpa vegas_home_wpa 2   1
#> 4       vegas_wp       vegas_wp 2   1
d <- diffs(s)
d %>% head(30)
#>            var.x          var.y ..row.names..     values.x     values.y row.x
#> 1           desc           desc          2563 (15:00) .... (15:00) ....  2563
#> 2      vegas_wpa      vegas_wpa          4955 0.362151.... 0.350065....  4955
#> 3      vegas_wpa      vegas_wpa          5488 0.267549....           NA  5488
#> 4 vegas_home_wpa vegas_home_wpa          4955 -0.36215.... -0.35006....  4955
#> 5 vegas_home_wpa vegas_home_wpa          5488 -0.26754....           NA  5488
#> 6       vegas_wp       vegas_wp          4955 0.637848.... 0.649934....  4955
#> 7       vegas_wp       vegas_wp          5488 0.267549....           NA  5488
#>   row.y
#> 1  2563
#> 2  4955
#> 3  5488
#> 4  4955
#> 5  5488
#> 6  4955
#> 7  5488

Created on 2021-02-13 by the reprex package (v1.0.0)

@guga31bb
Member

Looked for plays with different wp and LOL I think there's a pattern

@mrcaseb
Member Author

mrcaseb commented Feb 13, 2021

> Looked for plays with different wp and LOL I think there's a pattern

Haha, that's obvious! So if the 12th row is "END GAME", something goes wrong...

@guga31bb
Copy link
Member

I have to leave my computer soon, but to fix the END GAME line I think we just need something like this for vegas_wp as well.

@guga31bb
Copy link
Member

The win probability issue on PATs is different. I have no idea what's causing it, but it is also a very small difference.

@mrcaseb
Copy link
Member Author

mrcaseb commented Feb 13, 2021

Since the number of differing values gets smaller as the number of games increases, I think the PATs somehow interfere with each other.

@guga31bb
Copy link
Member

I added a group_by to #185 where there should have been one before, but there are still differences. They are only at about the 4th decimal place on PATs, though, and I'm not sure it's worth tracking down if that's all it is.
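A missing `group_by()` is one plausible way for results to depend on how many games are scraped together: a window function such as `dplyr::lag()` run on an ungrouped batch reads across game boundaries. The toy sketch below uses hypothetical data (a `game_id` and a `wp` column only) and is not nflfastR's actual code; it just shows the batch dependence and how grouping removes it.

```r
library(dplyr)

# Toy data (hypothetical): two games, play-level win probability
plays <- tibble(
  game_id = c("A", "A", "B", "B"),
  wp      = c(0.50, 0.70, 0.40, 0.45)
)

# Without group_by(): lag() runs over the whole batch, so the first play
# of game B picks up the last value from game A
ungrouped <- plays %>% mutate(prev_wp = lag(wp))

# The same game processed on its own: its first play has no previous value
alone <- plays %>% filter(game_id == "B") %>% mutate(prev_wp = lag(wp))

# With group_by(): lag() restarts at each game boundary, so the result
# no longer depends on which games happen to be batched together
grouped <- plays %>%
  group_by(game_id) %>%
  mutate(prev_wp = lag(wp)) %>%
  ungroup()

ungrouped$prev_wp  # values: NA, 0.50, 0.70, 0.40
alone$prev_wp      # values: NA, 0.40
grouped$prev_wp    # values: NA, 0.50, NA, 0.40
```

Game B's first play gets 0.70 in the ungrouped batch but `NA` when scraped alone, which is the same kind of "value depends on the batch" symptom reported in this issue.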

@mrcaseb
Member Author

mrcaseb commented Feb 14, 2021

Dropping some code here in case I want to investigate this at some point. For now we don't care, as the wp values differ by less than 0.3 percentage points.

library(tidyverse)
library(arsenal)
library(nflfastR)
progressr::handlers(global = TRUE)

# Full-season play-by-play from the data repo
dat <- load_pbp(2020)

# Re-scrape only the first 10 games; drop END GAME rows, which differ
# for a separate reason
ids <- fast_scraper_schedules(2020) %>% slice(1:10) %>% pull(game_id)
pbp <- fast_scraper(ids) %>% decode_player_ids() %>% filter(desc != "END GAME")
data_repo <- dat %>% 
  filter(game_id %in% ids) %>% 
  select(all_of(names(pbp))) %>% 
  filter(desc != "END GAME")

# Compare the two data sets row by row
s <- summary(comparedf(data_repo, pbp))
# s$diffs.byvar.table %>% filter(n>0)
d <- diffs(s)
# d %>% head(30)

# values.x / values.y are list columns: convert numeric entries to doubles
# (non-numeric entries such as desc become 0 and are filtered out below)
big <- d %>% 
  mutate(
    data_repo = map_dbl(values.x, ~ ifelse(is.numeric(.x), .x, 0)),
    small_sample = map_dbl(values.y, ~ ifelse(is.numeric(.x), .x, 0)),
    diff_abs = abs(data_repo - small_sample),
    diff = diff_abs %>% scales::percent(accuracy = 0.01)
  ) %>% 
  # filter(diff_abs > 0.1/100) %>%
  select(var = var.x, row = ..row.names.., data_repo:diff) %>% 
  filter(var != "desc") %>% 
  arrange(desc(diff_abs))

big 

# Plays (and games) behind the largest differences
pbp %>% slice(big$row) %>% select(game_id, play_id, desc)
# pbp %>% slice(499) %>% select(game_id, play_id, desc, penalty) %>% view()

unique(pbp$game_id[big$row])
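Without a `by` argument, `arsenal::comparedf()` pairs rows by row name, which for tibbles is effectively row position, so a single extra or missing row would shift every comparison after it. A minimal alternative sketch matches plays on `(game_id, play_id)` and reports the largest absolute difference per variable, which is the number behind the "less than 0.3 percentage points" judgement. `max_abs_diff_by_var()` is a hypothetical helper, not part of nflfastR or arsenal, and it assumes the listed variables are numeric and contain no dots in their names.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: for each numeric variable in `vars`, the largest
# absolute difference between two play-by-play extracts, matching plays
# on (game_id, play_id) instead of on row position
max_abs_diff_by_var <- function(a, b, vars) {
  inner_join(
    select(a, game_id, play_id, all_of(vars)),
    select(b, game_id, play_id, all_of(vars)),
    by = c("game_id", "play_id"),
    suffix = c(".repo", ".small")
  ) %>%
    # one row per play, variable and source ("repo" or "small")
    pivot_longer(
      -c(game_id, play_id),
      names_to = c("var", "source"),
      names_sep = "\\."
    ) %>%
    # put the two sources side by side again
    pivot_wider(names_from = source, values_from = value) %>%
    group_by(var) %>%
    summarise(
      max_abs_diff = max(abs(repo - small), na.rm = TRUE),
      .groups = "drop"
    )
}

# Toy usage (hypothetical values); row order differs on purpose
repo  <- tibble(game_id = "g1", play_id = c(1, 2),
                wp = c(0.500, 0.600), home_wp = c(0.5, 0.4))
small <- tibble(game_id = "g1", play_id = c(2, 1),
                wp = c(0.603, 0.500), home_wp = c(0.4, 0.5))
res <- max_abs_diff_by_var(repo, small, c("wp", "home_wp"))
res
```

Passing `by = c("game_id", "play_id")` to `comparedf()` itself is another way to get key-based matching while keeping arsenal's detailed report.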

@guga31bb linked a pull request Feb 19, 2021 that will close this issue
@mrcaseb mentioned this issue Feb 20, 2021
@guga31bb reopened this Feb 20, 2021
@guga31bb removed a link to a pull request Feb 28, 2021
@mrcaseb
Member Author

mrcaseb commented Mar 23, 2021

To do

  • make all wpa variables NA for the END GAME line
  • make all wp variables 0, 0.5 or 1 for the END GAME line
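The two to-do items can be sketched as a post-processing step. This is a hypothetical helper, not nflfastR's actual fix: it assumes the nflfastR column names used in the reprexes above plus `posteam`, `home_team`, `total_home_score` and `total_away_score`, and it assumes that "0, 0.5 or 1" means the final result (loss, tie, win) from each column's point of view. Where `posteam` is `NA` on an END GAME row, the sketch falls back to the home team's view; that choice is also an assumption.

```r
library(dplyr)

# Hypothetical sketch of the to-do list; not the package's implementation
fix_end_game_rows <- function(pbp) {
  wpa_cols <- intersect(c("wpa", "vegas_wpa", "vegas_home_wpa"), names(pbp))
  pbp %>%
    mutate(
      .is_end = coalesce(desc == "END GAME", FALSE),
      # final result from the home team's view: 1 win, 0 loss, 0.5 tie
      .home_result = case_when(
        total_home_score > total_away_score ~ 1,
        total_home_score < total_away_score ~ 0,
        TRUE ~ 0.5
      ),
      # assumption: use the home view when posteam is NA
      .pos_is_home = coalesce(posteam == home_team, TRUE),
      .pos_result = if_else(.pos_is_home, .home_result, 1 - .home_result)
    ) %>%
    mutate(
      home_wp       = if_else(.is_end, .home_result, home_wp),
      away_wp       = if_else(.is_end, 1 - .home_result, away_wp),
      vegas_home_wp = if_else(.is_end, .home_result, vegas_home_wp),
      wp            = if_else(.is_end, .pos_result, wp),
      def_wp        = if_else(.is_end, 1 - .pos_result, def_wp),
      vegas_wp      = if_else(.is_end, .pos_result, vegas_wp),
      # no play happens on the END GAME line, so wpa is set to NA
      across(all_of(wpa_cols), ~ if_else(.is_end, NA_real_, .x))
    ) %>%
    select(-c(.is_end, .home_result, .pos_is_home, .pos_result))
}

# Toy usage (hypothetical values): home team wins 34-20
pbp <- tibble(
  desc = c("kickoff", "END GAME"),
  home_team = c("KC", "KC"), posteam = c("KC", NA),
  total_home_score = c(0, 34), total_away_score = c(0, 20),
  wp = c(0.6, 0.9), def_wp = c(0.4, 0.1),
  home_wp = c(0.6, 0.9), away_wp = c(0.4, 0.1),
  vegas_wp = c(0.7, 0.95), vegas_home_wp = c(0.7, 0.95),
  wpa = c(0.01, 0.05), vegas_wpa = c(0.02, 0.03), vegas_home_wpa = c(0.02, 0.03)
)
out <- fix_end_game_rows(pbp)
```

Only rows whose `desc` is exactly "END GAME" are touched; every other play keeps its model values.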


2 participants