Best practices on updating and extracting pins #769

Closed
EKtheSage opened this issue Aug 23, 2023 · 4 comments

@EKtheSage

Hello,

I currently run a daily job that writes new data to a pin, so each day a new version of the pin model_prediction_output appears on the board. My question is: how do I get the data for the past 7 days?

board_data %>%
  pin_write(result, 'model_prediction_output',
            type = 'parquet',
            description = 'model prediction',
            tags = c('model'))

I can see the pin_versions() output, but it doesn't give me an easy way to read multiple versions of a pin.

board_data |> pin_versions('model_prediction_output')

  version                created             hash 
  <chr>                  <dttm>              <chr>
1 20230823T225549Z-385c7 2023-08-23 22:55:49 385c7

Is there a recommended way to read multiple versions of a pin and union them together? For example, I could filter the pin_versions() result on created and use the matching version values to read the pins I want, but that feels hacky, especially if the pin is updated more than once a day.

Alternatively, is there a way to upsert a pin, inserting new records and updating existing ones, while producing a new version of the pin?
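For illustration, an upsert like this can be emulated by hand: read the latest version, drop the rows whose key appears in the new batch, append the new batch, and write the result as a new version. A minimal base-R sketch of the merge step, assuming the data has a unique key column (the hypothetical `id` below); `upsert_rows()` and the toy data are illustrative only and not part of pins:

```r
# Hypothetical helper: upsert `new` into `old` using the key column `key`.
# Rows of `old` whose key appears in `new` are replaced; other new rows are appended.
upsert_rows <- function(old, new, key) {
  kept <- old[!(old[[key]] %in% new[[key]]), , drop = FALSE]
  out <- rbind(kept, new)
  out <- out[order(out[[key]]), , drop = FALSE]  # sort by key for a stable result
  rownames(out) <- NULL
  out
}

old <- data.frame(id = c(1, 2, 3), pred = c(0.1, 0.2, 0.3))
new <- data.frame(id = c(3, 4), pred = c(0.35, 0.4))
upserted <- upsert_rows(old, new, "id")
print(upserted)

# Not run here; assumes the pins package and a versioned board:
# current <- pin_read(board_data, "model_prediction_output")
# pin_write(board_data, upsert_rows(current, result, "id"),
#           "model_prediction_output", type = "parquet")
```

Sorting by the key is only there to make the output deterministic. If dplyr is available, its rows_upsert() provides a similar insert-or-update operation for data frames.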

@iandarbeynhiu

iandarbeynhiu commented Aug 24, 2023

A very quick solution for the last 7 versions:

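library(pins)
library(purrr)
# read each of the first 7 versions listed by pin_versions() and row-bind them
map_df(head(pin_versions(YOUR_BOARD, "YOUR_PIN"), 7)$version, function(x) {
  pin_read(YOUR_BOARD, "YOUR_PIN", version = x)
})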

This gives you the last 7 versions combined into one data frame.
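Here "last" depends on the row order that pin_versions() returns. A base-R sketch that sorts by `created` explicitly and accepts any reader function instead of hard-coding the board and pin; `read_latest_versions()`, the toy `versions` table, and the in-memory `fake_store` are hypothetical stand-ins so it runs without a board:

```r
# Hypothetical helper: read the `n` most recent versions (ordered by `created`,
# not by row position) with any reader function and row-bind the results.
read_latest_versions <- function(versions, read_fn, n = 7) {
  ord <- order(versions$created, decreasing = TRUE)
  ids <- head(versions$version[ord], n)
  pieces <- lapply(ids, function(v) {
    df <- read_fn(v)
    df$version <- v  # record which version each row came from
    df
  })
  do.call(rbind, pieces)
}

# Made-up metadata in the shape of pin_versions() output
versions <- data.frame(
  version = c("v1", "v2", "v3"),
  created = as.POSIXct(c("2023-08-20 09:00:00", "2023-08-22 09:00:00",
                         "2023-08-21 09:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
# In-memory stand-in for a board
fake_store <- list(
  v1 = data.frame(id = 1L, pred = 0.1),
  v2 = data.frame(id = 2:3, pred = c(0.2, 0.3)),
  v3 = data.frame(id = 4L, pred = 0.4)
)
combined <- read_latest_versions(versions, function(v) fake_store[[v]], n = 2)
print(combined)
```

With pins, the reader would be something like function(v) pin_read(YOUR_BOARD, "YOUR_PIN", version = v).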

For the last 7 days, regardless of the number of versions:

library(dplyr)
last_7_days <- filter(pin_versions(YOUR_BOARD, "YOUR_PIN"), as.Date(created) >= Sys.Date() - 7)

map_df(last_7_days$version, function(x){
  pin_read(YOUR_BOARD, "YOUR_PIN", version = x)
})

I would suggest not hard-coding the board and pin name inside the function, to keep it more general, but this works.
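When filtering versions by day, it helps to compare values of the same class: `created` from pin_versions() is a date-time (shown as <dttm> above), while Sys.Date() - 7 is a Date. A base-R sketch that converts `created` to a Date before comparing; `versions_in_window()` and the toy `versions` table are hypothetical, and `today` is a parameter only so the result is reproducible:

```r
# Hypothetical helper: keep versions created within the last `days` days.
# `created` is converted to Date so both sides of the comparison share a class.
versions_in_window <- function(versions, days = 7, today = Sys.Date()) {
  created_day <- as.Date(versions$created, tz = "UTC")
  versions[created_day >= today - days, , drop = FALSE]
}

# Made-up metadata in the shape of pin_versions() output
versions <- data.frame(
  version = c("v1", "v2", "v3"),
  created = as.POSIXct(c("2023-08-10 12:00:00", "2023-08-18 12:00:00",
                         "2023-08-24 12:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
recent <- versions_in_window(versions, days = 7, today = as.Date("2023-08-24"))
print(recent$version)
```

Days are counted in UTC in this sketch; adjust `tz` if your day boundaries follow a different time zone. The selected `version` values can then be read one by one with pin_read().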

For the most recent version on each day, limited to the last 7 days:

# keep only the most recently created version on each day
last_7_days <- pin_versions(YOUR_BOARD, "YOUR_PIN") %>%
  filter(as.Date(created) >= Sys.Date() - 7) %>%
  mutate(Date = as.Date(created)) %>%
  group_by(Date) %>%
  slice_max(created, n = 1, with_ties = FALSE) %>%
  ungroup()

map_df(last_7_days$version, function(x){
  pin_read(YOUR_BOARD, "YOUR_PIN", version = x)
})
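Selecting each day's newest version by `created` avoids comparing version strings, which only sort chronologically for timestamp-style IDs such as 20230823T225549Z-385c7. On Posit Connect the versions can be numeric IDs stored as text (for example 78815 in the output below), and as text "9999" sorts after "10000". A base-R sketch of the per-day selection without dplyr; `latest_per_day()` and the toy table are hypothetical:

```r
# Hypothetical helper: for each calendar day (UTC), keep only the version
# with the latest `created` timestamp.
latest_per_day <- function(versions) {
  day <- as.Date(versions$created, tz = "UTC")
  rows <- tapply(seq_len(nrow(versions)), day, function(i) {
    i[which.max(as.numeric(versions$created[i]))]  # newest row within this day
  })
  versions[as.integer(rows), , drop = FALSE]
}

# Made-up metadata in the shape of pin_versions() output
versions <- data.frame(
  version = c("20230820T090000Z-1a2b3", "20230821T090000Z-4c5d6",
              "20230821T180000Z-7e8f9", "20230822T070000Z-0a1b2"),
  created = as.POSIXct(c("2023-08-20 09:00:00", "2023-08-21 09:00:00",
                         "2023-08-21 18:00:00", "2023-08-22 07:00:00"),
                       tz = "UTC"),
  stringsAsFactors = FALSE
)
daily <- latest_per_day(versions)
print(daily$version)
```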

@juliasilge
Member

You can also take an approach similar to what is outlined in #758, something like this:

library(tidyverse)
library(pins)
b <- board_connect()
#> Connecting to Posit Connect 2023.07.0 at <https://colorado.posit.co/rsc>
pin_name <- "julia.silge/traffic-crash-model-metrics"

last_seven <- b |> 
  pin_versions(pin_name) |> 
  slice_head(n = 7)

last_seven |> 
  mutate(pin_contents = map(version, ~ pin_read(b, pin_name, version = .))) |> 
  unnest(pin_contents)
#> # A tibble: 4,808 × 9
#>    version created             active  size .index        .n .metric  .estimator
#>    <chr>   <dttm>              <lgl>  <dbl> <date>     <int> <chr>    <chr>     
#>  1 78815   2023-08-19 19:08:00 TRUE   28815 2020-11-22  1119 accuracy binary    
#>  2 78815   2023-08-19 19:08:00 TRUE   28815 2020-11-22  1119 kap      binary    
#>  3 78815   2023-08-19 19:08:00 TRUE   28815 2020-11-22  1119 mn_log_… binary    
#>  4 78815   2023-08-19 19:08:00 TRUE   28815 2020-11-22  1119 roc_auc  binary    
#>  5 78815   2023-08-19 19:08:00 TRUE   28815 2020-11-29  1481 accuracy binary    
#>  6 78815   2023-08-19 19:08:00 TRUE   28815 2020-11-29  1481 kap      binary    
#>  7 78815   2023-08-19 19:08:00 TRUE   28815 2020-11-29  1481 mn_log_… binary    
#>  8 78815   2023-08-19 19:08:00 TRUE   28815 2020-11-29  1481 roc_auc  binary    
#>  9 78815   2023-08-19 19:08:00 TRUE   28815 2020-12-06  1695 accuracy binary    
#> 10 78815   2023-08-19 19:08:00 TRUE   28815 2020-12-06  1695 kap      binary    
#> # ℹ 4,798 more rows
#> # ℹ 1 more variable: .estimate <dbl>

Created on 2023-08-24 with reprex v2.0.2

In these results, the columns version through size are from the version metadata and the columns .index through .estimate are from the pin contents.
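The same "metadata plus contents" shape can be built without dplyr, tidyr, or purrr. A base-R sketch, where `read_with_metadata()`, the toy `meta` table, and the in-memory `fake_store` are hypothetical; with pins, `meta` would be the pin_versions() result and `read_fn` something like function(v) pin_read(b, pin_name, version = v):

```r
# Hypothetical helper: base-R analogue of mutate(map()) + unnest(). Each row read
# for a version is prefixed with that version's metadata columns.
read_with_metadata <- function(meta, read_fn) {
  pieces <- lapply(seq_len(nrow(meta)), function(i) {
    contents <- read_fn(meta$version[i])
    # repeat the i-th metadata row once per content row, then bind the columns
    data.frame(meta[rep(i, nrow(contents)), , drop = FALSE], contents,
               row.names = NULL)
  })
  do.call(rbind, pieces)
}

# Made-up version metadata and contents
meta <- data.frame(
  version = c("v1", "v2"),
  created = as.POSIXct(c("2023-08-19 19:08:00", "2023-08-20 19:08:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)
fake_store <- list(
  v1 = data.frame(metric = c("accuracy", "roc_auc"), estimate = c(0.8, 0.9)),
  v2 = data.frame(metric = "accuracy", estimate = 0.85)
)
out <- read_with_metadata(meta, function(v) fake_store[[v]])
print(out)
```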

@EKtheSage
Author

Thanks! This is super helpful! I think this issue is similar enough to #758, so I'll just close this one.

@github-actions

github-actions bot commented Sep 9, 2023

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators Sep 9, 2023

3 participants