Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caching requests #68

Closed
mps9506 opened this issue Feb 3, 2022 · 4 comments
Closed

Caching requests #68

mps9506 opened this issue Feb 3, 2022 · 4 comments

Comments

@mps9506
Copy link

mps9506 commented Feb 3, 2022

Hi,

Thanks for this great package! I have a brief feature request.

Occasionally I get lazy and leave an API request in my script or rmarkdown document instead of running a data download script/function separately and storing the data locally. This can result in the same request being made to the server every time the function is re-run. If I just need one data snapshot, that request is totally redundant.

Instead of repeatedly making the same exact API request, get_power() could have an argument to locally cache the downloaded data and access it whenever the same exact request is made. I believe rnoaa and a few other packages do something similar.

If this seems reasonable and something you are open to incorporating I can contribute a pull request at some point.

Thanks!

@adamhsparks
Copy link
Member

adamhsparks commented Feb 3, 2022

Hi @mps9506!

I get you and yet I'm a bit conflicted on this. One of my other packages, getCRUCLdata, offers caching for this very reason and was recently unceremoniously removed from CRAN for "policy violations" after several years of being on CRAN with no issues. Apparently, there were changes in policy that disallowed the caching mechanism that rappidrs, which I am using, employs to cache the data. CRAN policy seems to be a moving target as Hadley noted here, which makes this tricky to employ. I didn't protest, that package is low use with no citations and it wasn't worth the hassle to figure things out since I was using a different package for this mechanism.

OTOH, I completely understand what you're saying. I do the same thing as well and having a cache would make it better and having it built into nasapower reduces friction for end-users.

Is using something like targets an option or is it too heavy?

If targets were an option, I'm happy to create documentation on how to use it to cache data in situations like this for nasapower.

What sort of mechanism were you envisioning for a PR?

@mps9506
Copy link
Author

mps9506 commented Feb 4, 2022

Is using something like targets an option or is it too heavy?

targets is definitely a valid workflow. Probably overkill for the use case I am thinking (or what I was doing which triggered this issue), small projects or scripts that might get shared with newish users.

What sort of mechanism were you envisioning for a PR?

I would have proposed using hoardr as I found it really easy to implement in one of my own API packages. hoardr primarily provides a convenient R6 class to expose and use file paths provided by rapddirs.

My naive understanding is that the user cache directory is safe to use. Alternatively, I think writing to a temp file is consistent with CRAN, it just would cache a temp file with every new session (I think).

I understand the hesitation to trigger rejection/removal by CRAN. Caching files would be nice, but ultimately users (me) can choose to be more diligent about doing it for themselves.

@adamhsparks
Copy link
Member

Yeah, I'm familiar with hoardr, I've used it and went directly with rappdirs to pare down the interface as it added more than what really was necessary for such a simple data set as the CRU CL 2.0.

The whole cache dir seems to be part of that moving target and I've enough issues with an API client and the whims of CRAN so I'm reluctant to add to it.

However, is memoise an option for a user to employ? https://memoise.r-lib.org

@mps9506
Copy link
Author

mps9506 commented Feb 4, 2022

I can't believe I've never run into memoise before! Yeah, that is easy for users to implement:

> mem_get_power <- memoise(get_power)
> system.time(
+     mem_get_power(community = "ag",
+                   pars = c("T2M_MAX"),
+                   temporal_api = "daily",
+                   lonlat = c(-95.358421, 29.749907),
+                   date = c("2001-01-01", "2021-12-31"))
+ )
   user  system elapsed                                                                                      
   0.33    0.03    3.14 
> system.time(
+     mem_get_power(community = "ag",
+                   pars = c("T2M_MAX"),
+                   temporal_api = "daily",
+                   lonlat = c(-95.358421, 29.749907),
+                   date = c("2001-01-01", "2021-12-31"))
+ )
   user  system elapsed 
   0.01    0.00    0.02 
> df <- mem_get_power(community = "ag",
+                     pars = c("T2M_MAX"),
+                     temporal_api = "daily",
+                     lonlat = c(-95.358421, 29.749907),
+                     date = c("2001-01-01", "2021-12-31"))

> df
NASA/POWER CERES/MERRA2 Native Resolution Daily Data  
 Dates (month/day/year): 01/01/2001 through 12/31/2021  
 Location: Latitude  29.7499   Longitude -95.3584  
 Elevation from MERRA-2: Average for 0.5 x 0.625 degree lat/lon region = 24.97 meters 
 Value for missing model data cannot be computed or out of model availability range: NA  
 Parameter(s):  
 
 Parameters: 
 T2M_MAX     MERRA-2 Temperature at 2 Meters Maximum (C)  
 
# A tibble: 7,670 x 8
     LON   LAT  YEAR    MM    DD   DOY YYYYMMDD   T2M_MAX
   <dbl> <dbl> <dbl> <int> <int> <int> <date>       <dbl>
 1 -95.4  29.7  2001     1     1     1 2001-01-01    7.31
 2 -95.4  29.7  2001     1     2     2 2001-01-02    4.65
 3 -95.4  29.7  2001     1     3     3 2001-01-03    4.95
 4 -95.4  29.7  2001     1     4     4 2001-01-04   15.3 
 5 -95.4  29.7  2001     1     5     5 2001-01-05   18.5 
 6 -95.4  29.7  2001     1     6     6 2001-01-06   20.3 
 7 -95.4  29.7  2001     1     7     7 2001-01-07   20.2 
 8 -95.4  29.7  2001     1     8     8 2001-01-08   13.3 
 9 -95.4  29.7  2001     1     9     9 2001-01-09   15.1 
10 -95.4  29.7  2001     1    10    10 2001-01-10   14.6 
# ... with 7,660 more rows

I don't see any reason to add my feature request in that case! I'll go ahead and close. Thanks for talking that through.

@mps9506 mps9506 closed this as completed Feb 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants