Caching requests #68

mps9506 · 2022-02-03T19:38:11Z

Hi,

Thanks for this great package! I have a brief feature request.

Occasionally I get lazy and leave an API request in my script or rmarkdown document instead of running a data download script/function separately and storing the data locally. This can result in the same request being made to the server every time the function is re-run. If I just need one data snapshot, that request is totally redundant.

Instead of repeatedly making the same exact API request, get_power() could have an argument to locally cache the downloaded data and access it whenever the same exact request is made. I believe rnoaa and a few other packages do something similar.

If this seems reasonable and something you are open to incorporating I can contribute a pull request at some point.

Thanks!

The text was updated successfully, but these errors were encountered:

adamhsparks · 2022-02-03T23:44:50Z

Hi @mps9506!

I get you and yet I'm a bit conflicted on this. One of my other packages, getCRUCLdata, offers caching for this very reason and was recently ~~unceremoniously~~ removed from CRAN for "policy violations" after several years of being on CRAN with no issues. Apparently, there were changes in policy that disallowed the caching mechanism that rappidrs, which I am using, employs to cache the data. CRAN policy seems to be a moving target as Hadley noted here, which makes this tricky to employ. I didn't protest, that package is low use with no citations and it wasn't worth the hassle to figure things out since I was using a different package for this mechanism.

OTOH, I completely understand what you're saying. I do the same thing as well and having a cache would make it better and having it built into nasapower reduces friction for end-users.

Is using something like targets an option or is it too heavy?

If targets were an option, I'm happy to create documentation on how to use it to cache data in situations like this for nasapower.

What sort of mechanism were you envisioning for a PR?

mps9506 · 2022-02-04T00:55:14Z

Is using something like targets an option or is it too heavy?

targets is definitely a valid workflow. Probably overkill for the use case I am thinking (or what I was doing which triggered this issue), small projects or scripts that might get shared with newish users.

What sort of mechanism were you envisioning for a PR?

I would have proposed using hoardr as I found it really easy to implement in one of my own API packages. hoardr primarily provides a convenient R6 class to expose and use file paths provided by rapddirs.

My naive understanding is that the user cache directory is safe to use. Alternatively, I think writing to a temp file is consistent with CRAN, it just would cache a temp file with every new session (I think).

I understand the hesitation to trigger rejection/removal by CRAN. Caching files would be nice, but ultimately users (me) can choose to be more diligent about doing it for themselves.

adamhsparks · 2022-02-04T03:20:06Z

Yeah, I'm familiar with hoardr, I've used it and went directly with rappdirs to pare down the interface as it added more than what really was necessary for such a simple data set as the CRU CL 2.0.

The whole cache dir seems to be part of that moving target and I've enough issues with an API client and the whims of CRAN so I'm reluctant to add to it.

However, is memoise an option for a user to employ? https://memoise.r-lib.org

mps9506 · 2022-02-04T15:44:09Z

I can't believe I've never run into memoise before! Yeah, that is easy for users to implement:

> mem_get_power <- memoise(get_power)
> system.time(
+     mem_get_power(community = "ag",
+                   pars = c("T2M_MAX"),
+                   temporal_api = "daily",
+                   lonlat = c(-95.358421, 29.749907),
+                   date = c("2001-01-01", "2021-12-31"))
+ )
   user  system elapsed                                                                                      
   0.33    0.03    3.14 
> system.time(
+     mem_get_power(community = "ag",
+                   pars = c("T2M_MAX"),
+                   temporal_api = "daily",
+                   lonlat = c(-95.358421, 29.749907),
+                   date = c("2001-01-01", "2021-12-31"))
+ )
   user  system elapsed 
   0.01    0.00    0.02 
> df <- mem_get_power(community = "ag",
+                     pars = c("T2M_MAX"),
+                     temporal_api = "daily",
+                     lonlat = c(-95.358421, 29.749907),
+                     date = c("2001-01-01", "2021-12-31"))

> df
NASA/POWER CERES/MERRA2 Native Resolution Daily Data  
 Dates (month/day/year): 01/01/2001 through 12/31/2021  
 Location: Latitude  29.7499   Longitude -95.3584  
 Elevation from MERRA-2: Average for 0.5 x 0.625 degree lat/lon region = 24.97 meters 
 Value for missing model data cannot be computed or out of model availability range: NA  
 Parameter(s):  
 
 Parameters: 
 T2M_MAX     MERRA-2 Temperature at 2 Meters Maximum (C)  
 
# A tibble: 7,670 x 8
     LON   LAT  YEAR    MM    DD   DOY YYYYMMDD   T2M_MAX
   <dbl> <dbl> <dbl> <int> <int> <int> <date>       <dbl>
 1 -95.4  29.7  2001     1     1     1 2001-01-01    7.31
 2 -95.4  29.7  2001     1     2     2 2001-01-02    4.65
 3 -95.4  29.7  2001     1     3     3 2001-01-03    4.95
 4 -95.4  29.7  2001     1     4     4 2001-01-04   15.3 
 5 -95.4  29.7  2001     1     5     5 2001-01-05   18.5 
 6 -95.4  29.7  2001     1     6     6 2001-01-06   20.3 
 7 -95.4  29.7  2001     1     7     7 2001-01-07   20.2 
 8 -95.4  29.7  2001     1     8     8 2001-01-08   13.3 
 9 -95.4  29.7  2001     1     9     9 2001-01-09   15.1 
10 -95.4  29.7  2001     1    10    10 2001-01-10   14.6 
# ... with 7,660 more rows

I don't see any reason to add my feature request in that case! I'll go ahead and close. Thanks for talking that through.

mps9506 closed this as completed Feb 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Caching requests #68

Caching requests #68

mps9506 commented Feb 3, 2022

adamhsparks commented Feb 3, 2022 •

edited

Loading

mps9506 commented Feb 4, 2022

adamhsparks commented Feb 4, 2022

mps9506 commented Feb 4, 2022

Caching requests #68

Caching requests #68

Comments

mps9506 commented Feb 3, 2022

adamhsparks commented Feb 3, 2022 • edited Loading

mps9506 commented Feb 4, 2022

adamhsparks commented Feb 4, 2022

mps9506 commented Feb 4, 2022

adamhsparks commented Feb 3, 2022 •

edited

Loading