-
Notifications
You must be signed in to change notification settings - Fork 415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWS Data Clients #3000
base: main
Are you sure you want to change the base?
AWS Data Clients #3000
Conversation
This is still very much todo with a need for tests and docstrings, but I want to see if anyone has API/interface thoughts. Such as: is the class-instance based interface confusing? Naming? How's the use of the Todo:
Ping @kgoebber @deeplycloudy |
def build_key(self, site, prod_id, dt, depth=None): | ||
parts = [site, prod_id, f'{dt:%Y}', f'{dt:%m}', f'{dt:%d}', f'{dt:%H}', f'{dt:%M}', | ||
f'{dt:%S}'] | ||
return self.delimiter.join(parts[slice(0, depth)]) |
Check failure
Code scanning / CodeQL
Unhashable object hashed
prefixes = list(itertools.chain(*(self.common_prefixes(b) for b in bounding_keys))) | ||
loc = bisect.bisect_left(prefixes, search_key) | ||
rng = slice(loc - 1, loc + 1) if loc else slice(0, 1) | ||
bounding_keys = prefixes[rng] |
Check failure
Code scanning / CodeQL
Unhashable object hashed
|
||
def build_key(self, site, dt, depth=None): | ||
parts = [f'{dt:%Y}', f'{dt:%m}', f'{dt:%d}', site, f'{site}{dt:%Y%m%d_%H%M%S}'] | ||
return self.delimiter.join(parts[slice(0, depth)]) |
Check failure
Code scanning / CodeQL
Unhashable object hashed
|
||
def build_key(self, product, dt, depth=None): | ||
parts = [product, f'{dt:%Y}', f'{dt:%j}', f'{dt:%H}', f'OR_{product}'] | ||
return self.delimiter.join(parts[slice(0, depth)]) |
Check failure
Code scanning / CodeQL
Unhashable object hashed
(e.g. latitude, longitude, altitude) from various sources. | ||
""" | ||
|
||
from .aws import * # noqa: F403 |
Check notice
Code scanning / CodeQL
'import *' may pollute namespace
I'll add my 2 cents based on my experience with goes2go...apologies for the long comment. For comparison, the from goes2go import GOES
G = GOES(satellite=16, product="ABI-L2-MCMIP", domain='C')
# each of these downloads then reads the data with xarray
G.nearesttime('2022-01-01 6:15')
G.latest()
G.timerange(start='2022-06-01 00:00', end='2022-06-01 01:00')
G.timerange(recent='30min') Main commentEven though the GOES data is in different buckets for each satellite, it's practically a single data type (like how NEXRAD is made of different sites, GOES is just different satellites). Instead of from metpy.remote import GOES16Archive, GOES17Archive, GOES18Archive I would prefer a single import and then specify the satellite in my request. Something like... from metpy.remote import GOESArchive
GOESArchive(satellite=16) This would make it easier for a user to change the satellite they want without adding/changing an import. Minor commentsOne feature of From an easy-of-use perspective, I find it easier to write code when date inputs for an API like this can optionally be given as a datetime string (I've always used pandas.to_datetime to parse these strings because they can be formatted in different ways; maybe there are other ways). # This is easier to write and read...
GOESArchive().get_product(dt="2021-01-01 06:00")
GOESArchive().get_product(dt="20210101T06")
# than this...
GOESArchive().get_product(dt=datetime(2021,1,1,6,0)) Not all of it is pretty, but I'd be happy to share other aspects of |
@blaylockbk Thanks for the feedback! The point about using a single GOES client with different satellites is a really good suggestion, thanks. I'd love to hear more considerations based your experience with goes2go. I'm certainly staring at your top comment with the API thinking over the strengths and weaknesses of the approach I took in comparison. I'm mixed on the idea of accepting strings for date/time; on one hand, it does seem to make it easy; on the other, it seems really weird to couple this code to pandas functionality when it otherwise isn't using Pandas at all. I don't think specifying one particular string format would be nearly as handy, though. (Given that Pandas is already a MetPy dependency, this is probably something I just have to get over.) Are there some use cases for supporting the string input that aren't direct user input? (i.e. pulling from another source) |
@dopplershift, yeah that's just a personal preference because I'm lazy. Anytime I write a function with date or datetime input I convert it using pandas (about 80% of the time I already imported pandas for something else); probably not the best practice for everyone... import pandas as pd
from datetime import datetime, timedelta
def my_function(date, delta):
date = pd.to_datetime(date)
delta = pd.to_timedelta(delta)
return date + delta
a = my_function("2023-01-01 06:00", "3H")
b = my_function(datetime(2023,1,1,6,0), timedelta(hours=3)) It's mainly a convenience for direct user input (notebooks), but often I have a list of ISO dates I need to loop over. Not a problem if you don't like it. |
Another "feature" would be allowing an alias "east" or "west" that switches to the appropriate satellite depending on which was operational for the date requested. GOESArchive(satellite="west") # could be 17 or 18, depending on the date requested |
Eh, it doesn't have to match my personal preferences necessarily. It's about the engineering trade-offs. It's entirely possible the complexity/coupling is worth it to yield a better user experience. That's why I'm trying to figure out what the concrete benefits are. |
start_time = key.split('_')[-3] | ||
return datetime.strptime(start_time[:-1], 's%Y%j%H%M%S') | ||
|
||
def get_product(self, product, dt, mode=None, channel=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the kwarg should be "band" instead of "channel". To my knowledge the word "channel" was since the 90s more and more replaced with the synonymous word "band" e.g. AVHRR has "channels" but MODIS, VIIRS have "bands". Sometimes the word "channel" is still used today in official documents when talking more about the hardware side of the instruments. The GOES-R SERIES PRODUCT DEFINITION AND USERS’ GUIDE with 726 pages yields 25 search results for "channel" but several hundreds for "band". Oddly the GOES-ABI L2 filenames use C13, "C" for Channel though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad I'm not the only one scratching my head over when to say "band" or "channel" 😂
It does seem "band" is the preferred term in the GOES NetCDF files; some examples...
band_id_C01:long_name = "ABI channel 1" ;
band_id_C01:standard_name = "sensor_band_identifier" ;
band_wavelength_C01 = 0.47
band_wavelength_C01:long_name = "ABI band 1 central wavelength" ;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the "C" prefix for band/channel is the only inconsistency from the GOES side with regards to what we call it.
I'm happy to just use "band", but I think that's why "channel" comes to my mind first.
The file object will be left at the end of the file after reading, leading to other things that use it (like parse) trying to read from EOF.
d9e4261
to
3d3ec2e
Compare
To test this functionality I modified python-training#136 to plot GLM data. See my comment there for what that looks like. The Some thoughts on aspects of data use unique to GLM:
|
I had occasion to think about model data, which is also now increasingly on S3. I wanted to document that here for further rumination about API design. The docs for the GEFS at the link above are somewhat out of date. For yesterday's data, There are also many file types, but I needed geopotential height, which is in the "popular variables" file type. For one time for one member, the key looks like: Below is code for downloading and concatenating all the ensemble members for one time. It shows the parameters that need to templated for this (admittedly narrow) use case.
All the members for one time can then be concatenated with
|
Description Of Changes
This adds some clients that make it possible to request a range of products, or the single closest product, to a given time/time range, based on some other product id/sites. This applies to the
noaa-nexrad-level2
,unidata-nexrad-level3
, andnoaa-goes1[6-8]
S3 buckets.This is the first PR in recent times that adds remote data access capabilities to MetPy (again).
Checklist