-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add "viirs_edr" reader #2531
Add "viirs_edr" reader #2531
Conversation
for more information, see https://pre-commit.ci
Codecov Report
@@ Coverage Diff @@
## main #2531 +/- ##
========================================
Coverage 94.87% 94.87%
========================================
Files 349 349
Lines 50855 51155 +300
========================================
+ Hits 48248 48534 +286
- Misses 2607 2621 +14
Flags with carried forward coverage won't be shown. Click here to find out more.
|
I prefer uppercase band names with leading zeros ( Regarding the choice of name, I chose Seems fine to call the whole lot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as-is.
do I change all these hardcoded-in-YAML variables to be dynamically discovered so the reader can basically load anything?
The reader YAML is pretty long, so I'm not against it. The only downside I can think if is that to see what is supported (and by which name) one would need to first have the data and use available_dataset_names()
.
Yeah that is true, but I think that comes with the territory of a generic reader that reads generic files (NOAA's enterprise algorithm output). @simonrp84 thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of minor comments. Personally, I'd like the masking of invalid data to be optional rather than hardcoded.
My comment about the number of columns is due to bitter experience of being hit by granules varying from the "typical" number of columns in the past!
def get_dataset(self, dataset_id: DataID, info: dict) -> xr.DataArray: | ||
"""Get the dataset.""" | ||
data_arr = self.nc[info['file_key']] | ||
data_arr = self._mask_invalid(data_arr, info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be made optional? Some users may wish to access the unmasked data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the NDVI/EVI special case with QF checks, I think it could be optional since that's very complex and opinionated. Otherwise, I'm not sure I agree. This is checking fill values and valid ranges. Why would a user need to have that? The L2 products don't encode anything with different fill values as far as I know...I think, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I meant to comment on the other _mask_invalid
, the one for the NDVI/EVI. If we could make that one optional then that'd be great. This one, that I actually comment on, is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done now. I had a very non-creative moment and went with filter_veg
as the kwarg. Other ideas?
@pnuu I added your
What do you think? |
Go with option 1. I tried the reader, and currently I get this error: glbl.load(['CloudMaskBinary'])
*** KeyError: "No dataset matching 'DataQuery(name='longitude_jrr_cloudmask', resolution=750)' found" Looking at the files I have, the coordinate variables are |
I did detect one bug late last night that I'll work on this morning. I'll try to find the same files you're using and figure out the issues. |
Oh FYI @pnuu that error message still makes sense. The variables for CloudMaskBinary actually makes sense because I name the individual lon/lat datasets based on the file type they are coming from. I decided to change how Simon had done it in the original version of this reader where it assumed you were only ever loading one file type at a time and that all lon/lats were equivalent across files. So it was doing "oh we found longitude in this file type, use that for all longitude requests", but now it always loads lon/lat from the file for a particular dataset. I'm still debugging things... |
@pnuu Try now. Recent commits should fix a couple important issues and have removed your readers python and YAML. |
Thanks, seems to be working as before (after reader and channel name adjustments, ofc), good job! |
Great. I think after I fix a few more bugs I've encountered this should be ready to merge. |
This is a replacement of #1972, #1483, and may be a replacement for the
viirs_l2_cloud_mask_nc
reader added in #2316. Basically I need to be able to read the NOAA Enterprise EDR files and all of these PRs share a little bit of logic but were never completed. My priority right now is the surface reflectance products and the vegetation indices. There are cloud top temp and height products that I may add here or in a separate PR.The existing readers or PRs for readers all use the "L2" name. These are "EDRs". Functionally the same, but we should be consistent with the majority of NOAA documentation (which admittedly also occasionally refers to them as L2). There are also existing readers that I had added in the past called "viirs_edr_X". Given the nature of these products, I think this new reader can be called a generic
viirs_edr
name (no suffix). I haven't looked into most of the files I hope to support, but my hope is that the enterprise algorithms are relatively similar in file structure. Even if they aren't, they are NetCDF files and represent the same level of product for the same instrument. Anyone object?Side note: Kathy Strabala and I thought maybe the surface reflectance products could be called "srf_i01" and so on. Reflecting that they are different than the SDR bands (TOA versus surface), but also being consistent with the "i01" style naming rather than the "i1" naming that the existing PRs are using. I could also see doing "I01" (uppercase letter) to be consistent with the
viirs_sdr
reader, but then I'd have to uppercase SRF which I don't really want to do.I'll have to decide later whether or not it is appropriate to deprecate the
viirs_l2_cloud_mask_nc
reader in favor of this one.Opinions? Please comment.
CC @pnuu @simonrp84 @nbearson
Edit: I see @simonrp84 used
surf_refl_I01
for his surface reflectance products. Maybe I'll stick with that. Again, opinions welcome.