
Add MiRS reader #1511

Merged — 33 commits merged into pytroll:master on Mar 9, 2021
Conversation

@joleenf (Contributor) commented Jan 22, 2021

This PR is a replacement for PRs #1486 and #1285. This MiRS reader loads the level 2 EDR IMG swath files produced by the Microwave Integrated Retrieval System.

  • The location of the coefficient files still needs to be determined, and the code updated accordingly.
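For context, loading these EDR IMG files with the new reader looks roughly like the sketch below; the file pattern and the "TPW" product name are illustrative assumptions, not taken from this PR.

```python
from glob import glob
from satpy import Scene

# Hypothetical file pattern; MiRS EDR IMG swaths are NetCDF files.
filenames = glob("/data/mirs/IMG_SX.*.nc")
scn = Scene(reader="mirs", filenames=filenames)
scn.load(["TPW"])  # assumed product name (total precipitable water)
```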

@ghost commented Jan 22, 2021

DeepCode's analysis on #e2bdd8 found:

  • ⚠️ 1 warning, ℹ️ 2 minor issues.

Top issues:

  • Missing close for open; add close or use a with block.
  • The call to next should be guarded with a try/except block.
  • Use predictable random with seed or secure random.
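The bot's "Example fixes" links don't survive outside GitHub; the generic shape of the first fix is just a context manager (file name hypothetical):

```python
# Leak-prone: the handle is never closed if read() raises.
coeff_file = open("limb_coeffs.txt")  # hypothetical file name
data = coeff_file.read()
coeff_file.close()

# Safer: a with block closes the handle even on error.
with open("limb_coeffs.txt") as coeff_file:
    data = coeff_file.read()
```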


codecov bot commented Jan 22, 2021

Codecov Report

Merging #1511 (db8bcef) into master (e344613) will increase coverage by 0.13%.
The diff coverage is 86.85%.


@@            Coverage Diff             @@
##           master    #1511      +/-   ##
==========================================
+ Coverage   92.59%   92.73%   +0.13%     
==========================================
  Files         251      253       +2     
  Lines       36997    37576     +579     
==========================================
+ Hits        34258    34845     +587     
+ Misses       2739     2731       -8     
Flag            Coverage Δ
behaviourtests  4.42% <2.09%> (-0.05%) ⬇️
unittests       92.87% <86.85%> (+0.13%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
satpy/composites/crefl_utils.py 84.52% <ø> (ø)
satpy/composites/viirs.py 86.54% <0.00%> (ø)
satpy/readers/ghrsst_l3c_sst.py 7.24% <0.00%> (ø)
satpy/readers/hrpt.py 98.62% <ø> (ø)
satpy/readers/msi_safe.py 2.29% <ø> (ø)
satpy/readers/seviri_base.py 95.18% <ø> (ø)
satpy/readers/seviri_l1b_native_hdr.py 100.00% <ø> (ø)
satpy/readers/xmlformat.py 89.65% <ø> (ø)
satpy/tests/reader_tests/test_iasi_l2_so2_bufr.py 97.18% <ø> (ø)
satpy/tests/test_modifiers.py 100.00% <ø> (ø)
... and 29 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e344613...e2bdd8b.

@djhoese changed the title from "Add files for MiRS reader takes over for #1285 PR and replaces #1486 PR" to "Add MiRS reader" on Jan 22, 2021
@djhoese (Member) commented Jan 22, 2021

I've updated the title just because we use it in our release notes, so I didn't want to clutter it with the PR numbers.

@djhoese mentioned this pull request on Jan 26, 2021
@djhoese (Member) left a comment

A couple requests (mostly style stuff), but otherwise it is looking pretty good. Also 👍 for the pytest parametrize usage.

1.)  Do not need the file registry before running the pooch retrieve command.
2.)  Simplify the code that does the pooch retrieve, and call it when needed,
     not at the beginning of the code; the filenames do not need to be in the
     metadata for the scene (see the sketch below).
3.)  Use dask's where rather than roundabout numpy logic. This eliminates
     all squeeze calls.
4.)  Remove the bt_data array slice where it is not needed, but keep using the
     bt_data attributes for the xarray of the corrected brightness temperatures.
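A minimal sketch of points 1–3, assuming a placeholder URL and illustrative variable names: pooch.retrieve needs no pre-built registry, and dask's where replaces the mask/squeeze logic.

```python
import dask.array as da
import pooch

def get_coeff_file():
    # Retrieved only when the limb correction is actually needed; the
    # URL is a placeholder and no registry object is required.
    return pooch.retrieve(
        url="https://example.com/mirs/rr2atms_limb_coeffs.txt",
        known_hash=None,  # assumed: no hash pinning in this sketch
    )

# Illustrative stand-ins for the mask and the two candidate corrections:
surf_type_mask = da.from_array([[0, 3], [3, 0]], chunks=1)
sea_correction = da.full((2, 2), 275.0, chunks=1)
land_correction = da.full((2, 2), 280.0, chunks=1)

# dask's where keeps everything lazy and 2D, so no squeeze is needed:
new_data = da.where(surf_type_mask == 3, sea_correction, land_correction)
```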
limb_correct_bt does not need to be in the class. Move the code out of the
class methods and restructure the call to the limb correction to facilitate
this move.
Assigning a map_blocks call to a numpy array 96 times causes dask to compute
before it is necessary. Also, I could be wrong, but it does not seem
necessary to loop over map_blocks; it makes more sense to loop over the FOV
in the function that map_blocks calls (sketched below).
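A sketch of that restructuring, assuming an illustrative per-FOV multiply in place of the real coefficient math: the FOV loop moves inside the block function, so dask sees one lazy map_blocks call instead of 96 eager assignments.

```python
import numpy as np
import dask.array as da

def apply_atms_limb_correction(bt_block, coeffs):
    # Loop over fields of view *inside* the block function; the per-FOV
    # coefficient multiply stands in for the real correction.
    corrected = np.empty_like(bt_block)
    for fov in range(bt_block.shape[-1]):
        corrected[:, fov] = bt_block[:, fov] * coeffs[fov]
    return corrected

n_fov = 96
bt_data = da.random.random((1000, n_fov), chunks=(100, n_fov))
coeffs = np.linspace(0.98, 1.02, n_fov)

# One graph node per chunk; nothing computes until .compute():
corrected = da.map_blocks(apply_atms_limb_correction, bt_data, coeffs,
                          dtype=bt_data.dtype)
```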
-bt_corrected = xr.DataArray(new_data, dims=bt_data[idx, :, :].dims,
-                            coords=bt_data[idx, :, :].coords,
+bt_corrected = xr.DataArray(new_data, dims=("y", "x"),
+                            coords=surf_type_mask.coords,
djhoese (Member): What are the coords this variable has?

joleenf (Contributor, author): I was thinking about this and am not sure of the best way to add the coords. I was considering adjusting the way the new_coords method works at line 211 to make it a more obvious way of assigning longitude and latitude coordinates.

joleenf (Contributor, author): Actually, I was thinking about that same thing for the chunk size for the map_blocks call. What if I added a general self.nc.shape so that I can access the scanlines/FOV without slicing the bt_data?

djhoese (Member): Let's talk about it at the meeting today, BUT you shouldn't need to add lon/lat/x/y coordinates. The base reader class will do that for you based on the coordinates: defined in the YAML (or your available_datasets method).
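For illustration, a file-handler method sketch of what "the base reader will do that for you" amounts to; the variable name and coordinate names are assumptions, not taken from the final reader.

```python
def available_datasets(self, configured_datasets=None):
    # Pass through whatever other handlers (or the YAML) already reported.
    yield from (configured_datasets or [])
    # Then report one variable, pointing the base reader at its lon/lat
    # datasets via "coordinates"; the base reader attaches them itself,
    # so the file handler never assigns coords by hand.
    ds_info = {
        "name": "btemp_23v",                       # assumed variable name
        "file_type": self.filetype_info["file_type"],
        "coordinates": ["longitude", "latitude"],  # assumed coord names
    }
    yield True, ds_info
```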

1.)  Remove the extra check for self.nc.coords and just check within the
     new_coords method.
2.)  Change all dask calls in the apply_limb_correction method to np calls.
This feature of xarray is not working well with dask:
pydata/xarray#3068
1.)  Removes the new_coords method, which added lat/lon coords through the
     xarray.Dataset.assign_coords method.
2.)  Removes coordinate assignment when the bt_corrected xarray.DataArrays
     are constructed.
3.)  Restructures available_datasets. Hopefully it is easier to understand,
     and it ensures that lat/lon coordinates have standard_names even though
     file metadata, which lacks them, takes precedence over the yaml.
   …otherwise the dims are dim_0 and dim_1 and are not recognized by the
   reader. Scanline/FOV are converted to y,x in the reader.
@joleenf mentioned this pull request on Feb 27, 2021
@joleenf marked this pull request as ready for review on Feb 27, 2021
@djhoese (Member) commented Feb 27, 2021

> @djhoese In this commit, the file metadata is taking precedence over metadata in the yaml for the metop file used for testing (IMG_SX.M2.D17037.S1601.E1607.B0000001.WE.HR.ORB.nc). Therefore, standard_name is added when reading the file metadata. I thought this line in the available_datasets docs meant that it is possible to append new information to the metadata:
>
> "This method is not guaranteed that it will be called before any other file type's handler. The availability 'boolean' not being None does not mean that a file handler called later can't provide an additional dataset, but it must provide more identifying (DataID) information to do so and should yield its new dataset in addition to the previous one."
>
> Can the boolean be True, False, or None? Or is it only either True or None? I think the available_datasets information above is saying, "Even if the availability 'boolean' is True rather than None, a file handler could provide additional datasets." After that line, what is the more identifying DataID? Is this referring to the more definitive DataIDs, like brightness temperature data defined by frequency and polarization ("btemp_23v"), versus the wider scope of BT? If so, then that is working correctly in this reader, because both the BT and the definitive brightness temperature channels are being yielded. Is there a way to keep variable metadata from a yaml and use metadata from a file, or is it either/or?
>
> Also, in "Migrating to Xarray" the example shows using the "coord" attribute. Is assign_coords the only concern in this issue report?

I think it can be True/False/None. I'm not sure I know or remember exactly what you're talking about and am missing some of the context of this comment/discussion. I think you have it right, though. The available_datasets method is not supposed to stop/limit/remove datasets that have been "reported" by other file handlers. There is also no order to which file handler's available_datasets is called first, so that's why it has to be very passive.

Regarding coords, it isn't how they are assigned, it is what is being assigned (a 2D dask array). Until we can get more confidence that xarray won't "accidentally" compute these, it would be best not to include them as .coords.

@joleenf (Contributor, author) commented Feb 27, 2021

I am sorry Dave, I meant to put a link to the available_datasets section and did not realize I hadn't done that. Sorry for the confusion. I am pretty sure that any file metadata in the mirs reader supersedes metadata read from the yaml. Perhaps I have missed a basic concept in building the datasets through the configuration file and dynamic datasets.

@djhoese (Member) commented Feb 27, 2021

I think you based this reader off of the GAASP reader, which probably makes some assumptions about where dataset definitions are coming from. I think you are right, but technically the behavior is probably "undefined" when two datasets are yielded with the same DataID. You'll either need to keep a "cache" of which DataIDs have already been yielded by the file handler and not re-yield/override them (I think I've used a list/set called _handled in other readers' file handlers; see the sketch below), or pass each sub-generator (the 3 methods you call and yield from in your available_datasets) to the next so you can do more checks, though it may be more efficient to move the initial for loop outside these generators.

Obviously, making it work first is more important.
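A sketch of the _handled idea, under the assumption that the handler's dynamic datasets come from a hypothetical _dynamic_datasets helper producing (is_available, ds_info) pairs; only the name is used as the dedupe key here, which is a simplification of a full DataID.

```python
def available_datasets(self, configured_datasets=None):
    handled = set()
    # Pass along everything other file handlers already decided on,
    # remembering the identifying names we have seen.
    for is_avail, ds_info in (configured_datasets or []):
        handled.add(ds_info["name"])
        yield is_avail, ds_info
    # Yield this handler's dynamic datasets, skipping any name already
    # reported so two datasets never share one DataID.
    for is_avail, ds_info in self._dynamic_datasets():  # hypothetical helper
        if ds_info["name"] in handled:
            continue
        handled.add(ds_info["name"])
        yield is_avail, ds_info
```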

@joleenf (Contributor, author) commented Mar 1, 2021

> I think you based this reader off of the GAASP reader

Correct, I based this on the GAASP reader, and it does make assumptions about the location of metadata; mainly, I don't think it expects any of the metadata to come from the yaml. Currently, that is also how the mirs reader works. Initially, I thought I would use the yaml to add metadata that was missing, but then I realized that could be tricky: I would then want to make sure that file metadata, like units, would always override metadata from the yaml. I am comfortable with the idea that all the metadata comes from the file, with things like long_name and standard_name added within the reader.

I have yet another question about the xarray coords attribute. If I am using open_dataset, doesn't this assign the coords to the xarray.DataArray just as if I had used assign_coords, or even built an xarray.DataArray with a coords attribute? So are we not clear of the unexpected-computation issue?

@djhoese (Member) commented Mar 1, 2021

Yes, it should be adding coords as it (xarray) sees fit. You could .reset_coords if you want to force their removal. 🤔 Just leave the coords as they are, then. We can't know what issues they will cause unless we keep them there, so maybe keeping them is the best choice here.
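To make the two halves of that exchange concrete (file path and variable name are hypothetical): open_dataset attaches coordinate variables on its own, and reset_coords is the escape hatch if they ever need to go.

```python
import xarray as xr

# xarray attaches coordinate variables at open time, much as if
# assign_coords had been called, so 2D lon/lat can arrive as .coords:
nc = xr.open_dataset("IMG_SX.example.nc", chunks={})  # hypothetical file

# If the 2D dask coords ever had to be dropped to avoid accidental
# computation, reset_coords with drop=True would remove them:
data_arr = nc["btemp_23v"]            # assumed variable name
no_coords = data_arr.reset_coords(drop=True)
```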

bt_data = bt_data.transpose("Channel", "y", "x")
c_size = bt_data[idx, :, :].chunks
correction = da.map_blocks(apply_atms_limb_correction,
                           bt_data.values, idx,
djhoese (Member): The .values should be removed here, right?
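For context, .values converts the lazy dask-backed DataArray into a computed NumPy array before map_blocks even builds its graph. A minimal before/after sketch, with stand-in data and a placeholder correction function:

```python
import dask.array as da
import xarray as xr

bt_data = xr.DataArray(
    da.random.random((22, 1000, 96), chunks=(1, 100, 96)),
    dims=("Channel", "y", "x"))

def fake_correction(block):
    return block  # stand-in for the real limb-correction function

# Eager (what the review flags): bt_data.values computes every chunk
# up front before any graph exists:
#   correction = da.map_blocks(fake_correction, bt_data.values)

# Lazy: hand map_blocks the underlying dask array instead.
correction = da.map_blocks(fake_correction, bt_data.data,
                           dtype=bt_data.dtype)
```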

The necessary metadata to get metop files read into satpy is now added in the
reader. The information provided in the yaml would be nice for supplementing
information in the file, but I can't think of a reason why it should override
that information, except for descriptions, which can be handled elsewhere.
Since trying to juggle yaml/file metadata would be a little more involved in
the code, I feel it is best to keep the model of the gaasp reader, which
assumes the necessary metadata is in the file, and to add missing metadata
within the reader.
- Add the mocks.
- Create fake coefficient data in the test.
- Update how the coefficients are read in mirs by splitting out the reading
  of the file from the actual parsing (see the sketch below).
- Check the end of the coefficient file for n_chn and n_fov, removing the
  need for globals.
- Remove the global N_FOV from the loop which calculates the coefficients
  on the data. Use the dataset size for the loop endpoint; if the dataset
  for some reason does not have 96 FOV, the loop would fail with a fixed
  endpoint.
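A sketch of the read/parse split and the tail check; the coefficient file layout is assumed, since the real format isn't shown in this thread.

```python
def read_coeff_text(fname):
    # I/O only: tests can mock this one function, or skip it entirely
    # and feed fake text straight to the parser.
    with open(fname) as coeff_file:
        return coeff_file.readlines()

def parse_coeffs(lines):
    # Parsing only, no file access. Assumed layout: the last line
    # carries n_chn and n_fov, so no module-level globals are needed.
    n_chn, n_fov = (int(v) for v in lines[-1].split()[:2])
    coeffs = [[float(v) for v in line.split()] for line in lines[:-1]]
    return n_chn, n_fov, coeffs
```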
@djhoese (Member) left a comment

This looks good enough to me. Thanks for the patience on getting things in order and waiting on the aux data download stuff.

@djhoese merged commit f56a089 into pytroll:master on Mar 9, 2021