Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data visualization with OpenAltimetry API #144

Merged
merged 49 commits into from
May 10, 2021
Merged

Conversation

icetianli
Copy link
Contributor

@icetianli icetianli commented Sep 14, 2020

Follow the original PR and the discussion with @JessicaS11 and @friedrichknuth, further improvements have been made regarding visualization by data requests from OpenAltimetry API:

  • Generalization of the code for different datasets available from OA API (https://openaltimetry.org/data/swagger-ui/). It now supports products ['ATL06','ATL07','ATL08','ATL10','ATL12','ATL13’]. ATL03 is not supported beacause OA API only supports ATL03 data request at single date with a spatial limitation of 1 degree at either lat/lon.

  • Create a visualization module to host all relevant methods related to OA request and data visualisation. Three methods of visualization are available, which are: 1) interactive plot in holoview and datashader; 2) traditional matplotlib plot for geodataframe; 3) default spatial bounding box plot. This is still at the exploration stage and can be enhanced later.

  • Solve the issue mentioned in PR on a 5-degree spatial limit in the latitude or longitude dimension set by the OpenAltimetry API. OA API can now return data with any bounding box.

  • Only request ICESat-2 data in the latest cycle from OpenAltimetry, this greatly reduces the overall time expenses. However, for a larger region with high data density, the OA request will still take several minutes, presumably related to how parameter lists are generated in the generate_OA_paras function.

  • Test added under icepyx/icepyx/test/test_OpenAltimetry.py

Binder button to test this branch: Binder

closes #92

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @icetianli, good job on this, looks like you've picked up some dask and datashader skills too! Just a quick suggestion on first user experience, which is to add a progress bar while downloading. Also one minor suggestion to reduce the bounding box size for the test area.

I'll be happy to take a closer look later and can help with adding more tests. What will be helpful is if you can run black on your code for your next commits (cc @JessicaS11, can you remind us how the black pre-commit hook from #96 works?)

icepyx/tests/test_OpenAltimetry.py Outdated Show resolved Hide resolved
icepyx/core/visualization.py Outdated Show resolved Hide resolved
Copy link
Member

@JessicaS11 JessicaS11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, @icetianli! I've got a few overarching comments for some next steps before we merge this PR:

  1. As @weiji14 suggested, it would be great to run black on this to format it. If you install pre-commit (pip install pre-commit) in your local working environment, black should be run automatically when you make a commit, reformatting the files to give the library consistent spacing and use of " vs ' throughout. Let me know if you have any issues with this.
  2. We should expand the docstrings to include input parameters and outputs.
  3. To get the new functionality to show up in our documentation, we have to explicitly add it to the docs files.
  4. Some examples to showcase this amazing functionality! This could be an addition to an existing how-to notebook or part of a standalone simple visualization, or maybe even both. My guess is you have either a script or notebook that you've been using this code in, which will make a great starting point for sharing it.

I'm happy to take on some of this work - it looks like a lot, but it's a critical piece of our efforts to make sure that your code is used by the broader community!

…ation, enhance interactive visualization based on cycle number
@icetianli
Copy link
Contributor Author

icetianli commented Mar 7, 2021

Thanks @JessicaS11 and @weiji14 for the advice! Sorry it took me so long to update! I have now improved the default spatial extent plot using geoviews, I made a pull request to implement this so it can be merged separately. For OpenAltimetry data visualization, I deprecated the functionality that only requests the lastest cycle demonstrated in my last commit, as now what I think the most valuable function of OA API is to provide a low-cost way of playing ICESat-2 data, e.g. compare geolocations and elevations between different ground tracks and cycles for a large region, without going through the trouble of downloading and plotting data separately.

Now the script can show elevation distributions from different cycles, see notebook.
image

In terms of future development, I think it can add a lat-elevation plot for individual RGT/ cycle in a similar way as OpenAltimetry since the data is now stored in xarray so it becomes quite handy to achieve this.

The improvements include:

  • Avoid sending empty requests
  • Append requested data for each 5*5 grid in a multi-dimensional xarray dataset
  • Use cycles as a control parameter to visualize elevations
  • Add colorbar to datashade plot

I only test the functions in Antarctica Peninsula where the data density is relatively low. For a large region with a long time span, OA request may take several minutes, but I think this can be further enhanced by improving the parallel processing or just setting a limit for spatial extent and date range to warn the users that it may take a long time to process.

@weiji14
Copy link
Member

weiji14 commented Mar 8, 2021

Hi @icetianli, welcome back, hope your research is going well! Just had a quick scan of the changes here and it looks like you've got made some really nice improvements, especially with that ICESat-2 cycle slider! I'll give this a full review tomorrow as it's getting late in my timezone.

In the meantime, could you merge in the changes from the development into this branch using git merge development? There's a small conflict on the requirements.txt file you'll need to resolve. Or you can do it via the Github UI by clicking on the 'Resolve conflicts' button:

image

@JessicaS11
Copy link
Member

This looks awesome! Thanks for all these updates, @icetianli, and the example is great. I haven't had a chance to do a line-by-line review, but I'm thinking it would make sense to better integrate the capabilities from #176 into this new module (we can always merge that PR into development, then use this one to migrate it all to the visualization module). Then, we can call visualize directly from the query object with options for the different methods of visualization. We'll also need to add in some checks for valid datasets specifically for this step, because as I remember OA doesn't host all datasets, so we'll need to let users know if they can't quick-view the dataset they're requesting. We can talk in more detail about some of this at Tuesday's icepyx meeting!

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more comments for now, mostly just minor typos and code style suggestions. I'll still need to walk myself through the logic a bit more and play around with your 'examples/ICESat-2_Data_Visualization_Example.ipynb' notebook (which should be listed under the README.md file by the way under https://github.com/icesat2py/icepyx/tree/v0.3.2#examples-jupyter-notebooks).

I'm thinking it would make sense to better integrate the capabilities from #176 into this new module (we can always merge that PR into development, then use this one to migrate it all to the visualization module). Then, we can call visualize directly from the query object with options for the different methods of visualization.

Agree. So the workflow would be something like Merge #176 into 'development' -> Merge 'development' into this PR #144 -> Migrate the 'visualize_spatial_extent' to 'visualization.py'.

We'll also need to add in some checks for valid datasets specifically for this step, because as I remember OA doesn't host all datasets, so we'll need to let users know if they can't quick-view the dataset they're requesting. We can talk in more detail about some of this at Tuesday's icepyx meeting!

Yep, something like an allowlist to check if the product is in ['ATL06','ATL07','ATL08','ATL10','ATL12','ATL13'] should work. I'll try and listen in for the meeting tomorrow but probably won't talk much cause it'll be 6am for me 😆

requirements.txt Outdated Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
icepyx/core/visualization.py Outdated Show resolved Hide resolved
icepyx/core/visualization.py Outdated Show resolved Hide resolved
icepyx/core/visualization.py Outdated Show resolved Hide resolved
icepyx/core/visualization.py Outdated Show resolved Hide resolved
icepyx/core/visualization.py Show resolved Hide resolved
@weiji14 weiji14 added the longer_contribution Issues that will take more than an afternoon to implement label Mar 9, 2021
icetianli and others added 5 commits March 9, 2021 10:55
- Add product check
- Increase number of concurrent multithreading max_worker to reduce API requesting time
- Use backoff to deal with specfic exceptions raised during API request
- Set elevation sampling rate of 1/20 and cycle number of latest cycle to optimize memory and speed up data processing
- Add testing to all supported products
- Add an interactive map to visualize elevation spatial distribution and along-track elevation for individual RGT
- Update visualization notebook
Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just restarting the review process again 😄 Sorry if it looks like a lot (got a bit carried away), but hopefully you can just batch commit them, or I can push the commits to this branch directly if you're ok with it. Feel free to disagree or discuss about any of the recommended changes too, you've worked on this code a lot more than I have and tbh, it's been a while since I worked on ATL06!

To continue with the rest of the review though, could you please:

  1. Merge in the latest changes from the 'development' branch into this 'openaltimetry' branch (click on the 'Resolve conflicts' button in the GitHub UI and remove the extra blank line).
  2. Clear the output of the first cell (the one with from icepyx.core.visualization import Visualize) in your ICESat-2_Data_Visualization_Example.ipynb notebook and re-commit it to GitHub. This will result in a smaller diff (the bokeh svg icon is very large for some reason), and hopefully we can comment/review on the *.ipynb file easier later.

P.S. I'm doing some work at #196 to enable showing jupyter notebooks directly as a 'Tutorial' on the icepyx ReadTheDocs page. If that work gets merged in, we'll have to move your ICESat-2_Data_Visualization_Example.ipynb to a new home under doc/, and users will be able to see your walkthrough tutorial even better! Here's a sneak peek of how it would look like:

image

Keep up the good work 🙌

icepyx/core/visualization.py Outdated Show resolved Hide resolved
icepyx/core/visualization.py Outdated Show resolved Hide resolved
icepyx/core/visualization.py Outdated Show resolved Hide resolved
icepyx/core/visualization.py Outdated Show resolved Hide resolved
icepyx/core/visualization.py Outdated Show resolved Hide resolved
icepyx/core/visualization.py Outdated Show resolved Hide resolved
icepyx/core/visualization.py Outdated Show resolved Hide resolved
icepyx/core/visualization.py Outdated Show resolved Hide resolved
icepyx/core/visualization.py Outdated Show resolved Hide resolved
icepyx/core/visualization.py Outdated Show resolved Hide resolved
@JessicaS11
Copy link
Member

This is looking so great - thanks for all your hard work, @icetianli! I managed to break your tests, which I will work to fix pending our decision about how we initialize the Visualization object. I'd also like to try and add a few more tests to target some of the other functions you've added.

Also, since I made the PR a mess by running black on some of the files, over the next couple of days I will put in a separate PR to do the reformatting, and once we merge it into development we can bring it in here to clean things up again. Sorry about that - I didn't realize how many modules had fallen behind on formatting!

@weiji14
Copy link
Member

weiji14 commented Apr 29, 2021

This is looking so great - thanks for all your hard work, @icetianli! I managed to break your tests, which I will work to fix pending our decision about how we initialize the Visualization object. I'd also like to try and add a few more tests to target some of the other functions you've added.

@JessicaS11, could you describe your idea a bit more on how the Visualization object should be initialized? I read your note on #144 (comment) and saw commit b862eca (which looks good by the way), so is this pretty much done? Or do you mean having an API like Query.visualize.elevation() and Query.visualize.spatial_extent() to replace the current Query.visualize_elevation() and Query.visualize_spatial_extent() calls (probably best left to a separate PR tbh).

Also, since I made the PR a mess by running black on some of the files, over the next couple of days I will put in a separate PR to do the reformatting, and once we merge it into development we can bring it in here to clean things up again. Sorry about that - I didn't realize how many modules had fallen behind on formatting!

Just on this, you can also revert commits 53b55c4 and 8c1cd05 to remove unrelated changes and reduce the git diff on this PR. Following that, do cherry-pick (using -n/--no-commit) on 8c1cd05 to get only the relevant changes (see https://stackoverflow.com/questions/5717026/how-to-git-cherry-pick-only-changes-to-certain-files).

git revert 53b55c42a97e681dc5ba954acee6e418a525e350
git revert 8c1cd052b77e1a8570efdcd739db3d107d14469d

git cherry-pick -n 8c1cd052b77e1a8570efdcd739db3d107d14469d
git add doc/source/user_guide/documentation/query.rst examples/examples.rst icepyx/core/visualization.py
git commit -m "add docstrings and visualization to docs; reformat with black"

@icetianli
Copy link
Contributor Author

I managed to break your tests, which I will work to fix pending our decision about how we initialize the Visualization object. I'd also like to try and add a few more tests to target some of the other functions you've added.

This change looks good to me @JessicaS11 , it's nice to be able to pass Query object directly to Visualize. So currently it's ipx.Visualize.viz_elevation() for elevation, then why still keep visualize_elevation() function in Query class? I am a bit confused about this.

@JessicaS11
Copy link
Member

So currently it's ipx.Visualize.viz_elevation() for elevation, then why still keep visualize_elevation() function in Query class? I am a bit confused about this.

Or do you mean having an API like Query.visualize.elevation() and Query.visualize.spatial_extent() to replace the current Query.visualize_elevation() and Query.visualize_spatial_extent() calls (probably best left to a separate PR tbh).

Good points, @icetianli and @weiji14. After some more thought, I think that it makes the most sense to encourage users to access the visualizations as methods on the Query object (reg_a.visualize_spatial_extent() and reg_a.visualize_elevation()), since that's when they're most likely to be using the functions. As with all the other submodules (granules.py, variables.py, etc.), these can be accessed independently, if the user so chooses, by importing the module directly from icepyx.core.visualization import Visualize). I thus took the Visualize object importation out of the __init__.py but did leave a note that you could access the module directly at the end of the visualization example notebook. This also will decrease confusion if we ultimately add any other types of visualizations for working with the data.

Thanks for asking all these great clarifying questions! Let me know if more changes are still needed.

@JessicaS11
Copy link
Member

Travis is failing on the ATL07 test_visualization_date_range. I can reproduce the error with:

from icepyx.core.visualization import Visualize
vis = Visualize(product='ATL07', spatial_extent=[-65, -66, -64.5, -65], date_range=['2019-7-1', '2019-8-1'])

And the error traceback is:

IndexError                                Traceback (most recent call last)
<ipython-input-10-19eb5e1c5abb> in <module>
----> 1 vis.parallel_request_OA().size

~/computing/icepyx/github_repo/icepyx/icepyx/core/visualization.py in parallel_request_OA(self)
    428                 total=len(parallel_OA_data),
    429             ):
--> 430                 r = future.result()
    431                 if r is not None:
    432                     requested_OA_data.append(r)

/opt/anaconda3/envs/icepyx/lib/python3.9/concurrent/futures/_base.py in result(self, timeout)
    431                 raise CancelledError()
    432             elif self._state == FINISHED:
--> 433                 return self.__get_result()
    434 
    435             self._condition.wait(timeout)

/opt/anaconda3/envs/icepyx/lib/python3.9/concurrent/futures/_base.py in __get_result(self)
    387     def __get_result(self):
    388         if self._exception:
--> 389             raise self._exception
    390         else:
    391             return self._result

/opt/anaconda3/envs/icepyx/lib/python3.9/concurrent/futures/thread.py in run(self)
     50 
     51         try:
---> 52             result = self.fn(*self.args, **self.kwargs)
     53         except BaseException as exc:
     54             self.future.set_exception(exc)

~/computing/icepyx/github_repo/icepyx/icepyx/core/visualization.py in request_OA_data(self, paras)
    342         try:
    343 
--> 344             df_series = df.query(expr="date == @Date").iloc[0]
    345             beam_data = df_series.beams
    346 

/opt/anaconda3/envs/icepyx/lib/python3.9/site-packages/pandas/core/indexing.py in __getitem__(self, key)
    893 
    894             maybe_callable = com.apply_if_callable(key, self.obj)
--> 895             return self._getitem_axis(maybe_callable, axis=axis)
    896 
    897     def _is_scalar_access(self, key: Tuple):

/opt/anaconda3/envs/icepyx/lib/python3.9/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1499 
   1500             # validate the location
-> 1501             self._validate_integer(key, axis)
   1502 
   1503             return self.obj._ixs(key, axis=axis)

/opt/anaconda3/envs/icepyx/lib/python3.9/site-packages/pandas/core/indexing.py in _validate_integer(self, key, axis)
   1442         len_axis = len(self.obj._get_axis(axis))
   1443         if key >= len_axis or key < -len_axis:
-> 1444             raise IndexError("single positional indexer is out-of-bounds")
   1445 
   1446     # -------------------------------------------------------------------

IndexError: single positional indexer is out-of-bounds

It looks like it's coming from the specification of error type @icetianli added to line 347 of visualize.py. Do we just need to add this error type, or should we be updating the test?

@icetianli
Copy link
Contributor Author

icetianli commented May 4, 2021

Ah yes! It's another error caused by empty dataframe by df.query @JessicaS11 , could just add IndexError in the error list except (NameError, KeyError, IndexError) as error:, sorry didn't catch it last time! Added in aaf852f

@icetianli
Copy link
Contributor Author

Good points, @icetianli and @weiji14. After some more thought, I think that it makes the most sense to encourage users to access the visualizations as methods on the Query object (reg_a.visualize_spatial_extent() and reg_a.visualize_elevation()), since that's when they're most likely to be using the functions. As with all the other submodules (granules.py, variables.py, etc.), these can be accessed independently, if the user so chooses, by importing the module directly from icepyx.core.visualization import Visualize). I thus took the Visualize object importation out of the __init__.py but did leave a note that you could access the module directly at the end of the visualization example notebook. This also will decrease confusion if we ultimately add any other types of visualizations for working with the data.

Thanks for adding these additions @JessicaS11! They look great, agree it's better to call different visualization methods directly from Query object as default and in the meantime keep the options open for users to import visualization, no further questions from me.

@JessicaS11
Copy link
Member

Thanks @icetianli for taking this on, and @weiji14 for all your review help. I'm going to go ahead and approve this PR, but I won't merge it in case either of you have any last edits/changes/fixes since I was the last one to push a commit. Feel free to merge (and if you're feeling really ambitious open a dev-->main PR, though we'll need to add a changelog before we merge that).

@icetianli
Copy link
Contributor Author

Thanks @JessicaS11 and @weiji14 for all the great advice! I am happy to proceed with merging this PR, is there any additional change you would like to add @weiji14?

Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, this PR looks good to me so feel free to squash and merge this in! Well done on seeing this through to the end, I hope you learned some new things from it, and sorry that it took as long as a paper review!

@icetianli icetianli merged commit f2c6a5f into development May 10, 2021
@icetianli icetianli deleted the openaltimetry branch May 10, 2021 20:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
longer_contribution Issues that will take more than an afternoon to implement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add quick plots from OpenAltimetry
4 participants