Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the _load_remote_dataset function to load tiled and non-tiled grids in a consistent way #3120

Merged
merged 4 commits into from
Apr 22, 2024

Conversation

seisman
Copy link
Member

@seisman seisman commented Mar 18, 2024

Description of proposed changes

The _load_remote_dataset function uses two different ways when loading tiled and non-tiled remote datasets.

  • For non-tiled grids, it calls which to download the grid, get the full path and load it by calling load_dataarray (i.e., xr.load_dataarray)
  • For tiled grids, it calls grdcut, which downloads the needed tiles, assembles them into a single grid and loads it by calling load_dataarray.

Currently, both tiled and non-tiled grids are loaded from a grid file on disk by xr.load_dataarray, so the returned xarray.DataArray are similar for tiled and non-tiled grids. The only difference is, the source file given in grid.encoding["source"] doesn't exist for tiled grids, because the assembled grid is a temporary file.

We're working on refactoring the grdcut wrapper to use virtual files in #3115. After #3115 is finished, the returned xarray.DataArray would have different attributes for tiled and non-tiled grids. So, it's better to have a consistent way for loading tiled and non-tiled grids. This can be done by calling the special read module. Its syntax looks like (the special read module is only available for external wrappers, so the command doesn't work in bash):

gmt read @earth_relief_01d_g vfile -Tg -R0/10/0/10

in which, vfile can be an actual file name or a virtual file name, and -Tg means the input is a grid.

@seisman seisman force-pushed the datatypes/gmtgrid branch 4 times, most recently from d0517e0 to 4ba67df Compare April 1, 2024 11:32
@seisman seisman changed the base branch from datatypes/gmtgrid to main April 1, 2024 11:32
@seisman seisman force-pushed the refactor/load_remote_dataset branch 8 times, most recently from 278f21e to d18fc1d Compare April 19, 2024 04:53
@seisman seisman force-pushed the refactor/load_remote_dataset branch from d18fc1d to 004f985 Compare April 19, 2024 05:26
@seisman seisman changed the title POC: Refactor load_remote_dataset using the special read module Refactor the private _load_remote_dataset function by calling the 'read' module Apr 19, 2024
@seisman seisman marked this pull request as ready for review April 19, 2024 07:13
@seisman seisman changed the title Refactor the private _load_remote_dataset function by calling the 'read' module Refactor the _load_remote_dataset function to load tiled and non-tiled grids in a consistent way Apr 19, 2024
@seisman seisman added enhancement Improving an existing feature needs review This PR has higher priority and needs review. labels Apr 19, 2024
@seisman seisman added this to the 0.12.0 milestone Apr 19, 2024
Comment on lines +430 to +434
# Full path to the grid if not tiled grids.
source = which(fname, download="a") if not resinfo.tiled else None
# Manually add source to xarray.DataArray encoding to make the GMT accessors work.
if source:
grid.encoding["source"] = source
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, the GMT accessors require the grid.encoding["source"] to work. So, for non-tiled gris, we get the full path to the grid and set grid.encoding["source"]. For tiled grids, grid.encoding["source"] is undefined.

if dataset.resolutions[resolution].tiled:
raise GMTInvalidInput(
f"'region' is required for {dataset.title} resolution '{resolution}'."
fname = f"@{dataset_prefix}{resolution}_{registration[0]}"
Copy link
Member Author

@seisman seisman Apr 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also propose to

  • Rename dataset_name to name
  • Rename dataset_prefix to prefix
  • Remove the trailing _ from dataset_prefix, e.g., dataset_prefix="earth_relief_" should be prefix="earth_relief".

Please leave your comments before I make the changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel it's better to do it in a separate PR to make this PR small for review.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue opened at #3190.

@seisman seisman requested a review from a team April 22, 2024 02:09
f"'region' is required for {dataset.title} resolution '{resolution}'."
)

# Currently, only grids are supported. Will support images in the future.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that: currently we use -Tg so it only works for grids. For images, we need to use -Ti.

To support images, we need to know whether the remote dataset is a grid or an image. It means we need to add the data kind information to the GMTRemoteDataset class.

@seisman seisman added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Apr 22, 2024
@seisman seisman merged commit 8b2a74c into main Apr 22, 2024
19 checks passed
@seisman seisman deleted the refactor/load_remote_dataset branch April 22, 2024 23:44
@seisman seisman removed the final review call This PR requires final review and approval from a second reviewer label Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improving an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants