Remote Datasets: Inconsistent dataset title, name and long_name #2914

Closed
seisman opened this issue Dec 26, 2023 · 4 comments · Fixed by #3048
Labels
maintenance Boring but important stuff for the core devs
Milestone

Comments

@seisman
Member

seisman commented Dec 26, 2023

A simple script to produce the table below:

from pygmt.datasets.load_remote_dataset import datasets
print("| dataset | title | name | long_name |")
print("|---|---|---|---|")
for name, dataset in datasets.items():
    print("|", " | ".join([name, dataset.title, dataset.name, dataset.long_name]), "|")
| dataset | title | name | long_name |
|---|---|---|---|
| earth_age | seafloor age | seafloor_age | age of seafloor crust |
| earth_free_air_anomaly | free air anomaly | free_air_anomaly | IGPP Earth Free-Air Anomaly |
| earth_geoid | Earth geoid | earth_geoid | EGM2008 Earth Geoid |
| earth_magnetic_anomaly | Earth magnetic anomaly | magnetic_anomaly | Earth magnetic anomaly |
| earth_mask | Earth mask | earth_mask | Mask of land and water features |
| earth_relief | Earth relief | elevation | Earth elevation relative to the geoid |
| earth_vgg | Earth vertical gravity gradient | earth_vgg | IGPP Earth Vertical Gravity Gradient |
| earth_wdmam | WDMAM magnetic anomaly | wdmam | World Digital Magnetic Anomaly Map |
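
The mismatch between the dataset key and the name attribute can also be checked mechanically. The following is a minimal sketch, assuming the table rows are copied by hand into a plain dict; the `rows` variable and the `mismatched_names` helper are illustrative only and are not part of PyGMT.

```python
# Rows copied from the table above: key -> (title, name, long_name).
rows = {
    "earth_age": ("seafloor age", "seafloor_age", "age of seafloor crust"),
    "earth_free_air_anomaly": ("free air anomaly", "free_air_anomaly", "IGPP Earth Free-Air Anomaly"),
    "earth_geoid": ("Earth geoid", "earth_geoid", "EGM2008 Earth Geoid"),
    "earth_magnetic_anomaly": ("Earth magnetic anomaly", "magnetic_anomaly", "Earth magnetic anomaly"),
    "earth_mask": ("Earth mask", "earth_mask", "Mask of land and water features"),
    "earth_relief": ("Earth relief", "elevation", "Earth elevation relative to the geoid"),
    "earth_vgg": ("Earth vertical gravity gradient", "earth_vgg", "IGPP Earth Vertical Gravity Gradient"),
    "earth_wdmam": ("WDMAM magnetic anomaly", "wdmam", "World Digital Magnetic Anomaly Map"),
}


def mismatched_names(rows):
    """Return the dataset keys whose name attribute differs from the key (hypothetical helper)."""
    return [key for key, (_title, name, _long_name) in rows.items() if name != key]


for key in mismatched_names(rows):
    print(f"{key}: name is {rows[key][1]!r}")
```

With the values above, five of the eight entries use a name that differs from the key; only earth_geoid, earth_mask and earth_vgg agree.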
@seisman seisman added the maintenance Boring but important stuff for the core devs label Jan 11, 2024
@seisman seisman added this to the 0.11.0 milestone Jan 11, 2024
@yvonnefroehlich
Member

yvonnefroehlich commented Jan 20, 2024

The table below shows what I have tried so far (I just realized that dataset and name are now identical):

  • Maybe we should also consider the load_earth* functions? I just added them as the new first column to the table.

  • The magnetic anomaly is difficult, as there are the two separate datasets emag2 and wdmam. I moved these two rows to the bottom of the table.

  • We probably want to avoid renaming dataset, right? Still, I am considering renaming dataset for some entries; I wrote the new versions in italics.

  • Always start with "Earth"

    • Useful for clarity, as we will have datasets from the Moon and other planets
    • Redundant for the datasets emag2 and wdmam
  • Some general aspects:

    • Usage of shortcuts (short but unclear for "non-experts" vs. too long)
    • Usage of upper- or lower-case letters
    • Usage of underscores for name
| load_earth_* | dataset | title | name | long_name |
|---|---|---|---|---|
| age | earth_age | Earth seafloor crustal age | earth_age | EarthByte Earth seafloor crustal age |
| free_air_anomaly | earth_faa | Earth free-air anomaly | earth_faa | IGPP Earth free-air anomaly |
| geoid | earth_geoid | Earth geoid | earth_geoid | EGM2008 Earth geoid |
| mask | earth_mask | Earth mask | earth_mask | GSHHG Earth mask of land and water features |
| relief | earth_relief | Earth relief | earth_relief | IGPP and GEBCO Earth reliefs relative to the geoid |
| vertical_gravity_gradient | earth_vgg | Earth vertical gravity gradient | earth_vgg | IGPP Earth vertical gravity gradient |
| magnetic_anomaly | earth_emag or emag | Earth magnetic anomaly model | earth_emag or emag | Earth magnetic anomaly model at 2 arc-minutes resolution (EMAG2) |
| magnetic_anomaly | earth_wdmam or wdmam | World digital magnetic anomaly map | earth_wdmam or wdmam | World digital magnetic anomaly map (WDMAM) |

@seisman
Member Author

seisman commented Jan 22, 2024

  • We probably want to avoid renaming dataset, right? Still, I am considering renaming dataset for some entries; I wrote the new versions in italics.

I agree. The dataset key is used internally by maintainers, so it's OK to rename them. For us maintainers, earth_faa makes more sense than earth_free_air_anomaly.

  • Some general aspects:

    • Usage of shortcuts (short but unclear for "non-experts" vs. too long)
    • Usage of upper- or lower-case letters
    • Usage of underscores for name

I think we should remove the title attribute, which is only used in the error messages.

In summary, I think we should:

  • keep all the load_* API functions unchanged
  • dataset key should match the GMT remote dataset name, e.g., earth_age
  • dataset name should match the dataset key, e.g., earth_age
  • dataset title should be removed
  • dataset long_name should match the title on the dataset documentation page (https://www.generic-mapping-tools.org/remote-datasets/), but use lower-case letters where possible, e.g., EarthByte Earth seafloor crustal age rather than EarthByte Earth Seafloor Crustal Age.
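
The rules in this summary can be sketched in a few lines. This is only an illustration under stated assumptions: `RemoteDataset`, `make_registry`, `normalize_long_name` and the `KEEP_CASE` word list are hypothetical names invented here, not the structures used in pygmt.datasets.load_remote_dataset, and the two input titles are title-cased strings taken from this thread.

```python
from dataclasses import dataclass

# Words that keep their capitalization. This list is an assumption for the sketch;
# a real implementation would need to decide how to recognise proper nouns.
KEEP_CASE = {"Earth", "EarthByte", "IGPP", "GEBCO", "GSHHG", "EGM2008"}


def normalize_long_name(title: str) -> str:
    """Lower-case every word of a title except the known proper nouns."""
    return " ".join(w if w in KEEP_CASE else w.lower() for w in title.split())


@dataclass(frozen=True)
class RemoteDataset:
    """Hypothetical stand-in for a dataset definition; note there is no title attribute."""

    name: str
    long_name: str


def make_registry(titles):
    """Build a registry in which each dataset's name is identical to its key."""
    return {
        key: RemoteDataset(name=key, long_name=normalize_long_name(title))
        for key, title in titles.items()
    }


registry = make_registry(
    {
        "earth_age": "EarthByte Earth Seafloor Crustal Age",
        "earth_geoid": "EGM2008 Earth Geoid",
    }
)
for key, dataset in registry.items():
    print(key, "|", dataset.name, "|", dataset.long_name)
```

Deriving name from the key, rather than storing both, makes it impossible for the two to drift apart again.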

@seisman
Member Author

seisman commented Mar 15, 2024

Continue the discussions in #3048 (comment).

>>> from pygmt import which
>>> import xarray as xr
>>> grid = xr.load_dataarray(which("@earth_relief_01d_g"))
>>> grid
<xarray.DataArray 'z' (lat: 181, lon: 361)> Size: 261kB
array([[ 2865. ,  2865. ,  2865. , ...,  2865. ,  2865. ,  2865. ],
       [ 3088. ,  3087.5,  3087. , ...,  3088.5,  3088. ,  3088. ],
       [ 3100.5,  3100.5,  3101. , ...,  3101.5,  3101. ,  3100.5],
       ...,
       [-3745.5, -3729. , -3722.5, ..., -3734. , -3742. , -3745.5],
       [-2940. , -2945. , -2951. , ..., -2895.5, -2921.5, -2940. ],
       [-3861. , -3861. , -3861. , ..., -3861. , -3861. , -3861. ]],
      dtype=float32)
Coordinates:
  * lon      (lon) float64 3kB -180.0 -179.0 -178.0 -177.0 ... 178.0 179.0 180.0
  * lat      (lat) float64 1kB -90.0 -89.0 -88.0 -87.0 ... 87.0 88.0 89.0 90.0
Attributes:
    actual_range:  [-7174.  5350.]
    long_name:     elevation (m)

>>> grid = xr.load_dataarray(which("@earth_age_01d_g"))
>>> grid
<xarray.DataArray 'z' (lat: 181, lon: 361)> Size: 261kB
array([[      nan,       nan,       nan, ...,       nan,       nan,
              nan],
       [      nan,       nan,       nan, ...,       nan,       nan,
              nan],
       [      nan,       nan,       nan, ...,       nan,       nan,
              nan],
       ...,
       [65.84    , 65.740005, 65.630005, ..., 66.100006, 65.979996,
        65.84    ],
       [61.45    , 61.77    , 62.06    , ..., 60.75    , 61.11    ,
        61.45    ],
       [55.61    , 55.61    , 55.61    , ..., 55.61    , 55.61    ,
        55.61    ]], dtype=float32)
Coordinates:
  * lon      (lon) float64 3kB -180.0 -179.0 -178.0 -177.0 ... 178.0 179.0 180.0
  * lat      (lat) float64 1kB -90.0 -89.0 -88.0 -87.0 ... 87.0 88.0 89.0 90.0
Attributes:
    actual_range:  [  0.37 336.52]
    long_name:     ages (Myr)

I feel that GMT always uses z as the grid data name and stores the actual name in long_name. Maybe we should follow what GMT does? I.e.,

  • grid.name = "z"
  • grid.long_name = "elevation (m)"
  • grid.description = "GEBCO Earth Reliefs"
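
One consequence of following GMT here is that the unit stays embedded in long_name, as in "elevation (m)" and "ages (Myr)" above. If code ever needs the unit on its own, it could be split off. The sketch below is an assumption about how that might be done; `split_long_name` is a hypothetical helper, not an existing PyGMT or GMT function.

```python
import re
from typing import Optional, Tuple

# Matches "<name> (<unit>)" where the unit is a trailing parenthesised group.
_NAME_UNIT = re.compile(r"^(?P<name>.*?)\s*\((?P<unit>[^()]*)\)\s*$")


def split_long_name(long_name: str) -> Tuple[str, Optional[str]]:
    """Split a GMT-style long_name into (name, unit); unit is None if absent."""
    match = _NAME_UNIT.match(long_name)
    if match is None:
        return long_name.strip(), None
    return match.group("name"), match.group("unit")


for value in ("elevation (m)", "ages (Myr)", "Earth elevation relative to the geoid"):
    print(split_long_name(value))
```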

@seisman seisman removed this from the 0.12.0 milestone Mar 28, 2024
@seisman
Member Author

seisman commented Apr 19, 2024

It's time to continue the discussions here.

Here is the xarray.DataArray object returned by the current load_earth_relief function:

>>> from pygmt.datasets import load_earth_relief
>>> grid0 = load_earth_relief(resolution="01d", registration="gridline")
>>> grid0
<xarray.DataArray 'elevation' (lat: 181, lon: 361)> Size: 523kB
array([[ 2865. ,  2865. ,  2865. , ...,  2865. ,  2865. ,  2865. ],
       [ 3088. ,  3087.5,  3087. , ...,  3088.5,  3088. ,  3088. ],
       [ 3100.5,  3100.5,  3101. , ...,  3101.5,  3101. ,  3100.5],
       ...,
       [-3745.5, -3729. , -3722.5, ..., -3734. , -3742. , -3745.5],
       [-2940. , -2945. , -2951. , ..., -2895.5, -2921.5, -2940. ],
       [-3861. , -3861. , -3861. , ..., -3861. , -3861. , -3861. ]])
Coordinates:
  * lon      (lon) float64 3kB -180.0 -179.0 -178.0 -177.0 ... 178.0 179.0 180.0
  * lat      (lat) float64 1kB -90.0 -89.0 -88.0 -87.0 ... 87.0 88.0 89.0 90.0
Attributes:
    long_name:         Earth elevation relative to the geoid
    units:             meters
    vertical_datum:    EGM96
    horizontal_datum:  WGS84

Here is the xarray.DataArray object returned by calling xr.load_dataarray:

>>> import xarray as xr
>>> from pygmt import which
>>> grid1 = xr.load_dataarray(which("@earth_relief_01d_g", download="a"))
>>> grid1
<xarray.DataArray 'z' (lat: 181, lon: 361)> Size: 523kB
array([[ 2865. ,  2865. ,  2865. , ...,  2865. ,  2865. ,  2865. ],
       [ 3088. ,  3087.5,  3087. , ...,  3088.5,  3088. ,  3088. ],
       [ 3100.5,  3100.5,  3101. , ...,  3101.5,  3101. ,  3100.5],
       ...,
       [-3745.5, -3729. , -3722.5, ..., -3734. , -3742. , -3745.5],
       [-2940. , -2945. , -2951. , ..., -2895.5, -2921.5, -2940. ],
       [-3861. , -3861. , -3861. , ..., -3861. , -3861. , -3861. ]])
Coordinates:
  * lon      (lon) float64 3kB -180.0 -179.0 -178.0 -177.0 ... 178.0 179.0 180.0
  * lat      (lat) float64 1kB -90.0 -89.0 -88.0 -87.0 ... 87.0 88.0 89.0 90.0
Attributes:
    actual_range:  [-7174.  5350.]
    long_name:     elevation (m)

Here is the xarray.DataArray if we use virtual files (implemented in #3120):

>>> from pygmt.clib import Session
>>> with Session() as lib:
...     with lib.virtualfile_out(kind="grid") as voutgrd:
...         lib.call_module("read", f"@earth_relief_01d_g {voutgrd} -Tg")
...         grid2 = lib.virtualfile_to_raster(kind="grid", vfname=voutgrd)
...
>>> grid2
<xarray.DataArray 'z' (lat: 181, lon: 361)> Size: 523kB
array([[ 2865. ,  2865. ,  2865. , ...,  2865. ,  2865. ,  2865. ],
       [ 3088. ,  3087.5,  3087. , ...,  3088.5,  3088. ,  3088. ],
       [ 3100.5,  3100.5,  3101. , ...,  3101.5,  3101. ,  3100.5],
       ...,
       [-3745.5, -3729. , -3722.5, ..., -3734. , -3742. , -3745.5],
       [-2940. , -2945. , -2951. , ..., -2895.5, -2921.5, -2940. ],
       [-3861. , -3861. , -3861. , ..., -3861. , -3861. , -3861. ]])
Coordinates:
  * lat      (lat) float64 1kB -90.0 -89.0 -88.0 -87.0 ... 87.0 88.0 89.0 90.0
  * lon      (lon) float64 3kB -180.0 -179.0 -178.0 -177.0 ... 178.0 179.0 180.0
Attributes:
    Conventions:   CF-1.7
    title:         SRTM15 Earth Relief v2.5.5 at 01 arc degree
    history:
    description:   Reduced by Gaussian Cartesian filtering (314.5 km fullwidt...
    long_name:     elevation (m)
    actual_range:  [-7174.  5350.]

I prefer grid2 mainly because the grid name always defaults to z for wrappers that write grids using virtual files. So it's better to make the remote datasets consistent with other wrappers.
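
To make the differences between the three results easier to see, the names and attribute keys can be compared side by side. This is a minimal sketch using values transcribed by hand from the outputs above, so it runs without PyGMT or xarray; the `grids` dict and both helper functions are illustrative only.

```python
# Grid names and attribute keys transcribed from the three outputs above.
grids = {
    "grid0": {
        "name": "elevation",
        "attrs": {"long_name", "units", "vertical_datum", "horizontal_datum"},
    },
    "grid1": {"name": "z", "attrs": {"actual_range", "long_name"}},
    "grid2": {
        "name": "z",
        "attrs": {"Conventions", "title", "history", "description", "long_name", "actual_range"},
    },
}


def shared_attrs(grids):
    """Attribute keys present on every grid."""
    return set.intersection(*(g["attrs"] for g in grids.values()))


def only_in(grids, label):
    """Attribute keys present on one grid and on none of the others."""
    others = set().union(*(g["attrs"] for key, g in grids.items() if key != label))
    return grids[label]["attrs"] - others


print("shared:", sorted(shared_attrs(grids)))
for label in grids:
    print(label, grids[label]["name"], sorted(only_in(grids, label)))
```

Only long_name is common to all three. The units and datum attributes exist only on grid0, while the CF-style Conventions, title, history and description attributes exist only on grid2.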

@seisman seisman added this to the 0.12.0 milestone Apr 20, 2024