xarray.Dataset should NOT be recognized as a valid tabular data type #3146
Comments
For context, allowing For example, with ICESat-2 ATL06 altimetry data, the data is stored as an HDF5 file, with 1D data variables like
import xarray as xr
# https://n5eil01u.ecs.nsidc.org/ATLAS/ATL06.006/2018.10.14/ATL06_20181014001049_02350102_006_02.h5
ds: xr.Dataset = xr.open_dataset(
    "ATL06_20181014001049_02350102_006_02.h5", group="gt2l/land_ice_segments"
)
print(ds)
produces an
I can make an along-track plot with PyGMT directly by passing in the
import pygmt
ds_height = ds[["segment_id", "h_li"]]  # along-track index and height
fig = pygmt.Figure()
fig.plot(
    data=ds_height, frame=["xaf+lsegment_id", "yaf+lHeight (m)"], style="c0.05c"
)
fig.show()
That said, I can see your point that we currently don't check whether the
This sounds similar to the issue mentioned at #3086 where
In your example, it's unclear to me (from a user's perspective) what data are passed to GMT. Users may expect to pass three columns (
PyGMT is only able to read from an We could theoretically apply
I didn't mean we should pass the coordinates to GMT. I feel the behavior of passing an
Maybe we can add a section for
OK. We also need to update the code to check whether the dataset has only 1-D variables.
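The check mentioned in the last comment could look roughly like the sketch below. The helper name `has_only_1d_variables` is hypothetical and is not part of PyGMT; the two small datasets are made up for illustration.

```python
import numpy as np
import xarray as xr


def has_only_1d_variables(ds: xr.Dataset) -> bool:
    """Return True if the dataset has data variables and every one is 1-D.

    Hypothetical helper for illustration only; not a PyGMT function.
    """
    return len(ds.data_vars) > 0 and all(
        var.ndim == 1 for var in ds.data_vars.values()
    )


# A table-like dataset: two 1-D variables sharing one dimension.
track = xr.Dataset(
    {
        "segment_id": ("index", np.arange(5)),
        "h_li": ("index", np.linspace(100.0, 104.0, 5)),
    }
)

# A gridded dataset: a single 3-D variable, not a set of columns.
cube = xr.Dataset({"temperature": (("x", "y", "time"), np.zeros((2, 3, 4)))})

print(has_only_1d_variables(track))
print(has_only_1d_variables(cube))
```

A stricter version might also require that all variables share the same dimension, so that the columns are guaranteed to have equal length.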
Currently, xarray.Dataset is recognized as a valid tabular data type in some places. For example:
pygmt/pygmt/helpers/utils.py, lines 108 to 109 in dbbc168
pygmt/pygmt/clib/session.py, lines 1531 to 1547 in dbbc168
But I think it should NOT be like that. Here are the reasons.
1. xarray.Dataset is more like a collection of xarray.DataArray objects than a pandas.DataFrame.
As the official docs say:
xarray.Dataset can represent tabular data, but it's more commonly used as a data structure to hold multiple xarray.DataArray objects.
2. It's unclear what/how data are passed.
Here is an example from the official documentation:
Each data variable is an xarray.DataArray object:
Then, in virtualfile_in, a list of multi-dimensional (3-D in this example) xarray.DataArray objects is passed to GMT modules, which expect a list of 1-D arrays instead. It works without errors because, in Session.put_vector, we pass the pointer of the 2-D array to the GMT C API function, but it likely won't work if the data is not C-contiguous (e.g., a slice of a dataset). So, the actual behavior is not well defined.
So, I think xarray.Dataset should not be recognized as a valid tabular data type, which not only makes more sense but also simplifies our code and tests.
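To make reason 2 concrete, here is a minimal sketch. The dataset below is made up (it is not the one from the xarray documentation), and the snippet does not reproduce what Session.put_vector does internally; it only shows that the data variables are 3-D and that slicing a C-contiguous NumPy array can yield a non-contiguous view.

```python
import numpy as np
import xarray as xr

# Made-up dataset with two 3-D data variables.
ds = xr.Dataset(
    {
        "temperature": (("x", "y", "time"), np.zeros((2, 3, 4))),
        "precipitation": (("x", "y", "time"), np.ones((2, 3, 4))),
    }
)

# Each data variable is a multi-dimensional DataArray, not a 1-D column.
ndims = {name: var.ndim for name, var in ds.data_vars.items()}
print(ndims)

# Plain NumPy illustration of the contiguity concern: taking a 1-D slice
# along the first axis of a C-contiguous 3-D array gives a strided view.
data = np.arange(24.0).reshape(2, 3, 4)
sliced = data[:, 0, 0]
print(sliced.flags["C_CONTIGUOUS"])

# Copying into a contiguous buffer makes a raw pointer safe to hand to C code.
fixed = np.ascontiguousarray(sliced)
print(fixed.flags["C_CONTIGUOUS"])
```

Any code that hands a raw pointer to a C library has to either reject such strided views or copy them first, which is why the current behavior with xarray.Dataset input is hard to reason about.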