-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pygmt.grdcut: Refactor to store output in virtualfiles for grids #3115
base: datatypes/gmtimage
Are you sure you want to change the base?
Conversation
d8f1160
to
d0517e0
Compare
d0517e0
to
4ba67df
Compare
Special case `earth_day` to be loaded as GMT_IMAGE rather than GMT_GRID. Need to use `GMT_OUT|GMT_IS_REFERENCE` in virtualfile_out to avoid segfault, xref #3115.
pygmt/clib/session.py
Outdated
with GMTTempFile(suffix=".tif") as tmpfile: | ||
self.call_module("write", f"{vfname} {tmpfile.name} -Ti") | ||
return xr.load_dataarray(tmpfile.name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little confused. After #3128 is finished, we should be able to use self.read_virtualfile(vfname, kind=kind).contents.to_dataarray()
to convert the GMT_IMAGE structure into a dataarray, instead of replying on temporary files.
In other words, we should merge changes of #3128 into this branch, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original idea of this PR is to make grdcut
support both grids (via GMT_GRID) and images (via temporary files because GMT_IMAGE is not wrapped yet). I can continue this PR without waiting for #3128. Maybe it's easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pygmt/src/grdcut.py
Outdated
case "file": | ||
realpath = which(grid, download="a") | ||
if isinstance(realpath, list): | ||
realpath = realpath[0] | ||
outkind = "image" if realpath.endswith(".tif") else "grid" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a file is given, there are two different ways to determine if the file is a grid or an image:
- Based on file extensions
- Read the file (header only if possible) via
xarray
orrasterio
and check the data dims.
I guess option 2 is more robust and preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently use option 1 because it's easy for coding.
Co-authored-by: Wei Ji <[email protected]>
dtype=np.uint8, | ||
), | ||
coords={ | ||
"band": [1, 2, 3], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An off-topic question:
In [3]: import rioxarray
In [4]: rioxarray.open_rasterio("earth_day_01d_p.tif")
Out[4]:
<xarray.DataArray (band: 3, y: 180, x: 360)> Size: 194kB
[194400 values with dtype=uint8]
Coordinates:
* band (band) int64 24B 1 2 3
* x (x) float64 3kB -179.5 -178.5 -177.5 ... 177.5 178.5 179.5
* y (y) float64 1kB 89.5 88.5 87.5 86.5 ... -86.5 -87.5 -88.5 -89.5
spatial_ref int64 8B 0
Attributes:
AREA_OR_POINT: Area
STATISTICS_MAXIMUM: 255
STATISTICS_MEAN: 49.507654320988
STATISTICS_MINIMUM: 10
STATISTICS_STDDEV: 65.299010755209
STATISTICS_VALID_PERCENT: 100
_FillValue: 0
scale_factor: 1.0
add_offset: 0.0
The band
here is [1, 2, 3]
, but tile_map
, band
is set to [0, 1, 2]
:
pygmt/pygmt/datasets/tile_map.py
Line 160 in 41a0fe0
"band": np.array(object=[0, 1, 2], dtype=np.uint8), # Red, Green, Blue |
Are there any specific reasons for the differences or any conventions about band
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, the 123 numbering comes from GDAL/rasterio. E.g. if you do gdalinfo ~/.gmt/server/earth/earth_day/earth_day_01d_p.tif
, you'll get:
Driver: GTiff/GeoTIFF
Files: /home/user/.gmt/server/earth/earth_day/earth_day_01d_p.tif
Size is 360, 180
...
Band 1 Block=360x7 Type=Byte, ColorInterp=Undefined
Min=10.000 Max=255.000
Minimum=10.000, Maximum=255.000, Mean=49.508, StdDev=65.299
NoData Value=0
Metadata:
STATISTICS_MAXIMUM=255
STATISTICS_MEAN=49.507654320988
STATISTICS_MINIMUM=10
STATISTICS_STDDEV=65.299010755209
STATISTICS_VALID_PERCENT=100
Band 2 Block=360x7 Type=Byte, ColorInterp=Undefined
Min=10.000 Max=255.000
Minimum=10.000, Maximum=255.000, Mean=49.911, StdDev=62.129
NoData Value=0
Metadata:
STATISTICS_MAXIMUM=255
STATISTICS_MEAN=49.910941358025
STATISTICS_MINIMUM=10
STATISTICS_STDDEV=62.129078727248
STATISTICS_VALID_PERCENT=100
Band 3 Block=360x7 Type=Byte, ColorInterp=Undefined
Min=10.000 Max=255.000
Minimum=10.000, Maximum=255.000, Mean=65.605, StdDev=44.820
NoData Value=0
Metadata:
STATISTICS_MAXIMUM=255
STATISTICS_MEAN=65.605154320988
STATISTICS_MINIMUM=10
STATISTICS_STDDEV=44.819618148349
STATISTICS_VALID_PERCENT=100
The
band
here is[1, 2, 3]
, buttile_map
,band
is set to[0, 1, 2]
:
Yeah, I simply forgot about GDAL's start from 1 convention when implementing #2125 😅. I'm guessing there's a historical reason for this, and I'm not sure if this is the exact reason why GDAL counts from 1, but optical satellite bands (e.g. from Landsat) are usually numbered Band 1, Band 2, etc, so that could be why? Unsure if we should re-number things, or just leave it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should follow the GDAL convention and let band indexing start from 1.
) | ||
|
||
|
||
@pytest.mark.skipif(not _HAS_RIOXARRAY, reason="rioxarray is not installed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pytest.mark.skipif
marker can be removed after #3128 is implemented.
path = which("@earth_day_01d", download="a") | ||
raster = rioxarray.open_rasterio(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These lines can be replaced by load_blue_marble
after #2235.
Co-authored-by: Wei Ji <[email protected]>
This PR tries to get rid of temporary files from
pygmt.grdcut
(#2730).The
GMT_GRID
wrapper (PR #2398) can store output 2-D grid in virtual files, but doesn't work for output images. PR #3128 will wrapGMT_IMAGE
to store output images in virtual files, but it's still WIP.This PR makes necessary changes to make
pygmt.grdcut
work for both grids and images. For images, we still use temporary files but it should be refactored again after PR #3128.