Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrap GMT's standard data type GMT_DATASET for table inputs #2729

Merged
merged 124 commits into from
Mar 7, 2024
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
Show all changes
124 commits
Select commit Hold shift + click to select a range
713ad0a
Add data types GMT_DATASEGMENT, GMT_DATATABLE and GMT_DATASET
seisman Mar 15, 2023
9c580f9
Updates
seisman Mar 15, 2023
3ce3341
Fix formatting
seisman Mar 16, 2023
b655f88
Finally, a working version
seisman Oct 9, 2023
221efec
Fix the code structure
seisman Oct 9, 2023
6f4a651
Simplify the two options for grd2xyz
seisman Oct 9, 2023
9a1dc0c
fix
seisman Oct 9, 2023
1843b35
Add docstrings to pygmt/datatypes.py
seisman Oct 9, 2023
bd52024
Improve the docstrings
seisman Oct 9, 2023
19eb3aa
Get rid of temporary files from grdtrack
seisman Oct 9, 2023
8546048
Revert "Get rid of temporary files from grdtrack"
seisman Oct 10, 2023
76944a8
pygmt.grdtrack: Support consistent table-like outputs
seisman Oct 9, 2023
bac13bd
Update to virtualfile_to_data which can also be used for grids
seisman Oct 10, 2023
d59092b
Merge branch 'datatypes/gmtdataset' into tempfile/grdtrack
seisman Oct 10, 2023
ba6d94e
Add read_virtualfile_to_data to simplify the logic
seisman Oct 10, 2023
9cd1502
Merge branch 'datatypes/gmtdataset' into tempfile/grdtrack
seisman Oct 10, 2023
589a9dd
Fix a typo
seisman Oct 10, 2023
ff7459c
Merge branch 'main' into datatypes/gmtdataset
seisman Oct 10, 2023
7b485b0
Merge branch 'datatypes/gmtdataset' into tempfile/grdtrack
seisman Oct 10, 2023
47a79e1
Merge branch 'main' into datatypes/gmtdataset
seisman Oct 11, 2023
a06a5b4
Merge read_virtualfile and read_virtualfile_to_data into a single fun…
seisman Oct 11, 2023
7e59970
Merge branch 'datatypes/gmtdataset' into tempfile/grdtrack
seisman Oct 11, 2023
f5a9d44
Refactor the virtualfile_to_data function to support writing to a rea…
seisman Oct 11, 2023
276b6f7
Merge branch 'datatypes/gmtdataset' into tempfile/grdtrack
seisman Oct 11, 2023
5470495
Simplify the codes following the gmtdataset changes
seisman Oct 11, 2023
555bfe3
Move gmtdataset_to_vectors as a method of the GMT_DATASET class
seisman Oct 11, 2023
40510f5
Merge branch 'datatypes/gmtdataset' into tempfile/grdtrack
seisman Oct 11, 2023
6d26c10
Update after gmtdataset changes
seisman Oct 11, 2023
2205cbb
Add Pythonic objects
seisman Oct 11, 2023
4e67ebf
Add more notes about GMT.jl implementation
seisman Oct 11, 2023
4707b87
Refactor the codes using nested classes
seisman Oct 12, 2023
013f4bb
Add more examples
seisman Oct 12, 2023
19fd164
Deal with text column
seisman Oct 12, 2023
0772711
Merge branch 'main' into datatypes/gmtdataset
seisman Oct 13, 2023
0b8733d
Merge branch 'main' into datatypes/gmtdataset
seisman Oct 25, 2023
87f4cf0
Merge branch 'tempfile/grdtrack' into datatypes/gmtdataset
seisman Oct 25, 2023
3cff015
Fix linting issues
seisman Oct 25, 2023
9843eca
Standarize virtual file names
seisman Oct 25, 2023
9af418a
Improve the GMT_DATASET doctest
seisman Oct 25, 2023
7160a4f
Fix a typo
seisman Oct 25, 2023
6e07b95
Improve the doctest for GMT_DATASET.to_vectors()
seisman Oct 25, 2023
7da3654
Add more comments to the GMT_DATASET.to_vectors function
seisman Oct 25, 2023
afecfc0
Remove the GMT_DATASET.to_vectors_v2 method
seisman Oct 25, 2023
89efc18
Remove the GMT_DATASET.to_pydata method to focus on the GMT dataset s…
seisman Oct 25, 2023
c7a0982
Fix linting issues
seisman Oct 25, 2023
3c46aca
Remove two blank lines
seisman Oct 25, 2023
78c3959
Let pandas deal with the conversion to numpy
seisman Oct 25, 2023
10d0c1e
Merge branch 'main' into datatypes/gmtdataset
seisman Oct 26, 2023
c86208c
Disable some pylint warnings
seisman Oct 26, 2023
caa9d10
Let pandas deal with 1-D arrays directly
seisman Oct 26, 2023
d37efcf
Fix linting issues
seisman Oct 27, 2023
5c3f1b1
Add a return_table function
seisman Oct 27, 2023
cd11f98
Add the validate_output_type function to check the output_type parameter
seisman Oct 27, 2023
4a42c4d
Use pd.DataFrame.from_dict to construct the DataFrame object
seisman Oct 28, 2023
de31384
Refactor grdvolume.py
seisman Oct 28, 2023
333c5eb
Refactor select.py
seisman Oct 28, 2023
4e14d30
Merge branch 'validators/output_type' into datatypes/gmtdataset
seisman Oct 28, 2023
061355f
Use validate_output_type
seisman Oct 28, 2023
f36448f
Refactor blockm*
seisman Oct 28, 2023
1ca2b76
Refactor filter1d
seisman Oct 28, 2023
2760a21
Fix a bug in filter1d test
seisman Oct 28, 2023
e99cd1e
Refactor project.py
seisman Oct 28, 2023
4ff0a15
Refactor the table part of grdhisteq
seisman Oct 28, 2023
1bd46ad
Refactor the table part of triangulate
seisman Oct 28, 2023
48f94cf
Fix a bug in grdhisteq
seisman Oct 29, 2023
49625df
Fix grdhisteq
seisman Oct 29, 2023
a3c37fa
Merge branch 'main' into datatypes/gmtdataset
seisman Oct 30, 2023
473dc7b
Fix merge errors
seisman Oct 30, 2023
df48ada
grdtrack: Use validate_output_table_type
seisman Oct 30, 2023
fd95fa1
Merge branch 'main' into datatypes/gmtdataset
seisman Oct 31, 2023
232973a
Merge branch 'main' into datatypes/gmtdataset
seisman Nov 1, 2023
3b5e4f8
Formatting
seisman Nov 1, 2023
0f1deec
Merge branch 'main' into datatypes/gmtdataset
seisman Nov 8, 2023
c1a95b4
Merge branch 'main' into datatypes/gmtdataset
seisman Nov 10, 2023
9f701b7
Merge branch 'main' into datatypes/gmtdataset
seisman Nov 13, 2023
3d5147d
Consistently use column_names
seisman Nov 13, 2023
b9454ce
Always convert text data to string dtype
seisman Nov 14, 2023
8742467
Change the to_vectors method to to_dataframe which returns a pd.DataF…
seisman Nov 14, 2023
2a2b607
Merge branch 'main' into datatypes/gmtdataset
seisman Nov 16, 2023
2ca768f
Merge branch 'main' into datatypes/gmtdataset
seisman Nov 18, 2023
b7e0823
Requires pandas>=1.2.0
seisman Nov 18, 2023
e9de4bb
Fix formatting
seisman Nov 18, 2023
b09b13a
Merge branch 'main' into datatypes/gmtdataset
seisman Nov 21, 2023
2cba0c7
Remove pylint directives
seisman Nov 21, 2023
99fc619
Merge branch 'main' into datatypes/gmtdataset
seisman Nov 26, 2023
2baa7af
Merge branch 'main' into datatypes/gmtdataset
seisman Dec 4, 2023
1754e9a
Minor fix
seisman Dec 4, 2023
af96106
Merge branch 'main' into datatypes/gmtdataset
seisman Dec 19, 2023
d7d0164
Merge branch 'main' into datatypes/gmtdataset
seisman Dec 25, 2023
ccb9f47
Merge branch 'main' into datatypes/gmtdataset
seisman Jan 1, 2024
96643ad
Merge branch 'main' into datatypes/gmtdataset
seisman Jan 2, 2024
6ddab7f
Merge branch 'main' into datatypes/gmtdataset
seisman Jan 7, 2024
d70a7c4
Rewrap and add type hints
seisman Jan 7, 2024
a73927e
Temporarily enable benchmarks
seisman Jan 7, 2024
cad0bbe
Merge branch 'main' into datatypes/gmtdataset
seisman Feb 19, 2024
79c499d
Move dataset definition into pygmt/datatypes/dataset.py
seisman Feb 19, 2024
c4d47db
Remove unused imports
seisman Feb 19, 2024
1fdb9f9
Merge branch 'main' into datatypes/gmtdataset
seisman Feb 20, 2024
76a09f0
Fix open_virtual_file to open_virtualfile
seisman Feb 20, 2024
9a035ef
Improve docstrings
seisman Feb 20, 2024
828c9c1
Add doctests for virtualfile_to_data
seisman Feb 20, 2024
4a5eea3
isort
seisman Feb 20, 2024
a878635
clib.Session: Add the virtualfile_to_data method for creating virtual…
seisman Feb 20, 2024
c72701e
Improve docstrings
seisman Feb 20, 2024
761aff4
Improve the return_table function
seisman Feb 20, 2024
279eeb4
column_names default to None
seisman Feb 21, 2024
e933497
Update select
seisman Feb 21, 2024
d1f3150
Update blockm
seisman Feb 21, 2024
77739f8
Update grdtrack
seisman Feb 21, 2024
4815c59
Merge branch 'virtualfile_to_data' into datatypes/gmtdataset
seisman Feb 21, 2024
2b1d565
Remove the old codes
seisman Feb 21, 2024
9ddca59
Merge branch 'main' into datatypes/gmtdataset
seisman Feb 28, 2024
a21b2df
Revert non-related changes and focus on the _GMT_DATASET class
seisman Feb 28, 2024
180f3ec
Minor fixes and improvements
seisman Feb 28, 2024
84b3bf4
Merge branch 'main' into datatypes/gmtdataset
seisman Feb 29, 2024
6e71924
Merge branch 'main' into datatypes/gmtdataset
seisman Mar 1, 2024
08be754
Merge branch 'main' into datatypes/gmtdataset
seisman Mar 2, 2024
cc5c4ec
Move the object->str conversion of text column to the to_dataframe me…
seisman Mar 2, 2024
f654c93
Merge branch 'main' into datatypes/gmtdataset
seisman Mar 5, 2024
e5698a5
Merge branch 'main' into datatypes/gmtdataset
seisman Mar 6, 2024
661af0b
Apply suggestions from code review
seisman Mar 6, 2024
394a054
Update pygmt/datatypes/dataset.py
seisman Mar 6, 2024
63ed85f
Merge branch 'main' into datatypes/gmtdataset
seisman Mar 7, 2024
17d2b4c
Remove a blank line
seisman Mar 7, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
68 changes: 68 additions & 0 deletions pygmt/clib/session.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
vectors_to_arrays,
)
from pygmt.clib.loading import load_libgmt
from pygmt.datatypes import GMT_DATASET
from pygmt.exceptions import (
GMTCLibError,
GMTCLibNoSessionError,
Expand Down Expand Up @@ -1680,3 +1681,70 @@ def extract_region(self):
if status != 0:
raise GMTCLibError("Failed to extract region from current figure.")
return wesn

def read_virtualfile(self, vfname):
"""
Read data from a virtual file.

Parameters
----------
vfname : str
Name of the virtual file to read.

Returns
-------
Pointer to the data, which can be casted into GMT data types.
"""
c_read_virtualfile = self.get_libgmt_func(
"GMT_Read_VirtualFile",
argtypes=[
ctp.c_void_p,
ctp.c_char_p,
],
restype=ctp.c_void_p,
)
return c_read_virtualfile(self.session_pointer, vfname.encode())

@contextmanager
def virtualfile_to_gmtdataset(self):
"""
Create a virtual file for writing a GMT_DATASET object.

Yields
------
vfile : str
Name of the virtual file.
"""
family = "GMT_IS_DATASET"
geometry = "GMT_IS_PLP"
with self.open_virtual_file(family, geometry, "GMT_OUT", None) as vfile:
yield vfile

def gmtdataset_to_vectors(self, vfile):
"""
Read GMT_DATASET object from a virtual file and convert to vectors.

Parameters
----------
vfile : str
Name of the virtual file.

Returns
-------
vectors : list of 1-D arrays
List of vectors containing the data from the GMT_DATASET object.
"""
# Read the virtual file and cast it to a pointer to a GMT_DATASET
ds = ctp.cast(self.read_virtualfile(vfile), ctp.POINTER(GMT_DATASET)).contents

# Loop over the tables, segments, and columns to get the data as vectors
vectors = []
for itbl in range(ds.n_tables):
dtbl = ds.table[itbl].contents
for iseg in range(dtbl.n_segments):
dseg = dtbl.segment[iseg].contents
for icol in range(dseg.n_columns):
vectors.append(
np.ctypeslib.as_array(dseg.data[icol], shape=(dseg.n_rows,))
)
return vectors
73 changes: 73 additions & 0 deletions pygmt/datatypes.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
"""
GMT data types for ctypes.

See the GMT source code gmt_resources.h for the original C struct definitions.
"""
import ctypes as ctp


class GMT_DATASEGMENT(ctp.Structure):
"""
For holding segment lines in memory.
"""

_fields_ = [
("n_rows", ctp.c_uint64), # Number of points in this segment
("n_columns", ctp.c_uint64), # Number of fields in each record (>= 2)
("min", ctp.POINTER(ctp.c_double)), # Minimum coordinate for each column
("max", ctp.POINTER(ctp.c_double)), # Maximum coordinate for each column
# Data x, y, and possibly other columns
("data", ctp.POINTER(ctp.POINTER(ctp.c_double))),
("label", ctp.c_char_p), # Label string (if applicable)
("header", ctp.c_char_p), # Segment header (if applicable)
("text", ctp.POINTER(ctp.c_char_p)), # text beyond the data
("hidden", ctp.c_void_p), # Book-keeping variables "hidden" from the API
]


class GMT_DATATABLE(ctp.Structure):
"""
To hold an array of line segment structures and header information in one
container.
"""

_fields_ = [
("n_headers", ctp.c_uint), # Number of file header records (0 if no header)
("n_columns", ctp.c_uint64), # Number of columns (fields) in each record
("n_segments", ctp.c_uint64), # Number of segments in the array
("n_records", ctp.c_uint64), # Total number of data records across all segments
("min", ctp.POINTER(ctp.c_double)), # Minimum coordinate for each column
("max", ctp.POINTER(ctp.c_double)), # Maximum coordinate for each column
# Array with all file header records, if any
("header", ctp.POINTER(ctp.c_char_p)),
# Pointer to array of segments
("segment", ctp.POINTER(ctp.POINTER(GMT_DATASEGMENT))),
("hidden", ctp.c_void_p), # Book-keeping variables "hidden" from the API
]


class GMT_DATASET(ctp.Structure):
"""
Single container for an array of GMT tables (files).
"""

_fields_ = [
("n_tables", ctp.c_uint64), # The total number of tables (files) contained
("n_columns", ctp.c_uint64), # The number of data columns
("n_segments", ctp.c_uint64), # The total number of segments across all tables
# The total number of data records across all tables
("n_records", ctp.c_uint64),
("min", ctp.POINTER(ctp.c_double)), # Minimum coordinate for each column
("max", ctp.POINTER(ctp.c_double)), # Maximum coordinate for each column
# Pointer to array of tables
("table", ctp.POINTER(ctp.POINTER(GMT_DATATABLE))),
# The datatype (numerical, text, or mixed) of this dataset
("type", ctp.c_int32),
("geometry", ctp.c_int32), # The geometry of this dataset
# To store a referencing system string in PROJ.4 format
("ProjRefPROJ4", ctp.c_char_p),
# To store a referencing system string in WKT format
("ProjRefWKT", ctp.c_char_p),
("ProjRefEPSG", ctp.c_int), # To store a referencing system EPSG code
("hidden", ctp.c_void_p), # Book-keeping variables "hidden" from the API
]
61 changes: 34 additions & 27 deletions pygmt/src/grd2xyz.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,17 +3,12 @@
"""
import warnings

import numpy as np
import pandas as pd
import xarray as xr
from pygmt.clib import Session
from pygmt.exceptions import GMTInvalidInput
from pygmt.helpers import (
GMTTempFile,
build_arg_string,
fmt_docstring,
kwargs_to_strings,
use_alias,
)
from pygmt.helpers import build_arg_string, fmt_docstring, kwargs_to_strings, use_alias

__doctest_skip__ = ["grd2xyz"]

Expand Down Expand Up @@ -172,25 +167,37 @@
# Reverse the dims because it is rows, columns ordered.
dataframe_header = [grid.dims[1], grid.dims[0], grid.name]

with GMTTempFile() as tmpfile:
with Session() as lib:
file_context = lib.virtualfile_from_data(check_kind="raster", data=grid)
with file_context as infile:
if outfile is None:
outfile = tmpfile.name
lib.call_module(
module="grd2xyz",
args=build_arg_string(kwargs, infile=infile, outfile=outfile),
)

# Read temporary csv output to a pandas table
if outfile == tmpfile.name: # if user did not set outfile, return pd.DataFrame
result = pd.read_csv(
tmpfile.name, sep="\t", names=dataframe_header, comment=">"
with Session() as lib:
with lib.virtualfile_from_data(
check_kind="raster", data=grid
) as invfile, lib.virtualfile_to_gmtdataset() as outvfile:
seisman marked this conversation as resolved.
Show resolved Hide resolved
# Option 1
lib.call_module(
module="grd2xyz",
args=build_arg_string(kwargs, infile=invfile, outfile=outvfile),
)

if output_type == "file":
lib.call_module("write", f"{outvfile} {outfile} -Td")
return None
vectors = lib.gmtdataset_to_vectors(outvfile)
if output_type == "numpy":
return np.array(vectors).T
return pd.DataFrame(data=np.array(vectors).T, columns=dataframe_header)

"""

Check warning on line 188 in pygmt/src/grd2xyz.py

View check run for this annotation

Codecov / codecov/patch

pygmt/src/grd2xyz.py#L188

Added line #L188 was not covered by tests
# Option 2
if output_type == "file":
outvfile = outfile
lib.call_module(
module="grd2xyz",
args=build_arg_string(kwargs, infile=invfile, outfile=outvfile),
)
elif outfile != tmpfile.name: # return None if outfile set, output in outfile
result = None

if output_type == "numpy":
result = result.to_numpy()
return result
if output_type == "file":
return None
vectors = lib.gmtdataset_to_vectors(outvfile)
if output_type == "numpy":
return np.array(vectors).T
return pd.DataFrame(data=np.array(vectors).T, columns=dataframe_header)
"""
62 changes: 30 additions & 32 deletions pygmt/src/grdtrack.py
Original file line number Diff line number Diff line change
@@ -1,16 +1,11 @@
"""
grdtrack - Sample grids at specified (x,y) locations.
"""
import numpy as np
import pandas as pd
from pygmt.clib import Session
from pygmt.exceptions import GMTInvalidInput
from pygmt.helpers import (
GMTTempFile,
build_arg_string,
fmt_docstring,
kwargs_to_strings,
use_alias,
)
from pygmt.helpers import build_arg_string, fmt_docstring, kwargs_to_strings, use_alias

__doctest_skip__ = ["grdtrack"]

Expand Down Expand Up @@ -43,7 +38,9 @@
w="wrap",
)
@kwargs_to_strings(R="sequence", S="sequence", i="sequence_comma", o="sequence_comma")
def grdtrack(grid, points=None, newcolname=None, outfile=None, **kwargs):
def grdtrack(
grid, points=None, newcolname=None, output_type="pandas", outfile=None, **kwargs
):
r"""
Sample grids at specified (x,y) locations.

Expand Down Expand Up @@ -292,29 +289,30 @@
if hasattr(points, "columns") and newcolname is None:
raise GMTInvalidInput("Please pass in a str to 'newcolname'")

with GMTTempFile(suffix=".csv") as tmpfile:
with Session() as lib:
with lib.virtualfile_from_data(
check_kind="raster", data=grid
) as grdfile, lib.virtualfile_from_data(
check_kind="vector", data=points, required_data=False
) as csvfile:
kwargs["G"] = grdfile
if outfile is None: # Output to tmpfile if outfile is not set
outfile = tmpfile.name
lib.call_module(
module="grdtrack",
args=build_arg_string(kwargs, infile=csvfile, outfile=outfile),
)
with Session() as lib:
with lib.virtualfile_from_data(
check_kind="raster", data=grid
) as grdfile, lib.virtualfile_from_data(
check_kind="vector", data=points, required_data=False
) as csvfile, lib.virtualfile_to_gmtdataset() as outvfile:
kwargs["G"] = grdfile
lib.call_module(
module="grdtrack",
args=build_arg_string(kwargs, infile=csvfile, outfile=outvfile),
)
if outfile is not None:
# if output_type == "file":
seisman marked this conversation as resolved.
Show resolved Hide resolved
lib.call_module("write", f"{outvfile} {outfile} -Td")
return None

vectors = lib.gmtdataset_to_vectors(outvfile)

if output_type == "numpy":
return np.array(vectors).T

Check warning on line 311 in pygmt/src/grdtrack.py

View check run for this annotation

Codecov / codecov/patch

pygmt/src/grdtrack.py#L311

Added line #L311 was not covered by tests

# Read temporary csv output to a pandas table
if outfile == tmpfile.name: # if user did not set outfile, return pd.DataFrame
try:
column_names = points.columns.to_list() + [newcolname]
result = pd.read_csv(tmpfile.name, sep="\t", names=column_names)
except AttributeError: # 'str' object has no attribute 'columns'
result = pd.read_csv(tmpfile.name, sep="\t", header=None, comment=">")
elif outfile != tmpfile.name: # return None if outfile set, output in outfile
result = None
if isinstance(points, pd.DataFrame):
column_names = points.columns.to_list() + [newcolname]
else:
column_names = None

return result
return pd.DataFrame(np.array(vectors).T, columns=column_names)
Loading