-
Notifications
You must be signed in to change notification settings - Fork 220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Wrap GMT_Put_Strings to pass str columns into GMT C API directly #520
Conversation
Include a test called test_put_strings that doesn't actually work yet.
Should be GMT_IS_OUTPUT instead of GMT_OUTPUT. Also update some docstrings that were missed in the #210 refactor PR.
pygmt/tests/test_clib_put.py
Outdated
dataset = lib.create_data( | ||
family="GMT_IS_DATASET|GMT_VIA_VECTOR", | ||
geometry="GMT_IS_POINT", | ||
mode="GMT_CONTAINER_ONLY", | ||
dim=[1, 5, 1, 0], # columns, rows, layers, dtype | ||
) | ||
s = np.array(["a", "b", "c", "d", "e"], dtype=np.str) | ||
lib.put_strings(dataset, family="GMT_IS_VECTOR", strings=s) | ||
# Turns out wesn doesn't matter for Datasets | ||
wesn = [0] * 6 | ||
# Save the data to a file to see if it's being accessed correctly | ||
with GMTTempFile() as tmp_file: | ||
lib.write_data( | ||
"GMT_IS_VECTOR", | ||
"GMT_IS_POINT", | ||
"GMT_WRITE_SET", | ||
wesn, | ||
tmp_file.name, | ||
dataset, | ||
) | ||
# Load the data and check that it's correct | ||
news = tmp_file.loadtxt(unpack=True, dtype=np.str) | ||
npt.assert_allclose(news, s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can attach the strings to a vector or a matrix. X can be a GMT_VECTOR or GMT_MATRIX, which is created by the
create_data
function. Thus, we have to pass family (either GMT_IS_VECTOR or GMT_IS_MATRIX
) to let GMT know the type of X.
Right, so we can't pass in just strings then, but need to tie it to an existing Vector or Matrix? I've tried that (haven't committed it to Github) but am now getting segfaults, which seems promising in a way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think dataset is the Vector or Matrix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, what we want to do is to tie the strings to the dataset (the dataset is either a vector or a matrix).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading this C code (https://github.com/GenericMappingTools/gmt/blob/master/src/testapi_vector_strings.c) may be useful for you. The C code puts x to the col 0, y to col 1, angle to col 2, and then hook in the strings to the vector V.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is helpful, I was reading the GMT.jl one at https://github.com/GenericMappingTools/GMT.jl/blob/master/src/pstext.jl and got really lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I think I got it! I was counting the strings
array as a column using dim=[4, 5, 1, 0], # columns, rows, layers, dtype
, but the string
column should not be counted? I.e. an array with x, y, angle, strings
would just use dim=3
instead of dim=4
. Hold on, I'll make a commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyGMT passes vectors, matrix or grids to GMT. As I understand it, GMT.jl is using a different way passing data to GMT.
GMT.jl wraps the GMT data types like GMT_DATASET, GMT_GRID, GMT_PALETTE (https://docs.generic-mapping-tools.org/latest/api.html#gmt-resources). So what GMT.jl does is attaching vectors to the GMT_DATASET in Julia, and they pass the GMT_DATASET directly to GMT. This is why we find so many GMT API bugs but everything works for GMT.jl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, maybe not. Using dim=2
runs without a segfault in 42af00e, but the 'abcdef ' text
is not output to the textfile, only the x
and y
array data is written. Maybe we do need to use Probably not, the C code you linked uses NCOLS=3 for dim=3
.x, y, angle, strings
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I'm not sure if it did work, or if it's just that GMT_Put_Strings
doesn't write the strings to the tmp_file. I'll need to either wrap GMT_Get_Strings
or refactor text
to use put_strings
to be sure. Anyways, dinner first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@weiji14 Here is a minimal example for experimenting with passing strings array to C. I just post it here for your reference.
The C code (test.c):
#include <stdio.h>
int myfunc (char **array)
{
fprintf(stderr, "%s %s %s\n", array[0], array[1], array[2]);
return 0;
}
To compile the C code into a shared library, run:
gcc -fpic -c test.c
gcc -shared -o test.so test.o
The following Python code can correctly pass strings array to the myfunc function in test.so library:
import ctypes
lib = ctypes.CDLL("test.so")
print(lib)
string = ["ABC", "DEF", "GHI"]
lib.myfunc.restype = ctypes.c_int
arr = (ctypes.c_char_p * len(string))()
arr[:] = [s.encode() for s in string]
lib.myfunc(arr)
The Python trick comes from https://stackoverflow.com/questions/3494598/. It works but I don't understand the details, and don't know if it's the simplest way.
Modified virtualfile_from_vectors to use put_strings on last column instead of put_vectors if it has a string data type. In theory this should work for `text`, but in reality, a segmentation fault happens.
pygmt/clib/session.py
Outdated
c_put_strings = self.get_libgmt_func( | ||
"GMT_Put_Strings", | ||
argtypes=[ctp.c_void_p, ctp.c_uint, ctp.c_void_p, ctp.c_void_p], | ||
restype=ctp.c_int, | ||
) | ||
|
||
strings_pointer = strings.ctypes.data_as(ctp.c_char_p) | ||
status = c_put_strings( | ||
self.session_pointer, self[family], dataset, strings_pointer | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The following Python code can correctly pass strings array to the myfunc function in test.so library:
# ... arr = (ctypes.c_char_p * len(string))() arr[:] = [s.encode() for s in string] lib.myfunc(arr)The Python trick comes from https://stackoverflow.com/questions/3494598/. It works but I don't understand the details, and don't know if it's the simplest way.
Yes I tried this following your datetime example, but I think (?) it's functionally equivalent to strings.ctypes.data_as(ctp.c_char_p)
? Either way it runs, but the string data doesn't seem to get written to the virtualfile so the test fails. I've also tried strings.ctypes.data_as(ctp.POINTER(ctp.c_char_p))
but it doesn't seem to make a difference. 😕
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strings.ctypes.data_as(ctp.c_char_p)
may not work.
What I did is adding a print statement inside the GMT_Put_Strings function, printing the string arrays passed to it. Using strings.ctypes.data_as(ctp.c_char_p)
crashes immediately for me, but using my code can print the strings correctly, but then crash 🤦♂️.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using:
c_put_strings = self.get_libgmt_func(
"GMT_Put_Strings",
argtypes=[
ctp.c_void_p,
ctp.c_uint,
ctp.c_void_p,
ctp.POINTER(ctp.c_char_p),
],
restype=ctp.c_int,
)
strings_pointer = (ctp.c_char_p * len(strings))()
strings_pointer[:] = np.char.encode(strings)
Does it crash for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it still crashes, but I can see that the strings are correctly passed to the GMT_Put_Strings function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about strings.ctypes.data_as(ctp.POINTER(ctp.c_char_p))
? Are the strings correctly passed in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I was just taking a look at it (thanks for tracking down the bug by the way!). Sounds like we'll need to pin to GMT > 6.1.1 for the next release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I'm not 100% sure GenericMappingTools/gmt#3718 is related to the issue here. I just had the feeling that the string array may be freed by Python when GMT tries to read it.
Hopefully, GMT 6.1.1 can fix issue and #515.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, GenericMappingTools/gmt#3718 is already merged into GMT master, and will be backported to 6.1 branch soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! I'll test it out when I get the time (need to prepare some stuff for an online conference next week). Still need to wait on the grid problem at #515 but that's a separate issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just triggered the GMT master branch testing. Ideally, we should only see one failure from test_put_strings
.
gmt select is meant to take an option for action though. It was never mean to just pass things through. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, but let's first wait for the feedbacks from GenericMappingTools/gmt#3907.
Description of proposed changes
Used to insert 1D numpy arrays of string type from PyGMT directly into GMT via the C API. Saves us having to write to intermediate temporary CSV files using
pandas
. Needed for #483 and potentially #516.I'll definitely need help with this one, this is really stretching my knowledge about C programming (read: no idea if what I'm doing makes sense).
See also:
Fixes #
Reminders
make format
andmake check
to make sure the code follows the style guide.doc/api/index.rst
.