Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrap GMT_Put_Strings to pass str columns into GMT C API directly #520

Merged
merged 36 commits into from
Aug 10, 2020

Conversation

weiji14
Copy link
Member

@weiji14 weiji14 commented Jul 13, 2020

Description of proposed changes

Used to insert 1D numpy arrays of string type from PyGMT directly into GMT via the C API. Saves us having to write to intermediate temporary CSV files using pandas. Needed for #483 and potentially #516.

I'll definitely need help with this one, this is really stretching my knowledge about C programming (read: no idea if what I'm doing makes sense).

See also:

Fixes #

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If adding new functionality, add an example to docstrings or tutorials.

Include a test called test_put_strings that doesn't actually work yet.
@weiji14 weiji14 added the feature Brand new feature label Jul 13, 2020
@weiji14 weiji14 added this to the 0.2.x milestone Jul 13, 2020
pygmt/clib/session.py Outdated Show resolved Hide resolved
pygmt/clib/session.py Outdated Show resolved Hide resolved
pygmt/clib/session.py Outdated Show resolved Hide resolved
Should be GMT_IS_OUTPUT instead of GMT_OUTPUT. Also update some docstrings that were missed in the #210 refactor PR.
@vercel vercel bot temporarily deployed to Preview July 14, 2020 03:57 Inactive
Comment on lines 17 to 39
dataset = lib.create_data(
family="GMT_IS_DATASET|GMT_VIA_VECTOR",
geometry="GMT_IS_POINT",
mode="GMT_CONTAINER_ONLY",
dim=[1, 5, 1, 0], # columns, rows, layers, dtype
)
s = np.array(["a", "b", "c", "d", "e"], dtype=np.str)
lib.put_strings(dataset, family="GMT_IS_VECTOR", strings=s)
# Turns out wesn doesn't matter for Datasets
wesn = [0] * 6
# Save the data to a file to see if it's being accessed correctly
with GMTTempFile() as tmp_file:
lib.write_data(
"GMT_IS_VECTOR",
"GMT_IS_POINT",
"GMT_WRITE_SET",
wesn,
tmp_file.name,
dataset,
)
# Load the data and check that it's correct
news = tmp_file.loadtxt(unpack=True, dtype=np.str)
npt.assert_allclose(news, s)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can attach the strings to a vector or a matrix. X can be a GMT_VECTOR or GMT_MATRIX, which is created by the create_data function. Thus, we have to pass family (either GMT_IS_VECTOR or GMT_IS_MATRIX) to let GMT know the type of X.

Right, so we can't pass in just strings then, but need to tie it to an existing Vector or Matrix? I've tried that (haven't committed it to Github) but am now getting segfaults, which seems promising in a way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think dataset is the Vector or Matrix.

Copy link
Member

@seisman seisman Jul 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean, what we want to do is to tie the strings to the dataset (the dataset is either a vector or a matrix).

Copy link
Member

@seisman seisman Jul 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this C code (https://github.com/GenericMappingTools/gmt/blob/master/src/testapi_vector_strings.c) may be useful for you. The C code puts x to the col 0, y to col 1, angle to col 2, and then hook in the strings to the vector V.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is helpful, I was reading the GMT.jl one at https://github.com/GenericMappingTools/GMT.jl/blob/master/src/pstext.jl and got really lost.

Copy link
Member Author

@weiji14 weiji14 Jul 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I think I got it! I was counting the strings array as a column using dim=[4, 5, 1, 0], # columns, rows, layers, dtype, but the string column should not be counted? I.e. an array with x, y, angle, strings would just use dim=3 instead of dim=4. Hold on, I'll make a commit.

Copy link
Member

@seisman seisman Jul 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyGMT passes vectors, matrix or grids to GMT. As I understand it, GMT.jl is using a different way passing data to GMT.

GMT.jl wraps the GMT data types like GMT_DATASET, GMT_GRID, GMT_PALETTE (https://docs.generic-mapping-tools.org/latest/api.html#gmt-resources). So what GMT.jl does is attaching vectors to the GMT_DATASET in Julia, and they pass the GMT_DATASET directly to GMT. This is why we find so many GMT API bugs but everything works for GMT.jl.

Copy link
Member Author

@weiji14 weiji14 Jul 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, maybe not. Using dim=2 runs without a segfault in 42af00e, but the 'abcdef ' text is not output to the textfile, only the x and y array data is written. Maybe we do need to use dim=3. Probably not, the C code you linked uses NCOLS=3 for x, y, angle, strings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm not sure if it did work, or if it's just that GMT_Put_Strings doesn't write the strings to the tmp_file. I'll need to either wrap GMT_Get_Strings or refactor text to use put_strings to be sure. Anyways, dinner first.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@weiji14 Here is a minimal example for experimenting with passing strings array to C. I just post it here for your reference.

The C code (test.c):

#include <stdio.h>

int myfunc (char **array)
{
	fprintf(stderr, "%s %s %s\n", array[0], array[1], array[2]);
	return 0;
}

To compile the C code into a shared library, run:

gcc -fpic -c test.c
gcc -shared -o test.so test.o

The following Python code can correctly pass strings array to the myfunc function in test.so library:

import ctypes

lib = ctypes.CDLL("test.so")
print(lib)
string = ["ABC", "DEF", "GHI"]
lib.myfunc.restype = ctypes.c_int

arr = (ctypes.c_char_p * len(string))()
arr[:] = [s.encode() for s in string]
lib.myfunc(arr)

The Python trick comes from https://stackoverflow.com/questions/3494598/. It works but I don't understand the details, and don't know if it's the simplest way.

Modified virtualfile_from_vectors to use put_strings on last column instead of put_vectors if it has a string data type. In theory this should work for `text`, but in reality, a segmentation fault happens.
Comment on lines 829 to 838
c_put_strings = self.get_libgmt_func(
"GMT_Put_Strings",
argtypes=[ctp.c_void_p, ctp.c_uint, ctp.c_void_p, ctp.c_void_p],
restype=ctp.c_int,
)

strings_pointer = strings.ctypes.data_as(ctp.c_char_p)
status = c_put_strings(
self.session_pointer, self[family], dataset, strings_pointer
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following Python code can correctly pass strings array to the myfunc function in test.so library:

# ...
arr = (ctypes.c_char_p * len(string))()
arr[:] = [s.encode() for s in string]
lib.myfunc(arr)

The Python trick comes from https://stackoverflow.com/questions/3494598/. It works but I don't understand the details, and don't know if it's the simplest way.

Yes I tried this following your datetime example, but I think (?) it's functionally equivalent to strings.ctypes.data_as(ctp.c_char_p)? Either way it runs, but the string data doesn't seem to get written to the virtualfile so the test fails. I've also tried strings.ctypes.data_as(ctp.POINTER(ctp.c_char_p)) but it doesn't seem to make a difference. 😕

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strings.ctypes.data_as(ctp.c_char_p) may not work.

What I did is adding a print statement inside the GMT_Put_Strings function, printing the string arrays passed to it. Using strings.ctypes.data_as(ctp.c_char_p) crashes immediately for me, but using my code can print the strings correctly, but then crash 🤦‍♂️.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using:

c_put_strings = self.get_libgmt_func(
    "GMT_Put_Strings",
    argtypes=[
        ctp.c_void_p,
        ctp.c_uint,
        ctp.c_void_p,
        ctp.POINTER(ctp.c_char_p),
    ],
    restype=ctp.c_int,
)
strings_pointer = (ctp.c_char_p * len(strings))()
strings_pointer[:] = np.char.encode(strings)

Does it crash for you?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it still crashes, but I can see that the strings are correctly passed to the GMT_Put_Strings function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about strings.ctypes.data_as(ctp.POINTER(ctp.c_char_p))? Are the strings correctly passed in?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I was just taking a look at it (thanks for tracking down the bug by the way!). Sounds like we'll need to pin to GMT > 6.1.1 for the next release?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm not 100% sure GenericMappingTools/gmt#3718 is related to the issue here. I just had the feeling that the string array may be freed by Python when GMT tries to read it.

Hopefully, GMT 6.1.1 can fix issue and #515.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, GenericMappingTools/gmt#3718 is already merged into GMT master, and will be backported to 6.1 branch soon.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! I'll test it out when I get the time (need to prepare some stuff for an online conference next week). Still need to wait on the grid problem at #515 but that's a separate issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just triggered the GMT master branch testing. Ideally, we should only see one failure from test_put_strings.

@PaulWessel
Copy link
Member

gmt select is meant to take an option for action though. It was never mean to just pass things through.

@weiji14 weiji14 marked this pull request as draft August 10, 2020 10:10
@weiji14 weiji14 marked this pull request as ready for review August 10, 2020 10:11
@weiji14 weiji14 marked this pull request as draft August 10, 2020 10:53
@weiji14 weiji14 marked this pull request as ready for review August 10, 2020 10:54
Copy link
Member

@seisman seisman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but let's first wait for the feedbacks from GenericMappingTools/gmt#3907.

@weiji14 weiji14 marked this pull request as draft August 10, 2020 20:25
@weiji14 weiji14 marked this pull request as ready for review August 10, 2020 20:26
@weiji14 weiji14 changed the title Wrap GMT_Put_Strings Wrap GMT_Put_Strings for passing string columns into GMT C API directly Aug 10, 2020
@weiji14 weiji14 changed the title Wrap GMT_Put_Strings for passing string columns into GMT C API directly Wrap GMT_Put_Strings to pass str columns into GMT C API directly Aug 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Brand new feature upstream Bug or missing feature of upstream core GMT
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants