Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the Session.virtualfile_from_stringio method to allow StringIO input for certain functions/methods #3326

Merged
merged 38 commits into from
Sep 18, 2024

Conversation

seisman
Copy link
Member

@seisman seisman commented Jul 11, 2024

Description of proposed changes

This PR adds support for StringIO inputs, which was initially requested in #571 (4 yeas ago!).

GMT doesn't natively support reading StringIO objects. To pass a StringIO object into GMT, we need to create a GMT_DATASET container and store the contents of the StringIO object in the GMT_DATASET container.

Ideally, any PyGMT methods/functions should accept StringIO input, but it's technically difficult to implement. The main reason is that, when creating a GMT_DATASET container, we need to specify the number of tables, segments, columns, and rows, which means we must try very hard to parse the contents of the StringIO object. The number of columns actually is the most difficult to determine. For example, the content below should be 2 numeric columns and one trailing text column, but we may incorrectly parse it to have 0 numeric columns:

30.0W 20.0S TEXT1 TEXT4
35.0W 20.5S TEXT2

and the following content that we want to pass to Figure.text's paragraph mode (#1078), should have zero numeric columns and one trailing text column, but we may incorrectly think it has two numeric columns:

gmt text -R0/3/0/5 -JX3i -h1 -M -N -F+f12,Times-Roman+jLT -pdf figure << EOF
This is an unmarked header record not starting with #
> 0 -0.5 13p 3i j
20240101 010101 Event 1
20240101 010101 Event 2
EOF

Thus, to keep things as simple as possible, I think we should only allow StringIO input for a few modules (e.g., makes much sense for Figure.legend, but makes little sense to Figure.plot). In this case, we just need to store the StringIO contents into the trailing text of the GMT_DATASET container.

This PR implements the virtualfile_from_string method to allow StringIO objects. There are two related PRs:

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
  • If adding new functionality, add an example to docstrings or tutorials.
  • Use underscores (not hyphens) in names of Python files and directories.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash command is:

  • /format: automatically format and lint the code

@weiji14 weiji14 added the feature Brand new feature label Jul 11, 2024
@seisman
Copy link
Member Author

seisman commented Jul 12, 2024

Ideally, any PyGMT methods/functions should accept StringIO input, but it's technically difficult to implement. The main reason is that, when creating a GMT_DATASET container, we need to specify the number of tables, segments, columns, and rows, which means we must try very hard to parse the contents of the StringIO object. The number of columns actually is the most difficult to determine.

Actually, when calling a GMT module, we usually know the expected number of numeric columns (for example, 0 for legend, 2 or more for plot), so we could add an extra parameter n_columns to the virtualfile_from_stringio function after refactoring the virtualfile_in method in #2744. But I feel that passing a StringIO object to modules like plot makes little sense and has very low priority. Then the question is, how can we add StringIO support to specific functions only (e.g., Figure.legend)?

@seisman seisman added the needs review This PR has higher priority and needs review. label Jul 12, 2024
@seisman seisman removed the needs review This PR has higher priority and needs review. label Jul 23, 2024
@seisman seisman changed the base branch from main to refactor/legend September 12, 2024 00:41
Base automatically changed from refactor/legend to main September 12, 2024 16:00
Comment on lines 1660 to 1664
if line.startswith(">"): # Segment header
if header is not None: # Only one segment is allowed now.
raise GMTInvalidInput("Only one segment is allowed.")
header = line
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a test to check that multi-segment inputs fail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's likely we have to allow multi-segments in StringIO.

Here is a CLI version for typesetting a paragraph of text. It seems the first line must be a line not starting with # and is ignored. Not sure why it was designed like this. I'll open another POC PR for Figure.text and see how it works.

gmt text -R0/3/0/5 -JX3i -h1 -M -N -F+f12,Times-Roman+jLT -pdf figure << EOF
This is an unmarked header record not starting with #
> 0 -0.5 13p 3i j
@%5%Figure 1.@%% This illustration shows nothing useful, but it still needs
a figure caption. Highlighted in @;255/0/0;red@;; you can see the locations
of cities where it is @\_impossible@\_ to get any good Thai food; these are to be avoided.
EOF

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction: I didn't see the -h1 option in the CLI version, which skips the first line. Without -h1, the CLI version still works:

gmt text -R0/3/0/5 -JX3i -M -N -F+f12,Times-Roman+jLT -pdf figure << EOF
> 0 -0.5 13p 3i j
@%5%Figure 1.@%% This illustration shows nothing useful, but it still needs
a figure caption. Highlighted in @;255/0/0;red@;; you can see the locations
of cities where it is @\_impossible@\_ to get any good Thai food; these are to be avoided.
EOF

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a POC example to show how to support paragraph mode via StringIO:

In [1]: import io

In [2]: stringio = io.StringIO(
   ...:     "> 0 -0.5 13p 3i j\n"
   ...:     "@%5%Figure 1.@%% This illustration shows nothing useful, but it still needs\n"
   ...:     "a figure caption. Highlighted in @;255/0/0;red@;; you can see the locations\n"
   ...:     "of cities where it is @_impossible@_ to get any good Thai food; these are to be avoided.\n"
   ...: )

In [3]: import pygmt

In [4]: fig = pygmt.Figure()

In [5]: from pygmt.clib import Session

In [6]: with Session() as lib:
   ...:     with lib.virtualfile_in(data=stringio) as vintbl:
   ...:         lib.call_module(module="text", args=f"{vintbl} -R0/3/0/5 -JX3i -M -N -F+f12,Times-Roman+jLT")
   ...:

In [7]: fig.show()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringIO objects containing multi-segments are supported in the latest version.

finally:
# Must set the text to None to avoid double freeing the memory
seg.text = None

def virtualfile_in( # noqa: PLR0912
self,
check_kind=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the question is, how can we add StringIO support to specific functions only (e.g., Figure.legend)?

Maybe legend should have a special check_kind? E.g. if check_kind("legend"): valid_kinds += ("stringio", ...)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #3438 for the POC PR for Figure.legend. Since we already checked the data kind before entering the session, we can have check_kind=False in Figure.legend.

@seisman seisman changed the title Add the Session.virtualfile_from_stringio function to allow StringIO input Add the Session.virtualfile_from_stringio method to allow StringIO input Sep 13, 2024
@seisman seisman changed the title Add the Session.virtualfile_from_stringio method to allow StringIO input WIP: Add the Session.virtualfile_from_stringio method to allow StringIO input Sep 13, 2024
@seisman seisman marked this pull request as ready for review September 13, 2024 10:52
@seisman seisman changed the title WIP: Add the Session.virtualfile_from_stringio method to allow StringIO input Add the Session.virtualfile_from_stringio method to allow StringIO input Sep 13, 2024
@seisman seisman added the needs review This PR has higher priority and needs review. label Sep 13, 2024
@seisman seisman added this to the 0.14.0 milestone Sep 13, 2024
@seisman seisman changed the title Add the Session.virtualfile_from_stringio method to allow StringIO input Add the Session.virtualfile_from_stringio method to allow StringIO input for certain functions/methods Sep 13, 2024
pygmt/clib/session.py Outdated Show resolved Hide resolved
@seisman seisman requested a review from a team September 18, 2024 02:12
Copy link
Member

@weiji14 weiji14 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @seisman for adding multi-segment support! I just have one concern/thought about non-ASCII character support, but that could be handled later (if needed).

@@ -60,6 +61,7 @@
"GMT_IS_PLP", # items could be any one of POINT, LINE, or POLY
"GMT_IS_SURFACE", # items are 2-D grid
"GMT_IS_VOLUME", # items are 3-D grid
"GMT_IS_TEXT", # Text strings which triggers ASCII text reading
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is only ASCII supported for now? What about other encodings?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GMT only accepts ASCII text strings. Any non-ASCII characters should be written in the form of octal codes.

@seisman seisman merged commit 890626d into main Sep 18, 2024
20 checks passed
@seisman seisman deleted the api/virtualfile-from-stringio branch September 18, 2024 23:42
@seisman seisman removed the needs review This PR has higher priority and needs review. label Sep 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Brand new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants