Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor the data_kind and validate_data_input functions #3335

Merged
merged 11 commits into from
Jul 20, 2024

Conversation

seisman
Copy link
Member

@seisman seisman commented Jul 17, 2024

Description of proposed changes

This PR is a subset of PR #2744. PR #2744 takes more time than I initially expected (Still not satisfied with the new Session.virtualfile_in API function), so better to refactor the data_kind/validate_data_input function in a separate PR (this PR).

As mentioned in PR #2744:

The data_kind function does three things: (1) determines the kind of the input data, and (2) checks if the data/x/y/z combinations are valid; and (3) checks if the matrix-type data has 3 columns.
The data_kind function is called inside Session.virtualfile_in, but sometimes we need to know the input data kind when wrapping GMT modules, for example, in Figure.plot and Figure.plot3d. It means the data_kind function is called twice, which is not necessary.

This PR moves the validate_data_input function outside of the data_kind, and calls it in Session.virtualfile_in explicitly. In this way, each function only focuses on one single data. which makes the functions easier to maintain.

I also don't like the current function design of the data_kind/validate_data_input functions, but I prefer to refactor the function details in a separate PR, to make this PR smaller for reviewing.

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
  • If adding new functionality, add an example to docstrings or tutorials.
  • Use underscores (not hyphens) in names of Python files and directories.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash command is:

  • /format: automatically format and lint the code

pygmt/helpers/utils.py Outdated Show resolved Hide resolved
@@ -193,15 +188,6 @@ def data_kind(data=None, x=None, y=None, z=None, required_z=False, required_data
kind = "matrix"
else:
kind = "vectors"
_validate_data_input(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed from here and moved to Session.virtualfile_in.

@@ -111,7 +115,9 @@ def _validate_data_input(
raise GMTInvalidInput("data must provide x, y, and z columns.")


def data_kind(data=None, x=None, y=None, z=None, required_z=False, required_data=True):
def data_kind(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now this function only checks data, and x/y/z is no longer used.

@@ -32,25 +25,6 @@ def test_load_static_earth_relief():
assert isinstance(data, xr.DataArray)


@pytest.mark.parametrize(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is removed because:

  1. data_kind no longer checks if the input is valid
  2. All these tests are already covered by the validate_data_input doctests.

@seisman seisman marked this pull request as ready for review July 18, 2024 02:57
@seisman seisman added maintenance Boring but important stuff for the core devs skip-changelog Skip adding Pull Request to changelog needs review This PR has higher priority and needs review. run/benchmark Trigger the benchmark workflow in PRs labels Jul 18, 2024
@seisman seisman added this to the 0.13.0 milestone Jul 18, 2024
@@ -124,64 +130,53 @@ def data_kind(data=None, x=None, y=None, z=None, required_z=False, required_data
* 1-D arrays x and y (and z, optionally)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete this line about x/y/z?

Suggested change
* 1-D arrays x and y (and z, optionally)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the docstring at b575591. Currently, any unrecognized data type is taken as a matrix type. It's not ideal, so I plan to refactor the data_kind function later, and will also polish the docstring when refactoring.

'arg'
>>> data_kind(data=xr.DataArray(np.random.rand(4, 3)))
'grid'
>>> data_kind(data=xr.DataArray(np.random.rand(3, 4, 5)))
'image'
"""
# determine the data kind
kind: Literal["arg", "file", "geojson", "grid", "image", "matrix", "vectors"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this line doing here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To suppress the mypy error:

pygmt/helpers/utils.py:191: error: Incompatible return value type (got "str", expected "Literal['arg', 'file', 'geojson', 'grid', 'image', 'matrix', 'vectors']")  [return-value]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting. Probably should turn this into an enum, but leave that for another day.

Comment on lines 122 to 123
Check what kind of data is provided to a module.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since x/y/z is no longer used, at L124-129, change the docstring to read:

    Possible types (provided to `data`):
    * a file name
    * a pathlib.PurePath object
    * an xarray.DataArray object
    * a 2-D matrix

@@ -124,64 +130,53 @@ def data_kind(data=None, x=None, y=None, z=None, required_z=False, required_data
* 1-D arrays x and y (and z, optionally)
* an optional argument (None, bool, int or float) provided as 'data'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* an optional argument (None, bool, int or float) provided as 'data'
* an optional argument (None, bool, int or float)

@weiji14
Copy link
Member

weiji14 commented Jul 20, 2024

Do you know why the test_velo_numpy_array_numeric_only test is failing on Python 3.10/NumPy 1.24?

@pytest.mark.mpl_image_compare
def test_velo_numpy_array_numeric_only(dataframe):
"""
Plot velocity arrow and confidence ellipse from a numpy.ndarray.
"""
fig = Figure()
fig.velo(
data=dataframe.iloc[:, :-1].to_numpy(),
spec="e0.2/0.39/18",
vector="0.3c+p1p+e+gred",
frame="1g1",
)
return fig

@seisman
Copy link
Member Author

seisman commented Jul 20, 2024

Do you know why the test_velo_numpy_array_numeric_only test is failing on Python 3.10/NumPy 1.24?

@pytest.mark.mpl_image_compare
def test_velo_numpy_array_numeric_only(dataframe):
"""
Plot velocity arrow and confidence ellipse from a numpy.ndarray.
"""
fig = Figure()
fig.velo(
data=dataframe.iloc[:, :-1].to_numpy(),
spec="e0.2/0.39/18",
vector="0.3c+p1p+e+gred",
frame="1g1",
)
return fig

It passes in the latest rerun.

@weiji14 weiji14 removed the run/benchmark Trigger the benchmark workflow in PRs label Jul 20, 2024
@seisman seisman merged commit 9ec92be into main Jul 20, 2024
20 checks passed
@seisman seisman deleted the refactor/data-kind branch July 20, 2024 05:40
@seisman seisman removed the needs review This PR has higher priority and needs review. label Jul 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Boring but important stuff for the core devs skip-changelog Skip adding Pull Request to changelog
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants