drop the length from `numpy`'s fixed-width string dtypes #9586

keewis · 2024-10-06T13:29:00Z

By converting fixed-width string / bytes dtypes to their length-less base dtypes (np.str_ and np.bytes_) before passing them to np.result_type, we can avoid accidentally truncating the replacement strings in xr.where.
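A minimal numpy-only sketch of the idea follows. It assumes that xr.where first promotes its operands with np.result_type and then casts them to that dtype before selecting. `drop_length` and `cast` are hypothetical helpers written for illustration, not xarray's actual code.

```python
import numpy as np


def drop_length(dtype):
    # Hypothetical helper: map a fixed-width str/bytes dtype to its
    # length-less scalar type (np.str_ / np.bytes_); other dtypes pass through.
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.str_) or np.issubdtype(dtype, np.bytes_):
        return dtype.type
    return dtype


def cast(values, dtype):
    # Hypothetical helper: casting to an *unsized* string type lets numpy
    # choose the width, while casting to a sized one (e.g. <U1) truncates.
    dtype = np.dtype(dtype)
    if dtype.kind in "US" and dtype.itemsize == 0:
        return np.asarray(values).astype(dtype.type)
    return np.asarray(values).astype(dtype)


cond = np.array([True, False, True])
x = np.array(["a", "b", "c"])  # dtype <U1
replacement = "long"           # Python str scalar

# Old behaviour: the length of x leaks into the promoted dtype, so the
# replacement is cut off when it is cast to that dtype.
old = np.result_type(x.dtype, type(replacement))
old_result = np.where(cond, cast(x, old), cast(replacement, old))

# With the length dropped, the promoted dtype is unsized and np.where
# picks a width large enough for both operands.
new = np.result_type(drop_length(x.dtype), type(replacement))
new_result = np.where(cond, cast(x, new), cast(replacement, new))

print(old, old_result)
print(new, new_result)  # ['a' 'long' 'c']
```

Passing the scalar type rather than a sized dtype is what lets numpy recompute the width from the actual values instead of inheriting it from whichever operand happened to be shorter.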

While this works, I wonder if we should instead ask numpy to do this for us? I.e., np.result_type(np.dtype("<U1"), str) should return np.str_, not np.dtype("<U1").
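The promotion in question can be inspected directly. This is a small sketch; `result_type_drops_string_length` is a hypothetical probe (for example, to decide when a workaround like this one could be removed), not an existing numpy or xarray function.

```python
import numpy as np


def result_type_drops_string_length():
    """Hypothetical probe: True if np.result_type already promotes a sized
    string dtype with the Python `str` type to the unsized np.str_ dtype."""
    promoted = np.result_type(np.dtype("<U1"), str)
    return promoted.kind == "U" and promoted.itemsize == 0


# The Python `str` type corresponds to numpy's unsized string dtype ...
unsized = np.dtype(str)
print(unsized == np.dtype(np.str_), unsized.itemsize)  # True 0

# ... but promoting it together with <U1 keeps the size of the sized
# operand, which is the behaviour questioned in the comment above.
print(np.result_type(np.dtype("<U1"), str))
print("numpy drops the length itself:", result_type_drops_string_length())
```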

- Closes #9180 (DataArray.where() can truncate strings with <U dtypes)
- Tests added
- User visible changes (including notable bug fixes) are documented in whats-new.rst

shoyer

Looks good, thanks!

shoyer · 2024-10-10T09:34:33Z

> While this works, I wonder if we should instead ask numpy to do this for us? I.e., np.result_type(np.dtype("<U1"), str) should return np.str_, not np.dtype("<U1").

Yes, this would be better in my opinion!

keewis · 2024-10-10T09:51:12Z

How do we proceed, then? Merge this (after fixing the failing min-deps CI), ask whether numpy.result_type can be changed, and remove the workaround once we can require a version of numpy that does this for us?

shoyer · 2024-10-10T13:19:45Z

Yes, that’s probably the way to go


keewis · 2024-10-15T12:35:05Z

> Yes, this would be better in my opinion!

There are some concerns about this in numpy/numpy#27546.

keewis · 2024-10-24T21:03:02Z

@TomNicholas, should we merge this before the release?

TomNicholas · 2024-10-24T21:05:00Z

Sure! If there is any doubt then leave it, but Stephan reviewed it so I say just merge.

keewis · 2024-10-24T21:07:09Z

The only doubt is about what should happen upstream in numpy (if anything should happen at all), so that shouldn't block us here.

TomNicholas · 2024-10-24T21:08:09Z

I agree, let's merge.


keewis added 2 commits October 6, 2024 15:20

check that the length of fixed-width numpy strings is reset

1a0e56f

drop the length from numpy's fixed-width string dtypes

ed9e1b8

shoyer approved these changes Oct 10, 2024


keewis added 4 commits October 10, 2024 17:00

compatibility with numpy<2

4d8dcb0

use issubdtype instead

0faec84

some more test cases

a6dffe0

more details in the comment

d163934

Merge branch 'main' into fws-length

e15937f

Merge branch 'main' into fws-length

6213be1

TomNicholas enabled auto-merge (squash) October 24, 2024 21:14

TomNicholas merged commit fbe73ef into pydata:main Oct 24, 2024
28 checks passed

keewis deleted the fws-length branch October 24, 2024 21:21
