MAINT: Rewrite can-cast logic in terms of NEP 42 #17401

seberg · 2020-09-30T00:37:53Z

This PR is pretty big; we could probably split out some parts of it, e.g. the loops, which are not always used (although they are in large part tested where implemented).

The goal for this PR is currently to define all casts using the new machinery and add fairly thorough tests. Then we can put this in as functionality that is de-facto unused, but tested. A followup will then:

Use this to implement np.can_cast
Use it in the casting machinery.

Both of these should be limited changes after this is done (at least if we defer some optimization), but they do change very central code in NumPy, so in the last dev meeting the preliminary plan was that we may defer changing this until after the 1.20 release.

There is a lot to dissect in this PR. The basic design is that everything is stored on ArrayMethod objects (much like a ufunc loop plus dtype resolver).

Some aspects, to draw attention to (although some of these should be clarified in NEP 43):

I decided in NEP 42 to return the casting safety, this is different from the fact that we currently pass in the safety we ask for. I like this design and it even adds
- This also means that casts should not report custom errors: I think this is fine for now. Datetimes do have custom errors, but do not seem to use them for np.can_cast (only for scalars probably).
We use move_references flag for handling references together with buffers. Right now, we can just ignore that (keep the "flag" around), but when making this public we may have to think about it, e.g. add additional flags to the ArrayMethod to signal that it can move references.
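
To illustrate the first point, here is a minimal Python sketch of a resolver that returns the casting safety instead of receiving the requested level. It is not the C implementation in this PR: `Casting`, `resolve_int_cast` and `can_cast` are hypothetical names, and only signed integers of a given bit width are modeled. The enum order is assumed to follow `NPY_CASTING` (no < equiv < safe < same_kind < unsafe).

```python
from enum import IntEnum


class Casting(IntEnum):
    # Assumed to mirror the ordering of NPY_CASTING.
    NO = 0
    EQUIV = 1
    SAFE = 2
    SAME_KIND = 3
    UNSAFE = 4


def resolve_int_cast(from_bits, to_bits):
    """Hypothetical resolver: report how safe a signed-int cast is."""
    if from_bits == to_bits:
        return Casting.NO          # identical type, nothing to cast
    if from_bits < to_bits:
        return Casting.SAFE        # widening never loses information
    return Casting.SAME_KIND       # narrowing stays within the same kind


def can_cast(from_bits, to_bits, requested):
    """The caller compares the returned level with the requested one."""
    return resolve_int_cast(from_bits, to_bits) <= requested
```

With this shape the resolver never sees the requested level, so a single comparison at the call site answers a `can_cast`-style query for every casting level.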

Checklist before merging:

Currently defaults to using the new system, that must be switched before merging (but means a full test/coverage run here), and probably the flag added to 2 CI runs or so.

numpy/core/include/numpy/ndarraytypes.h

seberg · 2020-10-19T21:37:11Z

OK, I realize that 4000 lines of new code is too big, but we have to start somewhere on this. Now that it is working as a drop-in replacement for np.can_cast and at least those actual casts that are implemented are to some degree tested, it would be nice to get some feedback.

I don't like merging almost unused code, but one thing I could do is create a PR which only adds PyArrayMethod, etc. (and possibly a single ArrayMethod/cast), and then follow up with the rest of the changes, hopefully not in a single large PR.

So the question is: what do you think? The only serious thing (aside from this being a huge chunk of code) is probably the comment I put above around NPY_CASTING. Otherwise, this needs some eyes on:

The ArrayMethod design in array_method.h and array_method.c.
The changes in convert_datatype.c give an idea of how things are structured, which is probably useful to get a feel of the ArrayMethod.
Yes, there is probably some cleanup still necessary in the current state, and a couple of new tests might be good, but that has little to do with the larger design.

Would it be possible to get some feedback on array_method.c and array_method.h (with an eye on an example in convert_datatype.c or the datetime.c casting implementations)? With that context I would be happy to split it out so we can merge it in somewhat more manageable chunks.

seberg · 2020-11-03T19:12:23Z

I probably will need gh-17706 to fix all the test failures (some new tests create problems). I removed the draft status, just in case that put anyone off from looking at this. This PR includes super important new infrastructure for new DTypes, and I really need some review; otherwise any improvements here amount to randomly kicking things, which is not too helpful.

But to be clear: There is some followup necessary probably, but overall it should be far enough that most of that followup can happen later. With two asides:

Testing both versions in the CI matrix (I intentionally want the new version to run the full test suite for now).
Any API question, the only real one being the NPY_CASTING change that I commented on individually above.

mattip

I did a high-level look through. I didn't see a way to break this into smaller PRs, except maybe the resolvers (which is not much code). On the other hand, being able to toggle this on and off will be helpful.

I think we should aim for making sure the API (and NEP 43) is correct since that will be harder to change in future releases, and merge this so it can make it into the 1.20 release. There should be more documentation, or places where this points to parts of NEP 43.

numpy/core/include/numpy/ndarraytypes.h

numpy/conftest.py

numpy/core/include/numpy/ndarraytypes.h

mattip · 2020-11-04T06:30:48Z

numpy/core/setup.py

@@ -23,6 +23,11 @@
 NPY_RELAXED_STRIDES_DEBUG = (os.environ.get('NPY_RELAXED_STRIDES_DEBUG', "0") != "0")
 NPY_RELAXED_STRIDES_DEBUG = NPY_RELAXED_STRIDES_DEBUG and NPY_RELAXED_STRIDES_CHECKING

+# Set to True to use the new casting implementation as much as implemented.
+# This allows running the full test suit and testing with the new
+# implementation. By default, use the new implementation only in release mode.


In the changelog entry you say something a little different

Suggested change

# implementation. By default, use the new implementation only in release mode.

# implementation. By default, this is None for this release of NumPy

mattip · 2020-11-04T06:32:11Z

numpy/core/setup.py

@@ -468,6 +473,11 @@ def generate_config_h(ext, build_dir):
            if NPY_RELAXED_STRIDES_DEBUG:
                moredefs.append(('NPY_RELAXED_STRIDES_DEBUG', 1))

+            # Use the new experimental casting implementation in NumPy 1.20:
+            if NPY_USE_NEW_CASTINGIMPL != "0" or (
+                    NPY_USE_NEW_CASTINGIMPL is None and not is_released(config)):


The dependence on is_released is confusing. There should be only one way to turn this on and off.

Suggested change

NPY_USE_NEW_CASTINGIMPL is None and not is_released(config)):

NPY_USE_NEW_CASTINGIMPL is None:

OK, I will just set it to always off for now; we can just as well switch it to always on after branching 1.20. Hopefully we will delete the whole old branches fairly soon in any case.
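
A minimal sketch of a single on/off switch, following the same environment-variable pattern as `NPY_RELAXED_STRIDES_DEBUG` in the hunk above. The helper `use_new_castingimpl` is a hypothetical name, and the code is only an assumption about how the toggle could look once the `is_released` dependence is dropped:

```python
import os


def use_new_castingimpl(environ=os.environ):
    """Hypothetical helper: the variable alone decides, defaulting to off."""
    # Unset or "0" keeps the old casting code; any other value opts in.
    return environ.get("NPY_USE_NEW_CASTINGIMPL", "0") != "0"
```

Because the default is the string "0", the value is never None, so no second condition is needed to decide the default.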

numpy/core/src/multiarray/array_assign_array.c

mattip · 2020-11-04T06:50:13Z

numpy/core/src/multiarray/array_method.c

+    }
+
+    /* We find the common dtype of all inputs, and use it for the blanks */
+    assert(nin > 0);  /* this function is not used */


Better to raise an error than to crash only in debug builds.

Made it an error. It should be rejected at registration time (in the FromSpec function); otherwise it indicates a misuse where a DType was defined but is then missing from the context (if a user defined an ArrayMethod with all of these dtypes, they must also be passed into the context).
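
The difference can be sketched in Python, under loudly stated assumptions: `common_input_dtype` and the `_RANK` promotion table are hypothetical and exist only for this illustration; they are not the C code in array_method.c.

```python
# Hypothetical promotion order, used only for this illustration.
_RANK = {"int32": 0, "int64": 1, "float64": 2}


def common_input_dtype(input_dtypes):
    """Return the highest-ranked dtype name among the inputs."""
    if not input_dtypes:
        # An explicit error is raised in every build. An assert would be
        # skipped under `python -O`, much like a C assert is compiled out
        # when NDEBUG is defined.
        raise TypeError("cannot find a common dtype without any inputs")
    return max(input_dtypes, key=_RANK.__getitem__)
```

Calling `common_input_dtype([])` then fails with a `TypeError` regardless of how the interpreter was started, which is the behavior asked for in the review comment.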

numpy/core/src/multiarray/array_method.c

mattip · 2020-11-04T07:02:40Z

numpy/core/src/multiarray/datetime.c

+
+/*
+ * Describes casting within datetimes or timedelta
+ */


This is the heart of why the new design is so much better. Nice.

seberg · 2020-11-05T02:05:31Z

I had to squash everything, so the history is lost for now (I have a backup). The get_loop is now (almost) always NULL which removes 1000 lines or so of change (including some of the new tests). I addressed most of the current comments.

As a reminder, the only actual new public API is the change to NPY_CASTING. The important design choices that I think we are making are:

The general ArrayMethod design
The return value of resolve_descriptors (returning the casting values)
The get_loop function should exist, but the signature is absolutely open
The inner loop function signature is not even remotely set, I prefer to restructure casting and then fix the signature to be compatible to ufuncs right now.

seberg · 2020-11-10T01:48:53Z

Tests are all passing now. The doctest failure is real: it occurs because my code rejects np.can_cast("O", "V", casting="safe") (safe casting is the default here, since it is not a structured void with a single "O" field after all). The failure will thus go away if I switch the default back and only make a single job use the environment variable. I think it would be better to do that at the very end though.

Other than that, I currently only expect to add one or two tests based on what codecov says (although the bad coverage is largely due to untestable error code and the legacy code that is simply never used).

mattip · 2020-11-19T06:54:37Z

doctest is failing

seberg · 2020-11-19T15:19:48Z

@mattip yes, that is intentional right now, because I wanted the full test suite to run with the changes. But there is this one small change (e.g. code coverage). I will hopefully change it later (or as soon as you signal that you expect no or few further fixups).

EDIT: Based on codecov, I also wanted to add 1-2 tests, although a lot of what it complains about seems to be hard-to-trigger error paths.

charris · 2020-11-24T23:36:27Z

Not quite ready yet?

seberg · 2020-11-25T00:48:20Z

I was just doing the last update; I will then run through the tests and change the flag. Will finish tonight.

Casting from object uses inspection logic, so it doesn't actually end up in this path, and thus will not (arguably incorrectly) reuse the itemsize of the object dtype in any case.

Let's defer further touch-ups to later... One more run, since the last one errored (hopefully due to an old failure not merged correctly).

seberg · 2020-11-25T05:21:15Z

OK, I should have flipped the switch back now, with one random Azure run hopefully still enabling it. (So I think it should be OK to go in. I am sure there will be smaller changes, but those might as well happen later.)

seberg · 2020-11-25T15:58:29Z

OK, tests are fine. Please ignore the coverage; I manually checked that it is pretty good (aside from some of the code in array_method.c, which is simply not used much yet). There is simply a lot of dead code, since the new cast logic is only well exercised if NPY_USE_NEW_CASTINGIMPL=1 is set at compile time.

charris · 2020-11-25T16:10:53Z

@mattip Ready?

mattip

I guess we should put this in, even though the chances it gets used during the 1.20 release cycle are slim. I am still not 100% happy with the API but as @seberg says it is all internal so we are free to modify it as we go.

mattip · 2020-11-16T08:44:37Z

numpy/core/src/multiarray/array_method.h

+
+    PyArray_DTypeMeta **dtypes;
+    /* Operand descriptors, filled in by adjust_desciptors */
+    PyArray_Descr **descriptors;


It would be nice to give dtypes and descriptors less generic names. It still bothers me that both are input when really only one or the other is needed. In any case, in NEP 43, the dtypes field is capitalized Dtypes.

Oh, I missed changing it in this PR already, the intention is to modify this to descriptors now, since it is not passed to resolve_descriptors.

charris · 2020-11-25T17:27:39Z

OK, in it goes. If nothing else, later patches should be smaller...

charris · 2020-11-25T17:28:01Z

Thanks Sebastian.

seberg · 2020-11-25T17:33:11Z

Yes, there will be API wiggling needed before exposure... But, on the up-side it is probably easier to get a feel for those things once the next step (and maybe the ufunc changes) are in the pipeline.

seberg · 2020-11-25T17:33:42Z

Thanks Matti! I know this is tough to move forward, and a large tricky project.

seberg marked this pull request as draft September 30, 2020 00:38

github-actions bot added the 25 - WIP label Sep 30, 2020

seberg force-pushed the restructure-casting branch 2 times, most recently from f3475c6 to 74a699c Compare October 17, 2020 00:58

seberg commented Oct 17, 2020

seberg force-pushed the restructure-casting branch from 14e992e to 5649cd0 Compare October 17, 2020 05:21

seberg force-pushed the restructure-casting branch from f733a9c to 590a006 Compare October 26, 2020 15:34

seberg force-pushed the restructure-casting branch from 5577b77 to 6a7f48b Compare November 3, 2020 01:57

seberg marked this pull request as ready for review November 3, 2020 16:57

mattip reviewed Nov 4, 2020

seberg force-pushed the restructure-casting branch from 1376750 to 17fb0a9 Compare November 5, 2020 00:55

seberg changed the title ~~WIP: Implement casting using a new ArrayMethod to structure casting and ufuncs~~ MAINT: Rewrite can-cast logic in terms of NEP 42 Nov 5, 2020

github-actions bot added the 03 - Maintenance label Nov 5, 2020

seberg force-pushed the restructure-casting branch 2 times, most recently from 9636312 to d1d5bfc Compare November 5, 2020 02:00

seberg force-pushed the restructure-casting branch from d1d5bfc to 7f6f70c Compare November 10, 2020 00:25

mattip added this to the 1.20.0 release milestone Nov 18, 2020

seberg force-pushed the restructure-casting branch from 89f60a4 to 30e7582 Compare November 25, 2020 01:37

seberg added 4 commits November 24, 2020 21:25

MAINT: Rewrite can-cast logic in terms of NEP 42

3cfcd22

TST: Fixup tests for Void

df1b2a8

Casting from object uses inspection logic, so it doesn't actually end up in this path, and thus will not (arguably incorrectly) reuse the itemsize of the object dtype in any case.

Address Matti's comments from yesterday

a9a44e9

Last touch-ups (test and tiny fixes)

39d2e8b

Let's defer further touch-ups to later... One more run, since the last one errored (hopefully due to an old failure not merged correctly).

seberg force-pushed the restructure-casting branch from 30e7582 to 39d2e8b Compare November 25, 2020 03:36

CI: Activate new castingimpl on no-openblas azure job

a806c21

mattip reviewed Nov 25, 2020

charris merged commit ba77419 into numpy:master Nov 25, 2020

seberg deleted the restructure-casting branch November 25, 2020 17:37

This was referenced Nov 30, 2021

boost::python::numpy::initialize() incompatibilities with numpy>=1.21 boostorg/python#376

Closed

Segmentation faults with numpy 1.21 package cctbx/cctbx_project#627

Open

bkpoon mentioned this pull request Dec 1, 2021

Boost.Python and segmentation faults with numpy 1.21 conda-forge/boost-feedstock#127

Closed

1 task
