
Give PyArray<PyObject> another try. #216

Merged
merged 4 commits into from
Nov 8, 2021
Conversation

adamreichold
Member

@adamreichold adamreichold commented Nov 6, 2021

I am not sure what changed since #143 (comment) or whether the segmentation fault was only triggered by a more involved test but this seems to work using Python 3.8.2 on Linux. (Similarly to how #138 (comment) worked in this environment.)

Fixes #175

@adamreichold
Member Author

the segmentation fault was only triggered by a more involved test but this seems to work using Python 3.8.2 on Linux

Ah, it is not deterministic! Executed often enough, it is triggered...

@adamreichold
Member Author

And it also seems to trigger only if the other tests in that binary are executed.

@kngwyu
Member

kngwyu commented Nov 6, 2021

Yeah, I'm sorry, but I also don't have much intuition about this bug. I just confirmed via gdb that the SIGSEGV is triggered in Py_FinalizeEx.

@adamreichold
Member Author

It seems to be a double-free, i.e. if I add mem::forget(vec) to the test, the problem seems to go away. Could it be that .to_pyarray() should either take ownership or clone (and thereby INCREF) the items?
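The ownership issue can be sketched in plain Rust with `Rc`, whose reference counting mirrors Python's: cloning the handle increments the count (the analogue of `Py_INCREF`), whereas a raw bitwise copy of the pointer would not, so both copies would eventually free the same allocation. A minimal illustration of the correct side:

```rust
use std::rc::Rc;

fn main() {
    // Cloning an Rc increments the reference count -- the analogue of
    // Py_INCREF when duplicating PyObject pointers into a new array.
    let original = Rc::new(String::from("element"));
    let copied = Rc::clone(&original); // strong count: 1 -> 2
    assert_eq!(Rc::strong_count(&original), 2);

    // Each owner decrements the count on drop, so the allocation is freed
    // exactly once. A bitwise pointer copy would skip the increment and
    // both "owners" would free the same allocation -- the double-free.
    drop(copied);
    assert_eq!(Rc::strong_count(&original), 1);
}
```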

@adamreichold
Member Author

Yeah, ToPyArray goes via PyArray::from_slice which just does array.copy_ptr(slice.as_ptr(), slice.len());, i.e. copying the pointer values in the case of objects but not increasing their reference counts. I think the above should only happen for Copy types.

@adamreichold
Member Author

Oh, I think PyArray::from_slice is generally unsound as it copies any type T whether that is cloneable or not.

@adamreichold
Member Author

I pushed a fix for PyArray::from_slice even though it is probably slower for Copy types. Will look into specialisation next.

@kngwyu One thing unrelated I wonder is why PyArray::new is not unsafe? It does create an uninitialized array which would then allow me to access uninitialized memory, wouldn't it?

@adamreichold
Member Author

I pushed a fix for PyArray::from_slice even though it is probably slower for Copy types. Will look into specialisation next.

I found one more place where A: Element was assumed to imply A: Copy and copy_ptr was used in ToPyArray for ArrayBase.

However, I do not think ensuring a call to copy_nonoverlapping is possible without actual specialisation support on stable. The write-clone-into-offset-pointer idiom used now should give LLVM the opportunity to make that optimization though.
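A hypothetical sketch of that write-clone-into-offset-pointer idiom (the names here are illustrative, not the crate's actual internals): each element is cloned into the destination through an offset pointer, and for `Copy` types LLVM can typically collapse the loop into a single memcpy, recovering the `copy_nonoverlapping` behaviour:

```rust
use std::ptr;

// Clone each source element into the destination buffer via an offset
// pointer. Safety: `dst` must be valid for writes of `src.len()` elements.
unsafe fn write_cloned<T: Clone>(dst: *mut T, src: &[T]) {
    for (i, item) in src.iter().enumerate() {
        // ptr::write moves the clone into place without dropping the
        // (uninitialized) previous contents of the slot.
        ptr::write(dst.add(i), item.clone());
    }
}

fn main() {
    let src = vec![String::from("a"), String::from("b")];
    let mut dst: Vec<String> = Vec::with_capacity(src.len());
    unsafe {
        write_cloned(dst.as_mut_ptr(), &src);
        // All slots are initialized now, so the length can be set.
        dst.set_len(src.len());
    }
    assert_eq!(dst, src);
}
```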

I also wrapped those partially initialized arrays into ManuallyDrop to avoid freeing uninitialized pointers but I think longer term, we'd want to support PyArray<MaybeUninit<T>> similar to ndarray.
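The `ManuallyDrop` part can be shown in isolation (a simplified sketch, not the actual array code): wrapping a value suppresses its destructor, so a container whose elements are still garbage is leaked rather than freed:

```rust
use std::mem::ManuallyDrop;

fn main() {
    // ManuallyDrop suppresses the destructor: if `guarded` went out of
    // scope here, neither the Vec's buffer nor its elements would be
    // freed -- the "leak rather than free garbage pointers" trade-off.
    let guarded = ManuallyDrop::new(vec![String::from("pending")]);
    assert_eq!(guarded[0], "pending");

    // Once initialization is known to have succeeded, the value can be
    // taken back out and dropped normally.
    let recovered = ManuallyDrop::into_inner(guarded);
    assert_eq!(recovered.len(), 1);
}
```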

@kngwyu
Member

kngwyu commented Nov 6, 2021

Great

It seems to be a double-free, i.e. if I add mem::forget(vec) to the test, the problem seems to go away.

Ah, that's an awesome finding, thanks.

One thing unrelated I wonder is why PyArray::new is not unsafe? It does create an uninitialized array which would then allow me to access uninitialized memory, wouldn't it?

Yeah, your understanding is correct and it should be unsafe.

However, I do not think ensuring a call to copy_nonoverlapping is possible without actual specialisation support on stable. The write-clone-into-offset-pointer idiom used now should give LLVM the opportunity to make that optimization though.

I do think that throwing out copy_nonoverlapping is OK for now.

@adamreichold
Member Author

Yeah, your understanding is correct and it should be unsafe.

Opened #217 to discuss how to solve this as it seems unrelated to this change and should probably result in new API.

@adamreichold
Member Author

I do think that throwing out copy_nonoverlapping is OK for now.

I added another commit which takes care of properly leaking uninitialized PyObject arrays to avoid heap corruption. This has the nice side effect of restoring that optimization, since the other NumPy types are assumed to be Copy by NumPy itself anyway.

src/array.rs Outdated
// all other data types are assumed to be `Copy` by NumPy
if T::DATA_TYPE == DataType::Object {
// keep array referenced as long as its contents is not initialized
let ref_ = mem::ManuallyDrop::new(array.to_object(py));
Member

To avoid leaking memory if clone panics, you could use a guard pattern similar to https://github.com/PyO3/pyo3/blob/00c84eb0baec6f41623e83737a291d3e0d30cc5b/src/conversions/array.rs#L93

You would need to drop the initialized elements by hand and then zero the array so that numpy can deallocate safely I guess.
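The suggested guard could look roughly like this (a sketch under assumed names, not the pyo3 code linked above): it records how many slots were initialized, and on panic drops exactly those elements and zeroes the rest of the buffer so deallocation only ever sees valid or null pointers:

```rust
use std::mem::MaybeUninit;
use std::ptr;

// Hypothetical drop guard: tracks how many slots were initialized so that
// a panicking Clone impl drops only those and zeroes the remainder.
struct InitGuard<'a, T> {
    buf: &'a mut [MaybeUninit<T>],
    initialized: usize,
}

impl<T> Drop for InitGuard<'_, T> {
    fn drop(&mut self) {
        for slot in &mut self.buf[..self.initialized] {
            // Drop each element that was actually written.
            unsafe { ptr::drop_in_place(slot.as_mut_ptr()) };
        }
        // Zero the whole buffer so a later deallocation pass sees null
        // pointers instead of dangling ones.
        unsafe { ptr::write_bytes(self.buf.as_mut_ptr(), 0, self.buf.len()) };
    }
}

fn fill_cloned<T: Clone>(buf: &mut [MaybeUninit<T>], src: &[T]) {
    let mut guard = InitGuard { buf, initialized: 0 };
    for (i, item) in src.iter().enumerate() {
        guard.buf[i].write(item.clone()); // may panic
        guard.initialized += 1;
    }
    // Fully initialized: disarm the guard so nothing is dropped or zeroed.
    std::mem::forget(guard);
}

fn main() {
    let src = vec![String::from("a"), String::from("b")];
    let mut buf: Vec<MaybeUninit<String>> =
        (0..src.len()).map(|_| MaybeUninit::uninit()).collect();
    fill_cloned(&mut buf, &src);
    let out: Vec<String> =
        buf.into_iter().map(|m| unsafe { m.assume_init() }).collect();
    assert_eq!(out, src);
}
```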

Member Author

Certainly, however I prefer to postpone this to separate PR after we have reached soundness w.r.t PyArray creation.

Member

👍 makes sense, let's just remember to open an issue for this when this PR merges.

Member Author

I took a different approach for now:

// Use zero-initialized pointers for object arrays
// so that partially initialized arrays can be dropped safely
// in case the iterator implementation panics.
let array = if T::DATA_TYPE == DataType::Object {
    Self::zeros(py, [iter.len()], false)
} else {
    Self::new(py, [iter.len()], false)
};

This way we start out with null pointers in the object case and the array is always safe to drop. I think the overhead is warranted due to the cost that arrays of pointers imply in any case. (Of course, using a guard that just zeros out uninitialized array elements if there actually is a panic can still be implemented as a follow-up optimization.)
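The effect can be mimicked in plain Rust (a simplified analogy, not the actual NumPy allocation): if every slot starts as a null-equivalent, the container stays safe to drop even when filling panics halfway through:

```rust
fn main() {
    // Silence the default panic message for this demonstration.
    std::panic::set_hook(Box::new(|_| {}));

    // Analogue of starting from `PyArray::zeros` for object arrays: every
    // slot begins as a null-equivalent (`None`), so the container is
    // always safe to drop, even if filling panics partway through.
    let mut slots: Vec<Option<Box<i32>>> = vec![None; 4];

    let result = std::panic::catch_unwind(std::panic::AssertUnwindSafe(|| {
        for (i, slot) in slots.iter_mut().enumerate() {
            if i == 2 {
                panic!("clone failed"); // simulate a panicking Clone impl
            }
            *slot = Some(Box::new(i as i32));
        }
    }));

    assert!(result.is_err());
    // Only two slots were filled; dropping the rest is a no-op, so there
    // is no double-free and no freeing of garbage pointers.
    assert_eq!(slots.iter().filter(|s| s.is_some()).count(), 2);
}
```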

Member

Oh very nice!

Member

Yeah, it looks like a good temporary solution 👍🏼

@adamreichold
Member Author

adamreichold commented Nov 7, 2021

Sorry for the notification spam, but I wanted to add that I rebooted this patch series again to make the contract of trait Element more explicit and then do the minimal changes to make PyArray<PyObject> sound. This yields a much smaller diff without any intrusive changes affecting the existing code.

Member

@davidhewitt davidhewitt left a comment

Looks great to me, nice work! Could perhaps add extra tests for from_slice for PyObject vec etc

@adamreichold
Member Author

adamreichold commented Nov 7, 2021

Could perhaps add extra tests for from_slice for PyObject vec etc

I suppose you mean extra tests for impl<S, D, A> ToPyArray for ArrayBase<S, D>, as PyArray::from_slice is what is exercised by the current test (via impl<T> ToPyArray for [T])?

Will add a test case for starting from an ndarray array...

Member

@kngwyu kngwyu left a comment

Thanks!
This is a huge step.

@adamreichold
Just let me confirm: we can merge this PR as is and you want to refactor Element in a separate PR, right?

@adamreichold
Member Author

we can merge this PR as is and you want to refactor Element in a separate PR, right?

Yes, this PR is good to go in my opinion. I don't think we will need to touch trait Element, but I would like to work on a follow-up to give PyArray::new a different signature and documentation, i.e. work on #217.

@kngwyu
Member

kngwyu commented Nov 8, 2021

👍🏼

@kngwyu kngwyu merged commit b02c6df into PyO3:main Nov 8, 2021
@adamreichold adamreichold deleted the object-element branch November 8, 2021 15:29
Successfully merging this pull request may close these issues.

Supporting object numpy array?