BUG: apply() fails on some value types #34529

simondsmart · 2020-06-02T12:51:59Z

We have some existing code that manipulates data decoded into NumPy arrays by a C-powered backend. This code has stopped working.

I've tried to strip it down to a reduced case:

```python
import pandas as pd
import numpy as np

# NumPy infers a fixed-width bytes dtype ('|S4') for this array
df = pd.DataFrame(np.array([b'abcd', b'efgh']), columns=['col'])
df.apply(lambda x: x.astype('object'))
```

This fails with an error inside an internal function of apply:

```
ValueError                                Traceback (most recent call last)
<ipython-input-88-a5fa9cabd101> in <module>
----> 1 df.apply(lambda x: x.astype('object'))

~/local/pkg/miniconda3/envs/odc/lib/python3.8/site-packages/pandas/core/frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
   6876             kwds=kwds,
   6877         )
-> 6878         return op.get_result()
   6879 
   6880     def applymap(self, func) -> "DataFrame":

~/local/pkg/miniconda3/envs/odc/lib/python3.8/site-packages/pandas/core/apply.py in get_result(self)
    184             return self.apply_raw()
    185 
--> 186         return self.apply_standard()
    187 
    188     def apply_empty_result(self):

~/local/pkg/miniconda3/envs/odc/lib/python3.8/site-packages/pandas/core/apply.py in apply_standard(self)
    293 
    294             try:
--> 295                 result = libreduction.compute_reduction(
    296                     values, self.f, axis=self.axis, dummy=dummy, labels=labels
    297                 )

pandas/_libs/reduction.pyx in pandas._libs.reduction.compute_reduction()

pandas/_libs/reduction.pyx in pandas._libs.reduction.Reducer.__init__()

pandas/_libs/reduction.pyx in pandas._libs.reduction.Reducer._check_dummy()

ValueError: Dummy array must be same dtype
```

If we use a different dtype (plain `str` instead of `bytes`), it works:

```python
# str literals give a fixed-width unicode array ('U4') instead
df = pd.DataFrame(np.array(['abcd', 'efgh']), columns=['col'])
print(df.apply(lambda x: x.astype('object')))
```

which gives the expected result:

```
    col
0  abcd
1  efgh
```
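The two reproducers differ only in the dtype NumPy infers for the input array. A quick check in plain NumPy (no pandas involved) shows that bytes literals give a fixed-width bytes array (kind `S`, one byte per character), while str literals give a fixed-width unicode array (kind `U`, four bytes per character):

```python
import numpy as np

# Array from the failing example: bytes literals -> fixed-width bytes dtype
raw_bytes = np.array([b'abcd', b'efgh'])
# Array from the working example: str literals -> fixed-width unicode dtype
raw_text = np.array(['abcd', 'efgh'])

print(raw_bytes.dtype.kind, raw_bytes.dtype.itemsize)  # S 4
print(raw_text.dtype.kind, raw_text.dtype.itemsize)    # U 16 (4 bytes per character)
```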


Veronur · 2020-06-03T20:33:36Z

Hello, I would like to look into this issue!

jorisvandenbossche · 2020-06-03T20:49:35Z

@simondsmart, thanks for the report. That is indeed an error that should not be seen by the user (and this worked in 0.25).

Now, although it should not raise an error, it's also not fully clear to me what you are trying to achieve. The column in the DataFrame already has an object dtype, so doing `apply(lambda x: x.astype('object'))` should basically be a no-op.
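With a pandas version that includes the fix (assumed here to be 1.1 or later, per the milestone on this issue), the reduced case should run and, as expected for a no-op, hand back the same values in an object column. A minimal check under that assumption:

```python
import numpy as np
import pandas as pd

# Reduced case from the report: a column built from a fixed-width bytes array
df = pd.DataFrame(np.array([b'abcd', b'efgh']), columns=['col'])

# Casting each column to object should leave the values untouched
result = df.apply(lambda x: x.astype('object'))

print(result['col'].dtype)     # object
print(result['col'].tolist())  # [b'abcd', b'efgh']
```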

@Veronur, you're always welcome to take a look!

simondsmart · 2020-06-04T23:30:20Z

@jorisvandenbossche, the code came from a rather different context. We have a library that decodes a rather esoteric data format (ODB2, the pyodc library). It can optionally offload to a separate C++ library that does the decoding much faster, but that requires us to set up arrays with rather strict memory layout requirements to decode into.

We then have to do a bit of... coercion... to get back to something appropriate in Python land.

I simplified the bug report rather aggressively, down to the smallest case I could find that triggers the same error, so it looks like something rather daft.

Many thanks for your help!

Veronur · 2020-06-06T15:45:29Z

So, I did some checking, and this is what I have found so far: the problem shown above happens inside the `libreduction.compute_reduction` function. Checking its arguments, I found that the dummy variable has the same dtype as normal strings (object), but the `arr` variable has dtype `|S#` (# being the number of characters in the byte string).

I think making those two dtypes equal would solve the problem, so I would welcome suggestions on how to do that.
Thanks
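To see why that mismatch is fatal, here is a hypothetical NumPy-only stand-in for the check that raised in the traceback (`Reducer._check_dummy` in `reduction.pyx`). It is not the Cython code itself, only a sketch assuming the check compares the dummy's dtype with the input array's dtype; `check_dummy` is an illustrative name:

```python
import numpy as np


def check_dummy(arr: np.ndarray, dummy: np.ndarray) -> None:
    """Hypothetical stand-in for the dtype check in Reducer._check_dummy."""
    if dummy.dtype != arr.dtype:
        raise ValueError("Dummy array must be same dtype")


# Input values as found while debugging: a fixed-width bytes array (|S4)
arr = np.array([b'abcd', b'efgh'])
# Dummy built as an ordinary object-dtype array, as for normal strings
dummy = np.empty(len(arr), dtype=object)

try:
    check_dummy(arr, dummy)
except ValueError as exc:
    print(exc)  # Dummy array must be same dtype

# Once both sides share the bytes dtype, the check passes silently
check_dummy(arr, np.empty(len(arr), dtype=arr.dtype))
```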

Veronur · 2020-06-08T21:16:40Z

OK, so here is what I found: in your example, if instead of `df.apply(lambda x: x.astype('object'))` you use `df.apply(lambda x: x.astype('|S'))`, the command runs. This is because pandas currently interprets strings as objects themselves, whereas byte strings get an array interface (https://numpy.org/devdocs/reference/arrays.interface.html#arrays-interface), which is why the dtypes were different. If this is unexpected behaviour, I'd like some guidelines on how to fix it; if it is supposed to happen, I'd be happy to write a test for it!
By the way, the conversion works from object to `|S#` but not from `|S#` to object.
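At the NumPy level, both directions of that conversion are unremarkable, which suggests the asymmetry comes from pandas' own handling rather than from `astype` itself. A small illustration in plain NumPy (no pandas involved):

```python
import numpy as np

fixed = np.array([b'abcd', b'efgh'])   # dtype |S4
as_object = fixed.astype('object')     # |S# -> object: elements are bytes objects
back_to_fixed = as_object.astype('S')  # object -> |S#: width inferred from the data

print(as_object.dtype)      # object
print(back_to_fixed.dtype)  # |S4
```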

TomAugspurger · 2020-06-12T16:34:31Z

Thanks for looking into this, @Veronur. I'm also not sure of the best way to handle it, but it would be nice to fix the regression for the 1.1 release (in a few weeks), as long as we don't give up on other behaviors.

Veronur · 2020-06-12T23:56:48Z

Alright, I have done some more digging and I am getting close to the source of the problem. So far I found that the dtype is changed from `|S#` to object when the Series for the dummy array is generated, in the `generic.NDFrame.__init__(self, data)` function call. There, the values array is generated with object dtype instead of the expected `|S#`.

Veronur · 2020-06-15T23:52:40Z

Alright, I found the problem and made a fix for it. The problem was in the `pandas/core/dtypes/cast.py` file. The code in question was added for issue #21083, but since Python 3, "U" types and "S" types have become different things.
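A hypothetical sketch of the failure mode described here (the name `cast_preserving_na` and its body are illustrative assumptions, not the code in `cast.py` and not the actual fix): a helper meant to keep `None` values intact when casting to a string dtype (the #21083 concern) routes the data back through object dtype. If that branch treats `S` the same as `U`, a requested `|S4` dtype silently comes back as object, which would produce exactly the dummy/array mismatch; restricting the branch to `U` leaves bytes dtypes alone.

```python
import numpy as np


def cast_preserving_na(values, dtype, string_kinds=("U", "S")):
    """Hypothetical helper: cast ``values`` to ``dtype`` while keeping None.

    For dtype kinds listed in ``string_kinds`` the result is routed back
    through object dtype, so that None is not turned into the string 'None'.
    """
    dtype = np.dtype(dtype)
    if dtype.kind in string_kinds:
        as_object = np.asarray(values, dtype=object)
        result = as_object.astype(dtype).astype(object)
        # Simplified missing-value check: only None counts as missing here
        missing = np.array([v is None for v in as_object], dtype=bool)
        result[missing] = None
        return result
    return np.array(values, dtype=dtype)


values = np.array([b'abcd', b'efgh'])  # |S4, as in the report

# "S" grouped with "U": the requested bytes dtype comes back as object
old = cast_preserving_na(values, values.dtype)
# Only "U" routed through object: the bytes dtype survives
new = cast_preserving_na(values, values.dtype, string_kinds=("U",))

print(old.dtype, new.dtype)  # object |S4
```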

Veronur · 2020-06-16T00:49:44Z

The pull request is now having issues with some tests that use the None type, because that is what issue #21083 fixed. I will need some ideas on how to handle those.


jorisvandenbossche added the Apply (Apply, Aggregate, Transform, Map) and Regression (Functionality that used to work in a prior pandas version) labels Jun 3, 2020

jorisvandenbossche added this to the 1.1 milestone Jun 3, 2020

Veronur mentioned this issue Jun 15, 2020

BUG: apply() fails on some value types #34812 (merged)

jreback changed the title ~~apply() fails on some value types~~ BUG: apply() fails on some value types Jun 16, 2020

Veronur pushed a commit to Veronur/pandas that referenced this issue Jun 16, 2020

test for issue-pandas-dev#34529

4967f14

Veronur pushed a commit to Veronur/pandas that referenced this issue Jun 16, 2020

correction on test for issue-pandas-dev#34529

3cc40f2

Veronur pushed a commit to Veronur/pandas that referenced this issue Jun 16, 2020

GH34529

4ab85ef

correction and test for issue-pandas-dev#34529

Veronur pushed a commit to Veronur/pandas that referenced this issue Jun 16, 2020

GH34529

685b3c5

correction and test for issue-pandas-dev#34529 made the formating changes

Veronur pushed a commit to Veronur/pandas that referenced this issue Jun 16, 2020

GH34529

eb4c5fb

correction and test for issue-pandas-dev#34529 made the formating changes fixing tests on issue-pandas-dev#34529

Veronur pushed a commit to Veronur/pandas that referenced this issue Jun 17, 2020

GH34529

59a5a95

correction and test for issue-pandas-dev#34529 made the formating changes fixing tests on issue-pandas-dev#34529 add whats new entry on issue-pandas-dev#34539

Veronur added a commit to Veronur/pandas that referenced this issue Jun 18, 2020

Merge branch 'master' into issue-pandas-dev#34529

099ed61

jreback closed this as completed in #34812 Jun 19, 2020
