Regression: None in string input is wrongly cast to the "None" string #21083

Closed
pitrou opened this issue May 16, 2018 · 4 comments
Labels
Blocker Blocking issue or pull request for an upcoming release Dtype Conversions Unexpected or buggy dtype conversions Regression Functionality that used to work in a prior pandas version

@pitrou
Contributor

pitrou commented May 16, 2018

In Pandas 0.22.0:

>>> df = pd.DataFrame({'data': ['x', None]}, dtype=str)
>>> df['data'].tolist()
['x', None]

In Pandas 0.23.0:

>>> df = pd.DataFrame({'data': ['x', None]}, dtype=str)
>>> df['data'].tolist()
['x', 'None']
>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-20-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

pandas: 0.23.0
pytest: 3.3.2
pip: 10.0.1
setuptools: 38.4.0
Cython: 0.28.2
numpy: 1.14.2
scipy: None
pyarrow: 0.9.1.dev14+g6599ab0.d20180323
xarray: None
IPython: 6.2.1
sphinx: 1.6.7
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
pitrou added a commit to pitrou/arrow that referenced this issue May 16, 2018
There is a regression (*) in Pandas 0.23.0 that breaks test_parquet.py.
Pin to 0.22.0 until the issue gets fixed upstream.

(*) pandas-dev/pandas#21083
@TomAugspurger TomAugspurger added Dtype Conversions Unexpected or buggy dtype conversions Regression Functionality that used to work in a prior pandas version labels May 16, 2018
@TomAugspurger TomAugspurger added this to the 0.23.1 milestone May 16, 2018
@TomAugspurger
Contributor

This sounds vaguely familiar... Interestingly, these two are different.

In [6]: pd.Series(['a', None]).values.tolist()
Out[6]: ['a', None]

In [7]: pd.Series(['a', None], dtype='str').values.tolist()
Out[7]: ['a', 'None']

@pitrou
Contributor Author

pitrou commented May 16, 2018

Hmm, in 0.23.0 the "str" dtype blindly converts every value to its string representation:

>>> pd.Series(['a', None, 0.4j, object()], dtype='str').values.tolist()
['a', 'None', '0.4j', '<object object at 0x7fc22cf8e6d0>']

On Pandas 0.22.0, however, the "str" dtype specification seems completely ignored:

>>> pd.Series(['a', None, 0.4j, object()], dtype='str').values.tolist()
['a', None, 0.4j, <object at 0x7fe5ddcd1e90>]
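
The difference between a blind cast and one that keeps missing values can be sketched in plain Python. This is only an illustration: `cast_blindly` and `cast_preserving_none` are hypothetical helper names, not pandas internals. Note that the second helper still stringifies `0.4j`, so it does not reproduce the 0.22.0 output above, which left non-string values untouched.

```python
def cast_blindly(values):
    """Apply str() to every element, as the 0.23.0 output above suggests."""
    return [str(v) for v in values]


def cast_preserving_none(values):
    """Apply str() to every element except None, which is kept as missing."""
    return [v if v is None else str(v) for v in values]


values = ['a', None, 0.4j]
print(cast_blindly(values))          # ['a', 'None', '0.4j']
print(cast_preserving_none(values))  # ['a', None, '0.4j']
```

The first helper loses the distinction between a missing value and the literal string 'None', which is the problem reported here.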

@TomAugspurger
Contributor

I think we're hitting the else branch of

if is_object_dtype(dtype) and (is_list_like(subarr) and

when we want to go down the if branch.

(Pdb) dtype
dtype('<U')
(Pdb) is_object_type(dtype)
*** NameError: name 'is_object_type' is not defined
(Pdb) dtype
dtype('<U')
(Pdb) is_object_dtype(dtype)
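
The check from the debugging session can be reproduced outside pdb. A minimal sketch, assuming NumPy and pandas are installed; it uses only `numpy.dtype`, `numpy.array` and the public `pandas.api.types.is_object_dtype`, and does not touch the constructor code path itself:

```python
import numpy as np
from pandas.api.types import is_object_dtype

str_dtype = np.dtype(str)                 # the dtype('<U') seen in the pdb session
print(str_dtype.kind)                     # 'U' (unicode), not 'O' (object)
print(is_object_dtype(str_dtype))         # False
print(is_object_dtype(np.dtype(object)))  # True

# A plain NumPy cast to str stringifies None along with everything else.
print(np.array(['a', None], dtype=str).tolist())  # ['a', 'None']
```

Because the unicode dtype is not an object dtype, a branch guarded by `is_object_dtype(dtype)` is skipped for `dtype=str`.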

pitrou added a commit to pitrou/arrow that referenced this issue May 16, 2018
There is a regression (*) in Pandas 0.23.0 that breaks test_parquet.py.
Pandas does not have an actual "str" dtype anyway, so pass "object" instead.

(*) pandas-dev/pandas#21083
xhochy pushed a commit to apache/arrow that referenced this issue May 16, 2018
There is a regression (*) in Pandas 0.23.0 that breaks test_parquet.py.
Pandas does not have an actual "str" dtype anyway, so pass "object" instead.

(*) pandas-dev/pandas#21083

Author: Antoine Pitrou <[email protected]>

Closes #2051 from pitrou/ARROW-2589 and squashes the following commits:

b581ef3 <Antoine Pitrou> ARROW-2589:  Workaround regression in Pandas 0.23.0
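
The workaround described in the commit message above, passing "object" instead of "str", can be sketched against the original report as follows. This assumes only that pandas is installed; the column name `data` is taken from the report.

```python
import pandas as pd

# dtype=object stores the Python objects as-is, so None is not stringified.
df = pd.DataFrame({'data': ['x', None]}, dtype=object)
print(df['data'].tolist())         # ['x', None]
print(df['data'].isna().tolist())  # [False, True]
```

With `dtype=object` the missing value is still recognized by `isna()`, which is not the case once it has been turned into the string 'None'.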
@jreback jreback modified the milestones: 0.23.1, 0.23.2 Jun 7, 2018
@TomAugspurger TomAugspurger added the Blocker Blocking issue or pull request for an upcoming release label Jun 7, 2018
@TomAugspurger
Contributor

This is a blocker for 0.23.1. Taking a look now.

@TomAugspurger TomAugspurger modified the milestones: 0.23.2, 0.23.1 Jun 7, 2018
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jun 7, 2018
```python
In [1]: import pandas as pd
In [2]: pd.Series([1, 2, None], dtype='str')[2]  # None

```

Closes pandas-dev#21083