Regression: None in string input is wrongly cast to the "None" string #21083

pitrou · 2018-05-16T13:36:29Z

In Pandas 0.22.0:

>>> df = pd.DataFrame({'data': ['x', None]}, dtype=str)
>>> df['data'].tolist()
['x', None]

In Pandas 0.23.0:

>>> df = pd.DataFrame({'data': ['x', None]}, dtype=str)
>>> df['data'].tolist()
['x', 'None']

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-20-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

pandas: 0.23.0
pytest: 3.3.2
pip: 10.0.1
setuptools: 38.4.0
Cython: 0.28.2
numpy: 1.14.2
scipy: None
pyarrow: 0.9.1.dev14+g6599ab0.d20180323
xarray: None
IPython: 6.2.1
sphinx: 1.6.7
patsy: None
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 2.1.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

There is a regression (*) in Pandas 0.23.0 that breaks test_parquet.py. Pin to 0.22.0 until the issue gets fixed upstream. (*) pandas-dev/pandas#21083

TomAugspurger · 2018-05-16T13:42:48Z

This sounds vaguely familiar... Interestingly, these two are different.

In [6]: pd.Series(['a', None]).values.tolist()
Out[6]: ['a', None]

In [7]: pd.Series(['a', None], dtype='str').values.tolist()
Out[7]: ['a', 'None']

pitrou · 2018-05-16T13:45:29Z

Hmm, the "str" dtype blindly represents everything:

>>> pd.Series(['a', None, 0.4j, object()], dtype='str').values.tolist()
['a', 'None', '0.4j', '<object object at 0x7fc22cf8e6d0>']

On Pandas 0.22.0, however, the "str" dtype specification seems completely ignored:

>>> pd.Series(['a', None, 0.4j, object()], dtype='str').values.tolist()
['a', None, 0.4j, <object object at 0x7fe5ddcd1e90>]
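The difference between the two outputs above can be sketched in plain Python, without pandas. This is only an illustration of the two behaviours, not pandas code: the helper names `blind_str` and `str_preserving_none` are hypothetical, and the second one shows the result this issue asks for when `None` appears in string input.

```python
def blind_str(values):
    """Stringify every element, including None (what 0.23.0 appears to do)."""
    return [str(v) for v in values]


def str_preserving_none(values):
    """Stringify every element except None, which is kept as a missing value."""
    return [v if v is None else str(v) for v in values]


data = ['a', None, 0.4j]
print(blind_str(data))            # ['a', 'None', '0.4j']
print(str_preserving_none(data))  # ['a', None, '0.4j']
```

The first helper loses the distinction between a missing value and the literal text "None", which is the core of the regression reported here.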

TomAugspurger · 2018-05-16T13:49:22Z

I think we're hitting the else branch in

pandas/pandas/core/series.py

Line 4044 in 501f041

if is_object_dtype(dtype) and (is_list_like(subarr) and

when we want to go down the if branch.

(Pdb) dtype
dtype('<U')
(Pdb) is_object_type(dtype)
*** NameError: name 'is_object_type' is not defined
(Pdb) dtype
dtype('<U')
(Pdb) is_object_dtype(dtype)
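The check from the debugger session can be reproduced outside pdb. The sketch below assumes `np.dtype(str)` is the same unicode dtype that the session prints as `dtype('<U')`, and it uses the public `pandas.api.types.is_object_dtype` rather than the internal import used in series.py.

```python
import numpy as np
from pandas.api.types import is_object_dtype

# Assumption: this matches the dtype('<U') seen in the pdb session.
str_dtype = np.dtype(str)
object_dtype = np.dtype(object)

# A unicode dtype is not an object dtype, so a condition that starts with
# is_object_dtype(dtype) is false for dtype='str' and the else branch runs.
print(is_object_dtype(str_dtype))     # False
print(is_object_dtype(object_dtype))  # True
```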

There is a regression (*) in Pandas 0.23.0 that breaks test_parquet.py. Pandas does not have an actual "str" dtype anyway, so pass "object" instead. (*) pandas-dev/pandas#21083

Author: Antoine Pitrou
Closes #2051 from pitrou/ARROW-2589 and squashes the following commits:
b581ef3 <Antoine Pitrou> ARROW-2589: Workaround regression in Pandas 0.23.0
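A minimal sketch of the workaround described in that commit message, applied to the example from the original report: pass dtype=object instead of dtype=str. The actual change made in the Arrow test suite is not shown in this thread, so this only illustrates the idea.

```python
import pandas as pd

# Same data as in the report, but with dtype=object instead of dtype=str,
# so the None entry is stored as-is rather than being stringified.
df = pd.DataFrame({'data': ['x', None]}, dtype=object)
values = df['data'].tolist()
print(values)  # ['x', None]
```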

TomAugspurger · 2018-06-07T18:11:08Z

This is a blocker for 0.23.1. Taking a look now.

```python
In [1]: import pandas as pd
In [2]: pd.Series([1, 2, None], dtype='str')[2]  # None
```

Closes pandas-dev#21083

pitrou added a commit to pitrou/arrow that referenced this issue May 16, 2018

ARROW-2589: [CI] Avoid Pandas 0.23.0

9f76d35

TomAugspurger added Dtype Conversions Unexpected or buggy dtype conversions Regression Functionality that used to work in a prior pandas version labels May 16, 2018

TomAugspurger added this to the 0.23.1 milestone May 16, 2018

pitrou added a commit to pitrou/arrow that referenced this issue May 16, 2018

ARROW-2589: [CI] Avoid Pandas 0.23.0

2c20c09

pitrou mentioned this issue May 16, 2018

ARROW-2589: [Python] Workaround regression in Pandas 0.23.0 apache/arrow#2051

Closed

TomAugspurger mentioned this issue May 31, 2018

Potential regression in str dtype handling in 0.23? #21270

Closed

jreback modified the milestones: 0.23.1, 0.23.2 Jun 7, 2018

TomAugspurger added the Blocker Blocking issue or pull request for an upcoming release label Jun 7, 2018

TomAugspurger modified the milestones: 0.23.2, 0.23.1 Jun 7, 2018

TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Jun 7, 2018

REGR: NA-values in ctors with string dtype

d07b238

TomAugspurger mentioned this issue Jun 7, 2018

REGR: NA-values in ctors with string dtype #21366

Merged

jreback closed this as completed in #21366 Jun 8, 2018

Veronur mentioned this issue Jun 15, 2020

BUG: apply() fails on some value types #34529

Closed
