Possible string type detection bug #945

bkamins · 2021-12-04T18:17:11Z

@quinnj it seems that in some cases CSV.read creates String63 columns under default settings (and AFAICT it should not - String should be used in such a case as we are using InlineStrings.jl up to String31 by default).

Here is a notebook https://github.com/JuliaAcademy/DataFrames/blob/main/3.%20Working%20with%20text%20files.ipynb reproducing the issue. See cell [12] for the result and cell [10] for the command I use to read in the data. To get the data use the code in cell [5].


bkamins · 2021-12-12T10:00:53Z

@quinnj I have the same problem in https://github.com/bkamins/PyDataGlobal2020.

See file https://github.com/bkamins/PyDataGlobal2020/blob/main/police.ipynb, cell [4]. We create String63, which should not happen.

Fixes #945. The core issue here is an inconsistency between the initial type detection code and the type promotion code that runs later while parsing. The type detection code had a limit set up so that inline string column types larger than `String31` weren't allowed. The promotion code, however, allowed inline string types to keep promoting up to `String255`. That means that if a column was initially detected as some smaller-than-`String31` type and a value larger than 31 bytes was parsed later on, the column would continue to promote to larger inline string types. The change proposed in this PR is to limit the promotion code to be more consistent with the type detection code: if a parsed value "overflows" the `String31` type, we'll just promote to a regular `String` instead of promoting to larger inline string types.
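To make the mismatch concrete, here is a minimal Python sketch of the behaviour described above. It is a simplified model, not CSV.jl's actual code: the function names, the `sample_rows` parameter, and the split into a "detection" pass and a "promotion" pass are assumptions made for illustration. Only the inline string sizes (`String1` through `String255`) and the `String31` limit come from InlineStrings.jl and the description above.

```python
# Simplified model of the detection/promotion mismatch (not CSV.jl's real code).
# Byte capacities of the InlineStrings.jl types String1 ... String255.
INLINE_CAPACITIES = [1, 3, 7, 15, 31, 63, 127, 255]


def smallest_fitting(nbytes, max_inline):
    """Smallest inline type holding nbytes, or plain String above max_inline."""
    for cap in INLINE_CAPACITIES:
        if cap > max_inline:
            break
        if nbytes <= cap:
            return f"String{cap}"
    return "String"


def infer_column_type(values, promote_limit, detect_limit=31, sample_rows=2):
    """Detect a type from the first rows, then promote while 'parsing' the rest.

    sample_rows is a hypothetical stand-in for the detection sample size.
    """
    # Detection pass: capped at detect_limit (String31 by default).
    widest = max((len(v.encode("utf-8")) for v in values[:sample_rows]), default=0)
    col = smallest_fitting(widest, detect_limit)

    # Promotion pass: capped at promote_limit, which used to be 255.
    for v in values[sample_rows:]:
        if col == "String":
            break  # already the general type, nothing left to promote to
        nbytes = len(v.encode("utf-8"))
        if nbytes > int(col[len("String"):]):
            col = smallest_fitting(nbytes, promote_limit)
    return col


rows = ["ab", "abcdef", "x" * 40]  # the 40-byte value shows up after detection

old_behaviour = infer_column_type(rows, promote_limit=255)
new_behaviour = infer_column_type(rows, promote_limit=31)
print(old_behaviour, new_behaviour)
```

In this model the column is detected as `String7`; with the promotion limit at 255 the late 40-byte value pushes it to `String63`, while with the limit at 31 it falls back to `String`, matching the behaviour the PR aims for.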

quinnj mentioned this issue Dec 14, 2021

Avoid promoting inline string types larger than String31 #949

Merged

quinnj closed this as completed in #949 Dec 14, 2021

jd-foster mentioned this issue Apr 29, 2022

Appending Dataframes after CSV.read fails for different length String columns JuliaData/DataFrames.jl#3044

Closed
