Error parsing specific row in CSV File with multiple threads #713

Closed
junder873 opened this issue Aug 6, 2020 · 2 comments

@junder873

I have been running into an error when parsing a CSV file, but only when using multiple threads. There appear to be two separate errors, a ReadOnlyMemoryError (generated by one part of the CSV file):

ERROR: TaskFailedException:
ReadOnlyMemoryError()
Stacktrace:
 [1] scale at C:\Users\junde\.julia\packages\Parsers\AF18A\src\floats.jl:332 [inlined]
 [2] scale at C:\Users\junde\.julia\packages\Parsers\AF18A\src\floats.jl:403 [inlined]
 [3] typeparser at C:\Users\junde\.julia\packages\Parsers\AF18A\src\floats.jl:277 [inlined]
 [4] typeparser at C:\Users\junde\.julia\packages\Parsers\AF18A\src\floats.jl:19 [inlined]
 [5] xparse at C:\Users\junde\.julia\packages\Parsers\AF18A\src\Parsers.jl:254 [inlined]
 [6] detect(::Array{AbstractArray{T,1} where T,1}, ::Array{UInt8,1}, ::Int64, ::Int64, ::Parsers.Options{false,false,true,false,Missing,UInt8,Nothing}, ::Int64, ::Int64, ::Int64, ::Dict{Type,Type}, ::Float64, ::Array{CSV.RefPool,1}, ::Bool, ::Array{Type,1}, ::Array{UInt8,1}, ::Int64) at C:\Users\junde\.julia\packages\CSV\UWuB2\src\file.jl:730
 [7] parserow at C:\Users\junde\.julia\packages\CSV\UWuB2\src\file.jl:639 [inlined]
 [8] parsefilechunk!(::Val{false}, ::Int64, ::Dict{Type,Type}, ::Array{AbstractArray{T,1} where T,1}, ::Array{UInt8,1}, ::Int64, ::Int64, ::Int64, ::Array{Int64,1}, ::Float64, ::Array{CSV.RefPool,1}, ::Int64, ::Int64, ::Array{Type,1}, ::Array{UInt8,1}, ::Bool, ::Parsers.Options{false,false,true,false,Missing,UInt8,Nothing}, ::Nothing, ::Type{Tuple{}}) at C:\Users\junde\.julia\packages\CSV\UWuB2\src\file.jl:548
 [9] macro expansion at C:\Users\junde\.julia\packages\CSV\UWuB2\src\file.jl:436 [inlined]
 [10] (::CSV.var"#34#39"{Array{Type,1},Array{UInt8,1},Array{UInt8,1},Parsers.Options{false,false,true,false,Missing,UInt8,Nothing},Nothing,Int64,Float64,Array{CSV.RefPool,1},Int64,Dict{Type,Type},DataType,Int64,Bool,Array{Int64,1},Int64,Array{Array{AbstractArray{T,1} where T,1},1},Array{Int64,1},Array{ReentrantLock,1},Int64})() at .\threadingconstructs.jl:169
Stacktrace:
 [1] sync_end(::Channel{Any}) at .\task.jl:314
 [2] macro expansion at .\task.jl:333 [inlined]
 [3] multithreadparse(::Array{Type,1}, ::Array{UInt8,1}, ::Array{UInt8,1}, ::Int64, ::Int64, ::Parsers.Options{false,false,true,false,Missing,UInt8,Nothing}, ::Nothing, ::Int64, ::Int64, ::Float64, ::Array{CSV.RefPool,1}, ::Int64, ::Dict{Type,Type}, ::Bool, ::Type{T} where T, ::Int64, ::Int64, ::Bool) at C:\Users\junde\.julia\packages\CSV\UWuB2\src\file.jl:425
 [4] CSV.File(::CSV.Header{false,Parsers.Options{false,false,true,false,Missing,UInt8,Nothing},Array{UInt8,1}}; startingbyteposition::Nothing, endingbyteposition::Nothing, limit::Int64, threaded::Nothing, typemap::Dict{Type,Type}, tasks::Int64, debug::Bool) at C:\Users\junde\.julia\packages\CSV\UWuB2\src\file.jl:254
 [5] CSV.File(::String; header::Int64, normalizenames::Bool, datarow::Int64, skipto::Nothing, footerskip::Int64, transpose::Bool, comment::Nothing, use_mmap::Nothing, ignoreemptylines::Bool, select::Nothing, drop::Nothing, missingstrings::Array{String,1}, missingstring::String, delim::Nothing, ignorerepeated::Bool, quotechar::Char, openquotechar::Nothing, closequotechar::Nothing, escapechar::Char, dateformat::Nothing, dateformats::Nothing, decimal::UInt8, truestrings::Array{String,1}, falsestrings::Array{String,1}, type::Nothing, types::Nothing, typemap::Dict{Type,Type}, categorical::Nothing, pool::Float64, lazystrings::Bool, strict::Bool, silencewarnings::Bool, debug::Bool, parsingdebug::Bool, kw::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at C:\Users\junde\.julia\packages\CSV\UWuB2\src\file.jl:217
 [6] CSV.File(::String) at C:\Users\junde\.julia\packages\CSV\UWuB2\src\file.jl:216
 [7] top-level scope at REPL[3]:1

and an "EXCEPTION_ACCESS_VIOLATION" error (generated by a different part):

thread = 1 warning: only found 66 / 150 columns around data row: 110. Filling remaining columns with `missing`

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x632ad29e --
thread = 5 warning: only found 78 / 150 columns around data row: 104. Filling remaining columns with `missing`
__gmpn_add_1 at /workspace/destdir/include\gmp.h:2155 [inlined]
mpfr_set4 at /workspace/srcdir/mpfr-4.0.2/src\set.c:59
in expression starting at REPL[3]:1
thread = 5 warning: only found 78 / 150 columns around data row: 104. Filling remaining columns with `missing`
__gmpn_add_1 at /workspace/destdir/include\gmp.h:2155 [inlined]
mpfr_set4 at /workspace/srcdir/mpfr-4.0.2/src\set.c:59
thread = 5 warning: only found 78 / 150 columns around data row: 104. Filling remaining columns with `missing`
mpfr_set_d at /workspace/srcdir/mpfr-4.0.2/src\set_d.c:240
thread = 5 warning: only found 78 / 150 columns around data row: 104. Filling remaining columns with `missing`
scale at C:\Users\junde\.julia\packages\Parsers\AF18A\src\floats.jl:332 [inlined]
scale at C:\Users\junde\.julia\packages\Parsers\AF18A\src\floats.jl:403 [inlined]
thread = 5 warning: only found 78 / 150 columns around data row: 104. Filling remaining columns with `missing`
typeparser at C:\Users\junde\.julia\packages\Parsers\AF18A\src\floats.jl:277 [inlined]
typeparser at C:\Users\junde\.julia\packages\Parsers\AF18A\src\floats.jl:19 [inlined]
xparse at C:\Users\junde\.julia\packages\Parsers\AF18A\src\Parsers.jl:254 [inlined]
detect at C:\Users\junde\.julia\packages\CSV\UWuB2\src\file.jl:730

parserow at C:\Users\junde\.julia\packages\CSV\UWuB2\src\file.jl:639 [inlined]
parsefilechunk! at C:\Users\junde\.julia\packages\CSV\UWuB2\src\file.jl:548
thread = 5 warning: only found 78 / 150 columns around data row: 104. Filling remaining columns with `missing`
unknown function (ip: 000000001DFFF945)

macro expansion at C:\Users\junde\.julia\packages\CSV\UWuB2\src\file.jl:436 [inlined]
#34 at .\threadingconstructs.jl:169
unknown function (ip: 000000001DFD2BCA)
thread = 5 warning: only found 78 / 150 columns around data row: 104. Filling remaining columns with `missing`
thread = 5 warning: only found 78 / 150 columns around data row: 104. Filling remaining columns with `missing`
thread = 5 warning: only found 78 / 150 columns around data row: 104. Filling remaining columns with `missing`
jl_apply at /cygdrive/d/buildbot/worker/package_win64/build/src\julia.h:1690 [inlined]
start_task at /cygdrive/d/buildbot/worker/package_win64/build/src\task.c:707
Allocations: 30348265 (Pool: 30340427; Big: 7838); GC: 31
thread = 5 warning: only found 78 / 150 columns around data row: 104. Filling remaining columns with `missing`
Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x668f0cf0 --
thread = 5 warning: only found 78 / 150 columns around data row: 104. Filling remaining columns with `missing`
jl_field_names at /cygdrive/d/buildbot/worker/package_win64/build/src\julia.h:944 [inlined]
jl_field_index at /cygdrive/d/buildbot/worker/package_win64/build/src\datatype.c:978
in expression starting at REPL[3]:1
jl_field_names at /cygdrive/d/buildbot/worker/package_win64/build/src\julia.h:944 [inlined]
jl_field_index at /cygdrive/d/buildbot/worker/package_win64/build/src\datatype.c:978
jl_f_getfield at /cygdrive/d/buildbot/worker/package_win64/build/src\builtins.c:780
matching_cache_argtypes at .\compiler\inferenceresult.jl:64
InferenceResult at .\compiler\inferenceresult.jl:12 [inlined]
InferenceResult at .\compiler\inferenceresult.jl:12 [inlined]
typeinf_ext at .\compiler\typeinfer.jl:568
typeinf_ext at .\compiler\typeinfer.jl:601
jfptr_typeinf_ext_20352.clone_1 at C:\Users\junde\AppData\Local\Programs\Julia 1.5.0\lib\julia\sys.dll (unknown line)
_jl_invoke at /cygdrive/d/buildbot/worker/package_win64/build/src\gf.c:2214 [inlined]
jl_apply_generic at /cygdrive/d/buildbot/worker/package_win64/build/src\gf.c:2398 [inlined]
jl_apply at /cygdrive/d/buildbot/worker/package_win64/build/src\julia.h:1690 [inlined]
jl_type_infer at /cygdrive/d/buildbot/worker/package_win64/build/src\gf.c:296
jl_generate_fptr at /cygdrive/d/buildbot/worker/package_win64/build/src\jitlayers.cpp:290
jl_compile_method_internal at /cygdrive/d/buildbot/worker/package_win64/build/src\gf.c:1964
jl_compile_method_internal at /cygdrive/d/buildbot/worker/package_win64/build/src\gf.c:1931 [inlined]
_jl_invoke at /cygdrive/d/buildbot/worker/package_win64/build/src\gf.c:2224 [inlined]
jl_apply_generic at /cygdrive/d/buildbot/worker/package_win64/build/src\gf.c:2398
jl_apply at /cygdrive/d/buildbot/worker/package_win64/build/src\julia.h:1690 [inlined]
jl_atexit_hook at /cygdrive/d/buildbot/worker/package_win64/build/src\init.c:230
jl_exit at /cygdrive/d/buildbot/worker/package_win64/build/src\jl_uv.c:624
jl_exception_handler at /cygdrive/d/buildbot/worker/package_win64/build/src\signals-win.c:308
__julia_personality at /cygdrive/d/buildbot/worker/package_win64/build/src/support\win32_ucontext.c:28
_chkstk at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
RtlRaiseException at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
KiUserExceptionDispatcher at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
__gmpn_add_1 at /workspace/destdir/include\gmp.h:2155 [inlined]
mpfr_set4 at /workspace/srcdir/mpfr-4.0.2/src\set.c:59
mpfr_set_d at /workspace/srcdir/mpfr-4.0.2/src\set_d.c:240
scale at C:\Users\junde\.julia\packages\Parsers\AF18A\src\floats.jl:332 [inlined]
scale at C:\Users\junde\.julia\packages\Parsers\AF18A\src\floats.jl:403 [inlined]
typeparser at C:\Users\junde\.julia\packages\Parsers\AF18A\src\floats.jl:277 [inlined]
typeparser at C:\Users\junde\.julia\packages\Parsers\AF18A\src\floats.jl:19 [inlined]
xparse at C:\Users\junde\.julia\packages\Parsers\AF18A\src\Parsers.jl:254 [inlined]
detect at C:\Users\junde\.julia\packages\CSV\UWuB2\src\file.jl:730
parserow at C:\Users\junde\.julia\packages\CSV\UWuB2\src\file.jl:639 [inlined]
parsefilechunk! at C:\Users\junde\.julia\packages\CSV\UWuB2\src\file.jl:548
unknown function (ip: 000000001DFFF945)
macro expansion at C:\Users\junde\.julia\packages\CSV\UWuB2\src\file.jl:436 [inlined]
#34 at .\threadingconstructs.jl:169
unknown function (ip: 000000001DFD2BCA)
jl_apply at /cygdrive/d/buildbot/worker/package_win64/build/src\julia.h:1690 [inlined]
start_task at /cygdrive/d/buildbot/worker/package_win64/build/src\task.c:707
Allocations: 30348265 (Pool: 30340427; Big: 7838); GC: 31

This second error causes Julia to crash.

The file was written using the CSV package, and there is no issue reading it with a single thread, so it is confusing that the warnings suggest a mismatch in the number of columns. Also, with the second error, the warnings imply that there are at least 110 rows, yet the CSV file has only 101 rows.

The errors are rather inconsistent: if I rewrite the file (using CSV and all the same data), the error sometimes changes and sometimes does not happen at all, which makes it hard to know whether the problem is with a specific row or is some other issue.

Here is a link to one of the files that cause this issue.
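
Since the file reads cleanly with a single thread, one possible stopgap is to turn off multithreaded parsing explicitly. The following is only a minimal sketch: `read_single_threaded` is a hypothetical helper name, and the `threaded` keyword is taken from the `CSV.File` signature shown in the stack trace above. Newer CSV.jl releases may expect `ntasks=1` instead.

```julia
using CSV

# Hypothetical helper: read a CSV file with multithreaded parsing disabled.
# `threaded` appears as a keyword of CSV.File in the stack trace above;
# on newer CSV.jl releases the equivalent option may be `ntasks=1`.
read_single_threaded(path::AbstractString) = CSV.File(path; threaded=false)
```

Comparing the result of this call with a default `CSV.File(path)` call in a session started with several threads may help confirm that the failure depends on threading rather than on the file contents.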

@quinnj
Member

quinnj commented Aug 8, 2020

I think this should be fixed by JuliaData/Parsers.jl#62

@junder873
Author

Yes, updating Parsers.jl fixed this for me. Thank you.
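
For anyone hitting the same crash, updating Parsers.jl can be done from the Julia REPL roughly as follows. This is a sketch that assumes the active environment allows a newer Parsers.jl release; it does not pin any particular version.

```julia
using Pkg

Pkg.update("Parsers")   # pull in the latest compatible Parsers.jl release
Pkg.status("Parsers")   # print the installed version to confirm the update
```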
