ERROR: ArgumentError: Reference array points beyond the end of the pool #495

Closed
xiaodaigh opened this issue Sep 10, 2019 · 8 comments

Comments

@xiaodaigh
Contributor

I got this error when reading the Fannie Mae Performance_2010Q3.txt file on Windows 10 with Julia 1.3-rc1:

julia> @time a = CSV.read(
           "C:/data/Performance_All/Performance_2010Q3.txt",
           delim = '|',
           header = false,
           typemap = Dict(
               Int => Int32,
               Float64 => Float32,
               Union{Missing, Int} => Union{Missing, Int32},
               Union{Missing, Float64} => Union{Missing, Float32}
               ),
           copycols=true
       );
ERROR: ArgumentError: Reference array points beyond the end of the pool
Stacktrace:
 [1] PooledArrays.PooledArray(::PooledArrays.RefArray{Array{UInt32,1}}, ::Dict{Union{Missing, String},UInt32}, ::Array{Union{Missing, String},1}) at C:\Users\RTX2080\.julia\packages\PooledArrays\ufJSl\src\PooledArrays.jl:37
 [2] PooledArrays.PooledArray(::PooledArrays.RefArray{Array{UInt32,1}}, ::Dict{Union{Missing, String},UInt32}) at C:\Users\RTX2080\.julia\packages\PooledArrays\ufJSl\src\PooledArrays.jl:36
 [3] copy(::CSV.Column{Union{Missing, String},Union{Missing, PooledString}}) at C:\Users\RTX2080\.julia\dev\CSV\src\tables.jl:80
 [4] (::DataFrames.var"##DataFrame#91#94")(::Bool, ::Type{DataFrame}, ::Array{AbstractArray{T,1} where T,1}, ::DataFrames.Index) at C:\Users\RTX2080\.julia\packages\DataFrames\yH0f6\src\dataframe\dataframe.jl:130
 [5] Type at .\none:0 [inlined]
 [6] #fromcolumns#411(::Bool, ::typeof(DataFrames.fromcolumns), ::CSV.File) at C:\Users\RTX2080\.julia\packages\DataFrames\yH0f6\src\other\tables.jl:17
 [7] #fromcolumns at .\none:0 [inlined]
 [8] #DataFrame#412 at C:\Users\RTX2080\.julia\packages\DataFrames\yH0f6\src\other\tables.jl:32 [inlined]
 [9] Type at .\none:0 [inlined]
 [10] #read#65 at C:\Users\RTX2080\.julia\dev\CSV\src\CSV.jl:830 [inlined]
 [11] (::CSV.var"#kw##read")(::NamedTuple{(:delim, :header, :typemap, :copycols),Tuple{Char,Bool,Dict{Type,Type},Bool}}, ::typeof(CSV.read), ::String) at .\none:0
 [12] top-level scope at util.jl:155
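
For readers unfamiliar with the message: a pooled column stores each value as an integer reference into a pool of unique values, and this error indicates that at least one reference is larger than the pool it points into. The sketch below is a standalone illustration of that kind of consistency check, assuming it compares the largest reference against the pool length. `check_refs` is a hypothetical helper written for this example; it is not the PooledArrays source and does not use the package.

```julia
# Hypothetical helper: mimics the consistency check between a reference
# array and its pool. Assumption: references are 1-based indices into `pool`.
function check_refs(refs::Vector{UInt32}, pool::Vector)
    if !isempty(refs) && maximum(refs) > length(pool)
        throw(ArgumentError("Reference array points beyond the end of the pool"))
    end
    return true
end

pool = ["CA", "NY"]              # two unique values
good_refs = UInt32[1, 2, 2, 1]   # every reference has a pool entry
bad_refs = UInt32[1, 2, 3]       # 3 has no matching entry in a 2-element pool

check_refs(good_refs, pool)      # passes

# Capture the exception instead of aborting, so it can be inspected.
err = try
    check_refs(bad_refs, pool)
catch e
    e
end
```

If the refs and the pool are built separately (for example by different parsing tasks), any mismatch between the two would surface as this error when the pooled column is constructed, which is consistent with the stack trace above failing inside the `PooledArray` constructor.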
@quinnj
Member

quinnj commented Oct 8, 2019

I can't reproduce this on #510.

@xiaodaigh
Contributor Author

I am getting a new error on Julia 1.3-rc3:

julia> @time a = CSV.read(
                  "C:/data/Performance_All/Performance_2010Q3.txt",
                  delim = '|',
                  header = false,
                  typemap = Dict(
                      Int => Int32,
                      Float64 => Float32,
                      Union{Missing, Int} => Union{Missing, Int32},
                      Union{Missing, Float64} => Union{Missing, Float32}
                      ),
                  copycols=true
              );

The error is below:

ERROR: TaskFailedException:
UndefRefError: access to undefined reference
Stacktrace:
 [1] getindex at .\array.jl:742 [inlined]
 [2] parsevalue!(::Type{Dates.Date}, ::Int8, ::Array{UInt64,1}, ::Array{UInt8,1}, ::Int64, ::Int64, ::Parsers.Options{false,true,false,Missing,UInt8,Nothing}, ::Int64, ::Int64, ::CSV.AtomicVector{Int8}, ::Array{Array{UInt64,1},1}) at C:\Users\RTX2080\.julia\packages\CSV\VybAe\src\CSV.jl:807
 [3] parsetape(::Val{false}, ::Bool, ::Int64, ::Dict{Int8,Int8}, ::Array{Array{UInt64,1},1}, ::Array{Array{UInt64,1},1}, ::Array{UInt8,1}, ::Int64, ::Int64, ::Int64, ::Nothing, ::Array{Int64,1}, ::Float64, ::Array{Dict{String,UInt64},1}, ::Array{UInt64,1}, ::Int64, ::CSV.AtomicVector{Int8}, ::Base.RefValue{Int64}, ::Bool, ::Parsers.Options{false,true,false,Missing,UInt8,Nothing}, ::Bool) at C:\Users\RTX2080\.julia\packages\CSV\VybAe\src\CSV.jl:541
 [4] macro expansion at C:\Users\RTX2080\.julia\packages\CSV\VybAe\src\CSV.jl:464 [inlined]
 [5] (::CSV.var"#33#35"{Array{UInt8,1},Parsers.Options{false,true,false,Missing,UInt8,Nothing},Float64,Int64,Bool,Dict{Type,Type},Int64,Nothing,Base.RefValue{Int64},Bool,Int64,Array{Int64,1},Int64,Array{Int64,1},Array{Array{Array{UInt64,1},1},1},Array{Array{Dict{String,UInt64},1},1},Array{Array{UInt64,1},1},Int64})() at .\threadingconstructs.jl:113
Stacktrace:
 [1] sync_end(::Array{Any,1}) at .\task.jl:300
 [2] macro expansion at .\task.jl:319 [inlined]
 [3] multithreadparse(::Array{Int8,1}, ::Array{UInt8,1}, ::Int64, ::Int64, ::Parsers.Options{false,true,false,Missing,UInt8,Nothing}, ::Int64, ::Float64, ::Int64, ::Bool, ::Dict{Type,Type}, ::Int64, ::Nothing, ::Base.RefValue{Int64}, ::Bool) at C:\Users\RTX2080\.julia\packages\CSV\VybAe\src\CSV.jl:455
 [4] file(::String, ::Bool, ::Bool, ::Int64, ::Nothing, ::Int64, ::Int64, ::Bool, ::Nothing, ::Bool, ::Bool, ::Array{String,1}, ::String, ::Char, ::Bool, ::Char, ::Nothing, ::Nothing, ::Char, ::Nothing, ::UInt8, ::Nothing, ::Nothing, ::Nothing, ::Nothing, ::Dict{Type,Type}, ::Bool, ::Float64, ::Bool, ::Bool, ::Nothing, ::Bool, ::Bool, ::Nothing) at C:\Users\RTX2080\.julia\packages\CSV\VybAe\src\CSV.jl:400
 [5] #File#15 at C:\Users\RTX2080\.julia\packages\CSV\VybAe\src\CSV.jl:213 [inlined]
 [6] (::Core.var"#kw#Type")(::NamedTuple{(:delim, :header, :typemap),Tuple{Char,Bool,Dict{Type,Type}}}, ::Type{CSV.File}, ::String) at .\none:0
 [7] #read#63 at C:\Users\RTX2080\.julia\packages\CSV\VybAe\src\CSV.jl:920 [inlined]
 [8] (::CSV.var"#kw##read")(::NamedTuple{(:delim, :header, :typemap, :copycols),Tuple{Char,Bool,Dict{Type,Type},Bool}}, ::typeof(CSV.read), ::String) at .\none:0
 [9] top-level scope at util.jl:155

@xiaodaigh
Contributor Author

xiaodaigh commented Oct 8, 2019

I downloaded the file before you did, and my copy is about 13 GB in size.

@time a = CSV.read(
                  "C:/data/Performance_All/Performance_2010Q3.txt",
                  delim = '|',
                  header = false,
                  typemap = Dict(
                      Int => Int32,
                      Float64 => Float32,
                      Union{Missing, Int} => Union{Missing, Int32},
                      Union{Missing, Float64} => Union{Missing, Float32}
                      ),
                  copycols=false
              );
 16.774572 seconds (8.88 M allocations: 647.145 MiB, 2.14% gc time)

This seems to work, but using `a` afterwards crashes Julia.

@quinnj
Member

quinnj commented Oct 8, 2019

Hmm, I just re-ran this on the 2010Q3 file, which is 2.33 GB for me, about 10 times in a row and didn't see any errors.

@xiaodaigh
Contributor Author

Same error. I tried it again just now.

@quinnj
Member

quinnj commented Oct 12, 2019

I just pushed some fixes to #510, if you wouldn't mind trying again, @xiaodaigh.

@xiaodaigh
Contributor Author

xiaodaigh commented Oct 12, 2019

Fixed. Thanks!

See my tweet https://twitter.com/evalparse/status/1183035249481019394?s=20

@jeremiedb

@quinnj I ran into the same "Reference array points beyond the end of the pool" error with the latest CSV version (v0.8.4). I also tried the older v0.8.1, from before what appears to be an update to PooledArrays, but still got the same error. It occurs on both Julia 1.6.1 and 1.5.3.

The puzzling part is that it is non-deterministic: the call succeeds about one time in four, and the problem seems to be present only on my Windows 10 machine (I didn't run into it on Linux).

julia> CSV.File(S3.get_object("xxxxx", "prod/target/europe.csv", Dict("response-content-type" => "application/octet-stream")));
ERROR: TaskFailedException

    nested task error: ArgumentError: Reference array points beyond the end of the pool
    Stacktrace:
     [1] PooledArrays.PooledVector{Union{Missing, String}, UInt32, SentinelArrays.ChainedVector{UInt32, Vector{UInt32}}}(rs::PooledArrays.RefArray{SentinelArrays.ChainedVector{UInt32, Vector{UInt32}}}, invpool::Dict{Union{Missing, String}, UInt32}, pool::Vector{Union{Missing, String}}, refcount::Base.Threads.Atomic{Int64})
       @ PooledArrays C:\Users\jerem\.julia\packages\PooledArrays\CV8kA\src\PooledArrays.jl:50
     [2] PooledArray (repeats 3 times)
       @ C:\Users\jerem\.julia\packages\PooledArrays\CV8kA\src\PooledArrays.jl:80 [inlined]
     [3] makeandsetpooled!(columns::Vector{AbstractVector{T} where T}, i::Int64, column::SentinelArrays.ChainedVector{UInt32, Vector{UInt32}}, refs::Vector{CSV.RefPool}, flags::Vector{UInt8})
       @ CSV C:\Users\jerem\.julia\packages\CSV\CJfFO\src\file.jl:416
     [4] macro expansion
       @ C:\Users\jerem\.julia\packages\CSV\CJfFO\src\file.jl:511 [inlined]
     [5] (::CSV.var"#36#39"{Vector{Type}, Vector{UInt8}, Vector{UInt8}, Parsers.Options{false, true, true, false, Missing, UInt8, Nothing}, Nothing, Float64, Vector{CSV.RefPool}, Int64, Dict{Type, Type}, DataType, Nothing, Int64, Vector{Int64}, Bool, Vector{AbstractVector{T} where T}, Vector{Int64}, Vector{Vector{AbstractVector{T} where T}}, Int64})()
       @ CSV .\threadingconstructs.jl:169

...and 2 more exceptions.

Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base .\task.jl:369
 [2] macro expansion
   @ .\task.jl:388 [inlined]
 [3] multithreadparse(types::Vector{Type}, flags::Vector{UInt8}, buf::Vector{UInt8}, options::Parsers.Options{false, true, true, false, Missing, UInt8, Nothing}, coloptions::Nothing, rowsguess::Int64, datarow::Int64, pool::Float64, refs::Vector{CSV.RefPool}, ncols::Int64, typemap::Dict{Type, Type}, customtypes::Type, limit::Nothing, N::Int64, ranges::Vector{Int64}, maxwarnings::Int64, debug::Bool)
   @ CSV C:\Users\jerem\.julia\packages\CSV\CJfFO\src\file.jl:463
 [4] CSV.File(h::CSV.Header{false, Parsers.Options{false, true, true, false, Missing, UInt8, Nothing}, Vector{UInt8}}; finalizebuffer::Bool, startingbyteposition::Nothing, endingbyteposition::Nothing, limit::Nothing, threaded::Nothing, typemap::Dict{Type, Type}, tasks::Int64, lines_to_check::Int64, maxwarnings::Int64, debug::Bool)
   @ CSV C:\Users\jerem\.julia\packages\CSV\CJfFO\src\file.jl:296
 [5] CSV.File(source::Vector{UInt8}; header::Int64, normalizenames::Bool, datarow::Int64, skipto::Nothing, footerskip::Int64, transpose::Bool, comment::Nothing, use_mmap::Nothing, ignoreemptylines::Bool, select::Nothing, drop::Nothing, missingstrings::Vector{String}, missingstring::String, delim::Nothing, ignorerepeated::Bool, quotechar::Char, openquotechar::Nothing, closequotechar::Nothing, escapechar::Char, dateformat::Nothing, dateformats::Nothing, decimal::UInt8, truestrings::Vector{String}, falsestrings::Vector{String}, type::Nothing, types::Nothing, typemap::Dict{Type, Type}, pool::Float64, lazystrings::Bool, strict::Bool, silencewarnings::Bool, debug::Bool, parsingdebug::Bool, kw::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ CSV C:\Users\jerem\.julia\packages\CSV\CJfFO\src\file.jl:218
 [6] CSV.File(source::Vector{UInt8})
   @ CSV C:\Users\jerem\.julia\packages\CSV\CJfFO\src\file.jl:217
 [7] top-level scope
   @ REPL[96]:1

Is there any further information that would help identify the underlying issue?
