Error: could not create file mapping: The operation completed successfully #424

Closed
zjpi opened this issue May 6, 2019 · 13 comments

zjpi commented May 6, 2019

I tried to read this example file

download("https://github.com/xiaodaigh/testing/raw/master/Performance_2003Q3.zip", "ok.zip")
run(`unzip -o ok.zip`)
using CSV
path = "Performance_2003Q3.txt"
@time a = CSV.read(path, delim = '|', header = 0); 

I get the following error

ERROR: could not create file mapping: The operation completed successfully.
Stacktrace:
 [1] error(::String) at .\error.jl:33
 [2] #mmap#1(::Bool, ::Bool, ::Function, ::Mmap.Anonymous, ::Type{Array{UInt64,1}}, ::Tuple{Int64}, ::Int64) at C:\Users\RTX2080\AppData\Local\Julia-1.1.0\share\julia\stdlib\v1.1\Mmap\src\Mmap.jl:218
 [3] #mmap at .\none:0 [inlined]
 [4] #mmap#14 at C:\Users\RTX2080\AppData\Local\Julia-1.1.0\share\julia\stdlib\v1.1\Mmap\src\Mmap.jl:251 [inlined]
 [5] mmap at C:\Users\RTX2080\AppData\Local\Julia-1.1.0\share\julia\stdlib\v1.1\Mmap\src\Mmap.jl:251 [inlined]
 [6] #File#20(::Int64, ::Bool, ::Int64, ::Nothing, ::Int64, ::Int64, ::Bool, ::Nothing, ::Bool, ::Array{String,1}, ::String, ::Char, ::Bool, ::Char, ::Nothing, ::Nothing, ::Char, ::Nothing, ::UInt8, ::Nothing, ::Nothing, ::Nothing, ::Nothing, ::Dict{Int8,Int8}, ::Bool, ::Float64, ::Bool, ::Bool, ::Bool, ::Bool, ::Nothing, ::Type, ::String) at C:\Users\RTX2080\.julia\packages\CSV\2IO2Z\src\CSV.jl:227
 [7] (::getfield(Core, Symbol("#kw#Type")))(::NamedTuple{(:delim, :header),Tuple{Char,Int64}}, ::Type{CSV.File}, ::String) at .\none:0
 [8] #read#63 at C:\Users\RTX2080\.julia\packages\CSV\2IO2Z\src\CSV.jl:559 [inlined]
 [9] (::getfield(CSV, Symbol("#kw##read")))(::NamedTuple{(:delim, :header),Tuple{Char,Int64}}, ::typeof(CSV.read), ::String) at .\none:0
 [10] top-level scope at util.jl:156
@quinnj
Member

quinnj commented May 6, 2019

Can you share your system specs (OS, RAM, etc.)? On my system (macOS, Julia 1.1, 16 GB RAM) the file reads in about 365 seconds. Uncompressed, the file is 12.5 GB.

@xiaodaigh
Contributor

Windows 10 Pro, 64 GB RAM, Julia 1.1, CSV.jl 0.5.

@quinnj
Member

quinnj commented May 6, 2019

Hmmm, this might be a Windows issue then; I'll have to dig out my Windows box tonight and test some things. That mmap error looks suspicious: it is raised as an error, yet the message says the operation completed successfully.

For reference, data.table fread was unable to read a file this big (at least on my machine, it failed after 20 seconds with a "vector memory exhausted" error), TableReader.jl took 1711 seconds, and I had to stop TextParse.jl after about an hour because it still hadn't finished and was slowing my system down so much that it was unusable.

@quinnj
Member

quinnj commented May 7, 2019

Ok, I can reproduce the error on my Windows machine w/ 16 GB RAM. Playing around w/ just mmap, I can't seem to do an anonymous mmap over about 10.5 GB, and if I allocate several, I can't even get that much. This Stack Overflow post sounds a lot like what we're seeing, but doesn't give an immediate answer. Do we try to create a file full of null bytes the size that we need and then mmap that? That's annoying because we'd actually be taking up disk space w/ that temporary file, and we'd have to hope we manage to delete it when we're done. Maybe there's some way we can allocate smaller chunks up to what we need and concat them together somehow? That seems annoying, but more feasible. I'll keep researching this.
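
The "temporary file, then mmap it" idea from the comment above might look roughly like the sketch below. It is only an illustration under stated assumptions, not what CSV.jl does: `with_filebacked_buffer` is a hypothetical helper name, and it relies on `Mmap.mmap` growing a writable file to the requested size (its default `grow=true` behavior).

```julia
using Mmap

# Hypothetical helper (not part of CSV.jl): run `f` on `nbytes` of scratch
# space backed by a temporary file instead of an anonymous mapping.
function with_filebacked_buffer(f, nbytes::Integer)
    path = tempname()
    io = open(path, "w+")   # read/write, create, truncate
    try
        # With the default grow=true, the file is extended to `nbytes`.
        buf = Mmap.mmap(io, Vector{UInt8}, (Int(nbytes),))
        try
            return f(buf)
        finally
            # Unmap before deleting; a file that is still mapped
            # generally cannot be removed on Windows.
            finalize(buf)
        end
    finally
        close(io)
        rm(path; force=true)
    end
end

# Usage: write to both ends of a 1 KiB buffer and read the values back.
total = with_filebacked_buffer(1024) do buf
    buf[1] = 0x2a
    buf[end] = 0x01
    Int(buf[1]) + Int(buf[end])
end
```

The trade-off is the one already raised: the buffer takes real disk space for as long as it lives, and cleanup depends on the `finally` blocks actually running.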

quinnj added a commit that referenced this issue May 7, 2019
* If the source isn't an IO, let's mmap to a new mmapped buffer; it's a lot faster for large files than going through normal IO. Helps a little for #424

* Only define our grisu method for our own Floats. This was inadvertently working on Dec64, which isn't valid
@quinnj
Member

quinnj commented May 7, 2019

I merged #426, which might help a little but probably isn't a full solution to this issue. I wonder if a 64 GB RAM system would work now, though; if anyone wants to try the master branch, that'd be great. Otherwise, I'll keep noodling on how to do a chunked array approach.
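
For readers wondering what a "chunked array approach" could mean in practice, here is a minimal sketch. Everything in it is an assumption for illustration: `ChunkedBytes` is a hypothetical type, not anything in CSV.jl, and it simply presents several smaller anonymous mappings as one logical byte vector. Code that needs a single contiguous block of memory could not use it directly.

```julia
using Mmap

# Hypothetical type: one logical byte vector stored as several smaller
# anonymous mappings, so that no single huge mapping is ever requested.
struct ChunkedBytes <: AbstractVector{UInt8}
    chunks::Vector{Vector{UInt8}}
    chunksize::Int
    len::Int
end

function ChunkedBytes(len::Integer, chunksize::Integer)
    nchunks = cld(len, chunksize)
    # Every chunk has `chunksize` bytes except possibly the last one.
    chunks = [Mmap.mmap(Vector{UInt8}, min(chunksize, len - (i - 1) * chunksize))
              for i in 1:nchunks]
    return ChunkedBytes(chunks, Int(chunksize), Int(len))
end

Base.size(b::ChunkedBytes) = (b.len,)
Base.IndexStyle(::Type{ChunkedBytes}) = IndexLinear()

# Translate a 1-based global index into (chunk number, offset inside chunk).
function locate(b::ChunkedBytes, i::Int)
    c, o = divrem(i - 1, b.chunksize)
    return c + 1, o + 1
end

function Base.getindex(b::ChunkedBytes, i::Int)
    checkbounds(b, i)
    c, o = locate(b, i)
    return b.chunks[c][o]
end

function Base.setindex!(b::ChunkedBytes, v, i::Int)
    checkbounds(b, i)
    c, o = locate(b, i)
    b.chunks[c][o] = v
    return b
end

# Usage: 10 bytes split into chunks of 4, 4 and 2 bytes.
b = ChunkedBytes(10, 4)
b[5] = 0x07   # lands in the first slot of the second chunk
```

The extra index arithmetic on every access is the cost of avoiding one large mapping.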

@xiaodaigh
Contributor

It works now.

@zjpi zjpi closed this as completed May 22, 2019
@quinnj
Member

quinnj commented May 22, 2019

Oh good! I can still generate scenarios where a memory-constrained Windows box will fail, but I'm glad that #426 at least helps when the machine has sufficient RAM.
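
One way to check whether a given machine is in the failing regime is to probe anonymous mappings directly, outside of CSV.jl. The helper below is a hypothetical diagnostic sketch, not part of any package. It assumes that a failed mapping surfaces as one of the exception types listed in the code (this thread shows an `ErrorException` on Julia 1.1 and a `SystemError` in a later report).

```julia
using Mmap

# Hypothetical diagnostic: can this process create an anonymous mapping
# of `nbytes` right now?
function can_anon_mmap(nbytes::Integer)
    try
        buf = Mmap.mmap(Vector{UInt8}, nbytes)
        finalize(buf)   # release the mapping immediately
        return true
    catch err
        # Treat only mapping/allocation failures as "no"; rethrow the rest.
        err isa Union{ErrorException,SystemError,OutOfMemoryError} || rethrow()
        return false
    end
end

# Usage: a small mapping should succeed on any machine.
small_ok = can_anon_mmap(4096)
```

Calling it with increasing sizes (for example around the 10.5 GB mentioned earlier) would show roughly where the limit sits on a particular box.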

@xiaodaigh
Contributor

The problem is back. The first time I run it, it works fine; the second time, it errors:

@time a = CSV.read(path; header=0, limit=3)
@time a = CSV.read(path; header=0, limit=3)

The error is:

julia> @time a = CSV.read(path; header=0, limit=3)
ERROR: could not create file mapping: The operation completed successfully.
Stacktrace:
 [1] error(::String) at .\error.jl:33
 [2] #mmap#1(::Bool, ::Bool, ::Function, ::Mmap.Anonymous, ::Type{Array{UInt8,1}}, ::Tuple{Int64}, ::Int64) at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Mmap\src\Mmap.jl:218
 [3] #mmap#14 at .\none:0 [inlined]
 [4] mmap at C:\cygwin\home\Administrator\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.1\Mmap\src\Mmap.jl:251 [inlined]
 [5] getsource(::String, ::Bool) at C:\Users\RTX2080\.julia\packages\CSV\xJZKC\src\utils.jl:159
 [6] file(::String, ::Int64, ::Bool, ::Int64, ::Nothing, ::Int64, ::Int64, ::Bool, ::Nothing, ::Bool, ::Bool, ::Array{String,1}, ::String, ::Nothing, ::Bool, ::Char, ::Nothing, ::Nothing, ::Char, ::Nothing, ::UInt8, ::Nothing, ::Nothing, ::Nothing, ::Nothing, ::Dict{Int8,Int8}, ::Bool, ::Float64, ::Bool, ::Bool, ::Bool, ::Bool, ::Nothing) at C:\Users\RTX2080\.julia\packages\CSV\xJZKC\src\CSV.jl:237
 [7] #File#21 at C:\Users\RTX2080\.julia\packages\CSV\xJZKC\src\CSV.jl:160 [inlined]
 [8] (::getfield(Core, Symbol("#kw#Type")))(::NamedTuple{(:header, :limit),Tuple{Int64,Int64}}, ::Type{CSV.File}, ::String) at .\none:0
 [9] #read#61 at C:\Users\RTX2080\.julia\packages\CSV\xJZKC\src\CSV.jl:645 [inlined]
 [10] (::getfield(CSV, Symbol("#kw##read")))(::NamedTuple{(:header, :limit),Tuple{Int64,Int64}}, ::typeof(CSV.read), ::String) at .\none:0
 [11] top-level scope at util.jl:156

@zjpi zjpi reopened this Jun 23, 2019
@quinnj
Member

quinnj commented Jun 23, 2019

@xiaodaigh, what CSV.jl release are you on? 0.5.5? 0.5.6? Does it work on a specific version (0.5.4?) but not on a subsequent version? I'm trying to figure out if something actually regressed, or if the issue is just finicky in reproducing (which I suspect).

@xiaodaigh
Contributor

Sorry, noob error here. I am running 0.5.6, the latest. I will try older versions to help you narrow it down.

@xiaodaigh
Contributor

or if the issue is just finicky in reproducing (which I suspect).

I can reproduce it every time on my desktop which is running Windows 10 Pro with a RAID0 2 SSD configuration.

@gabomgp

gabomgp commented Aug 21, 2019

I assumed it was possible to read a large compressed CSV using CSV.Rows, but it fails with the same error (the Spanish message reads "The operation completed successfully."):

SystemError: mmap: La operación se completó correctamente.

So, is it possible to read a very large compressed CSV, iterating over the rows, in Julia?
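
The thread does not answer this question directly. As a fallback that avoids mmap altogether, rows can be streamed from any `IO` with Base Julia alone; the sketch below is an assumption-laden illustration, not a CSV.jl feature. `foreach_row` is a hypothetical helper, it splits naively on the delimiter (no quoting or escaping), and for a compressed file the `io` argument would have to be a decompressing stream supplied by some other package.

```julia
# Hypothetical helper: call `f` on the fields of each non-empty line of `io`
# and return the number of rows seen. Only one line is held at a time.
function foreach_row(f, io::IO; delim::Char='|')
    n = 0
    for line in eachline(io)
        isempty(line) && continue
        f(split(line, delim))   # naive split: quoted delimiters are not handled
        n += 1
    end
    return n
end

# Usage with an in-memory stream standing in for a real file.
rows = Vector{Vector{String}}()
nrows = foreach_row(r -> push!(rows, String.(r)), IOBuffer("a|b|c\n1|2|3\n"))
```

This gives up type inference and proper CSV parsing in exchange for constant memory use per row.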

Note: I'm in Windows.

@quinnj
Member

quinnj commented Oct 8, 2019

I'm going to close this issue in favor of #432, so we can consolidate discussion around large file memory use. #510 is meant to improve memory efficiency in the general case, and allow for drastically better memory use when column types are provided explicitly by the user. I imagine there are still further improvements we can make in addition to what's in #510.
