CSV.read() crashes while reading a large file (more than 1 GB) #759

Closed
programmingknowledege opened this issue Oct 28, 2020 · 8 comments

@programmingknowledege

No description provided.

@quinnj
Member

quinnj commented Oct 28, 2020

In order to help, can you provide more details? What operating system? Version of julia? Version of CSV.jl? Is there a way you can share a sample of the file somehow? Otherwise, I don't have a way to really help here.
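
For anyone reporting a similar problem, the details requested above can be gathered from the Julia REPL. This is a minimal sketch, assuming CSV.jl is installed in the active environment; the memory line is an extra that is useful for out-of-memory reports and was not part of the original request.

```julia
using InteractiveUtils  # provides versioninfo(); loaded automatically in the REPL
using Pkg

versioninfo()        # Julia version plus platform info (OS, CPU, word size)
Pkg.status("CSV")    # CSV.jl version in the active environment

# Total physical memory, for comparison against the size of the file being read.
println("Total RAM (GiB): ", round(Sys.total_memory() / 2^30, digits = 1))
```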

@programmingknowledege
Author

> In order to help, can you provide more details? What operating system? Version of julia? Version of CSV.jl? Is there a way you can share a sample of the file somehow? Otherwise, I don't have a way to really help here.

I am using Linux, with the latest versions of both Julia and the CSV.jl package.
The error I encounter while reading the CSV file is:
ERROR: LoadError: OutOfMemoryError()

@quinnj
Member

quinnj commented Oct 28, 2020

Is there any way you can share the file? Or a sample of the file? Or the column names + column types?

@programmingknowledege
Author

> Is there any way you can share the file? Or a sample of the file? Or the column names + column types?

Here are a few rows from the dataset I am working on, along with the column names:

,trustLevel,totalScanTimeInSeconds,grandTotal,lineItemVoids,scansWithoutRegistration,quantityModifications,scannedLineItemsPerSecond,valuePerSecond,lineItemVoidsPerPosition,fraud
0,5,1054,54.7,7,0,3,0.0275142314990512,0.0518975332068311,0.24137931034482799,0
1,3,108,27.36,5,2,4,0.12962962962963,0.253333333333333,0.3571428571428571,0
2,3,1516,62.16,3,10,5,0.00857519788918206,0.0410026385224274,0.23076923076923103,0
3,6,1791,92.31,8,4,4,0.0161920714684534,0.0515410385259632,0.27586206896551696,0
4,5,430,81.53,3,7,2,0.0627906976744186,0.18960465116279102,0.111111111111111,0
5,1,770,11.09,11,5,2,0.0337662337662338,0.0144025974025974,0.42307692307692296,1
6,3,294,55.63,2,7,1,0.0374149659863946,0.18921768707482997,0.18181818181818202,0
7,2,1545,22.8,0,8,4,0.00647249190938511,0.0147572815533981,0.0,0
8,6,962,65.44,7,0,2,0.0280665280665281,0.068024948024948,0.25925925925925897,0
9,2,725,41.08,10,2,4,0.0372413793103448,0.0566620689655172,0.37037037037037,0
10,5,1533,84.73,4,2,4,0.0104370515329419,0.0552707110241357,0.25,0
11,4,764,28.98,8,0,0,0.0261780104712042,0.0379319371727749,0.4,0
12,4,1736,5.46,4,10,4,0.00576036866359447,0.0031451612903225803,0.4,0
13,4,1705,16.96,7,4,4,0.0158357771260997,0.00994721407624633,0.25925925925925897,0
14,6,1659,52.53,2,1,4,0.0126582278481013,0.0316636528028933,0.0952380952380952,0
15,1,870,32.45,3,1,5,0.00689655172413793,0.0372988505747126,0.5,0
16,3,1295,42.9,11,10,0,0.016988416988417,0.0331274131274131,0.5,0
17,6,590,79.84,6,10,4,0.0406779661016949,0.135322033898305,0.25,0
18,3,139,59.0,8,5,5,0.0287769784172662,0.42446043165467606,2.0,0
19,3,1724,25.4,7,4,3,0.0110208816705336,0.0147331786542923,0.368421052631579,0
20,3,1691,59.61,1,2,2,0.00887049083382614,0.0352513305736251,0.0666666666666667,0
21,4,1505,2.7,6,4,0,0.00332225913621262,0.0017940199335548198,1.2,0
22,4,934,98.54,0,5,2,0.00856531049250535,0.10550321199143499,0.0,0
23,2,125,25.5,5,6,2,0.192,0.204,0.208333333333333,0
24,1,71,78.91,1,4,4,0.0140845070422535,1.1114084507042299,1.0,0
25,5,816,81.54,8,7,1,0.0110294117647059,0.0999264705882353,0.8888888888888891,0
26,3,1503,23.0,10,3,1,0.019294743845641997,0.0153027278775782,0.34482758620689696,0
27,4,329,56.65,10,2,5,0.0638297872340425,0.17218844984802403,0.47619047619047605,0
28,6,1629,40.42,6,7,4,0.0135052179251074,0.0248127685696746,0.272727272727273,0
29,5,1435,91.96,4,3,3,0.0132404181184669,0.0640836236933798,0.21052631578947398,0
30,6,314,48.54,6,1,1,0.0764331210191083,0.154585987261147,0.25,0
31,5,285,6.23,5,6,2,0.10526315789473699,0.021859649122806996,0.166666666666667,0
32,5,891,78.96,2,10,2,0.0280583613916947,0.0886195286195286,0.08,0
33,2,1355,48.87,2,0,4,0.00516605166051661,0.0360664206642066,0.28571428571428603,0
34,2,866,39.38,10,9,1,0.0046189376443418,0.045473441108545,2.5,0
35,2,335,14.4,0,9,5,0.0149253731343284,0.0429850746268657,0.0,0
36,5,834,80.64,0,10,3,0.0155875299760192,0.0966906474820144,0.0,0
37,2,1397,62.59,7,9,4,0.0178954903364352,0.0448031496062992,0.28,1
38,5,1561,91.75,7,6,2,0.0121716848174247,0.0587764253683536,0.368421052631579,0
39,4,847,22.88,4,5,2,0.0106257378984652,0.027012987012987003,0.444444444444444,0
40,1,1520,41.88,6,8,2,0.00131578947368421,0.0275526315789474,3.0,0
41,2,622,60.75,11,5,4,0.0128617363344051,0.0976688102893891,1.375,0
42,4,1012,59.9,2,3,2,0.0148221343873518,0.0591897233201581,0.133333333333333,0
43,2,1451,23.1,3,9,3,0.00137835975189524,0.0159200551343901,1.5,0
44,2,401,80.86,0,1,0,0.0199501246882793,0.201645885286783,0.0,0
45,6,960,93.09,7,2,3,0.0104166666666667,0.09696875,0.7,0
46,6,1433,72.19,7,7,0,0.0174459176552687,0.0503768318213538,0.28,0
47,6,1424,66.9,9,8,3,0.0175561797752809,0.0469803370786517,0.36,0
48,6,1820,54.64,0,0,1,0.007142857142857141,0.030021978021977997,0.0,0
49,1,16,32.29,2,0,5,1.5,2.018125,0.0833333333333333,0
50,4,617,90.73,0,0,5,0.00162074554294976,0.14705024311183099,0.0,0
51,3,1593,25.53,2,2,2,0.0175768989328311,0.0160263653483992,0.0714285714285714,0
52,1,608,85.05,3,5,2,0.0131578947368421,0.13988486842105302,0.375,0
53,1,1385,34.68,5,8,4,0.0194945848375451,0.0250397111913357,0.185185185185185,1
54,3,983,14.07,2,4,0,0.0193285859613428,0.0143133265513733,0.10526315789473699,0
55,1,768,7.24,4,10,3,0.0143229166666667,0.009427083333333329,0.36363636363636404,0
56,1,1474,60.64,9,0,5,0.0128900949796472,0.0411397557666214,0.47368421052631604,0
57,6,692,21.65,7,7,5,0.0216763005780347,0.0312861271676301,0.46666666666666706,0
58,2,351,50.75,1,7,0,0.0455840455840456,0.144586894586895,0.0625,0
59,1,107,39.55,1,6,4,0.25233644859813104,0.369626168224299,0.037037037037037,0
60,5,1237,48.22,7,2,3,0.00808407437348424,0.03898140662894101,0.7,0
61,2,1450,62.63,5,9,3,0.0103448275862069,0.0431931034482759,0.33333333333333304,0
62,5,508,57.9,2,7,0,0.0275590551181102,0.11397637795275599,0.14285714285714302,0
63,1,1748,26.31,0,9,1,0.0085812356979405,0.0150514874141876,0.0,0
64,6,1000,61.24,7,3,0,0.009000000000000001,0.061239999999999996,0.7777777777777779,0
65,1,1123,28.94,1,0,5,0.0133570792520036,0.0257702582368655,0.0666666666666667,0
66,1,232,18.27,9,5,2,0.00862068965517241,0.07875,4.5,0
67,2,1327,7.02,7,1,5,0.0188394875659382,0.00529012810851545,0.28,0
68,6,449,21.0,5,4,2,0.0022271714922049,0.0467706013363029,5.0,0
69,4,1228,3.17,7,4,4,0.0244299674267101,0.0025814332247557,0.23333333333333303,0
70,1,861,31.34,11,3,0,0.0232288037166086,0.0363995354239257,0.55,0
71,5,70,40.04,1,1,4,0.4,0.5720000000000001,0.0357142857142857,0
72,6,1154,57.64,7,10,4,0.0173310225303293,0.04994800693240901,0.35,0
73,3,1393,99.58,1,8,0,0.0136396267049533,0.0714860014357502,0.0526315789473684,0
74,2,1068,23.64,4,0,1,0.0131086142322097,0.0221348314606742,0.28571428571428603,0
75,6,300,93.04,4,4,3,0.0333333333333333,0.31013333333333304,0.4,0
76,2,250,33.95,11,10,3,0.055999999999999994,0.1358,0.7857142857142858,0
77,4,1411,49.25,2,8,4,0.00283486888731396,0.0349043231750532,0.5,0
78,4,1279,89.11,0,0,3,0.0109460516028147,0.0696716184519156,0.0,0
79,3,1771,2.51,3,1,4,0.00508187464709204,0.00141727837380011,0.33333333333333304,0
80,3,1642,6.25,7,8,2,0.0048721071863581,0.0038063337393422704,0.875,0
81,3,1018,27.26,1,9,0,0.0235756385068762,0.0267779960707269,0.0416666666666667,0
82,3,44,78.64,1,10,1,0.0909090909090909,1.7872727272727298,0.25,0
83,3,550,84.57,6,0,3,0.04,0.153763636363636,0.272727272727273,0
84,2,109,24.32,10,0,5,0.100917431192661,0.22311926605504603,0.9090909090909091,0
85,3,1660,12.58,6,5,1,0.00903614457831325,0.00757831325301205,0.4,0
86,2,1078,58.61,9,10,3,0.007421150278293141,0.0543692022263451,1.125,0
87,2,1464,58.21,8,3,0,0.0177595628415301,0.0397609289617486,0.307692307692308,0
88,5,1228,85.93,11,0,1,0.0162866449511401,0.0699755700325733,0.55,0
89,6,1790,70.66,5,1,0,0.00726256983240223,0.0394748603351955,0.384615384615385,0
90,5,1052,67.09,6,2,4,0.0180608365019011,0.0637737642585551,0.315789473684211,0
91,1,996,71.94,7,0,1,0.0271084337349398,0.0722289156626506,0.25925925925925897,1
92,2,1515,7.69,11,4,1,0.0138613861386139,0.00507590759075908,0.523809523809524,0
93,6,1245,22.06,4,7,4,0.00803212851405622,0.017718875502008,0.4,0
94,2,952,0.58,8,5,0,0.0168067226890756,0.000609243697478992,0.5,0
95,5,1570,47.79,4,10,5,0.00127388535031847,0.0304394904458599,2.0,0
96,3,1599,36.79,9,5,3,0.0100062539086929,0.0230081300813008,0.5625,0
97,2,1305,87.65,7,8,4,0.021455938697318003,0.0671647509578544,0.25,1
98,6,122,14.32,1,5,5,0.114754098360656,0.11737704918032801,0.0714285714285714,0
99,6,861,97.36,6,9,5,0.009291521486643441,0.11307781649245098,0.75,0
100,6,1577,78.41,4,6,1,0.0145846544071021,0.04972098922003799,0.17391304347826098,0
101,4,597,45.98,0,4,4,0.0284757118927973,0.0770184254606365,0.0,0
102,1,1319,37.1,7,7,0,0.0144048521607278,0.0281273692191054,0.368421052631579,0
103,1,1335,38.95,5,1,3,0.0142322097378277,0.0291760299625468,0.263157894736842,0
104,6,894,77.5,6,6,4,0.0111856823266219,0.0866890380313199,0.6,0
105,5,533,0.28,6,0,2,0.00375234521575985,0.000525328330206379,3.0,0
106,3,1636,70.4,8,9,4,0.00916870415647922,0.0430317848410758,0.5333333333333329,0
107,3,185,98.96,4,2,1,0.14594594594594598,0.5349189189189191,0.148148148148148,0
108,4,113,23.38,11,10,0,0.15929203539823,0.20690265486725698,0.6111111111111109,0
109,4,1814,52.98,3,7,4,0.0071664829106946,0.0292061742006615,0.23076923076923103,0
110,4,477,1.36,10,1,1,0.0440251572327044,0.0028511530398322897,0.47619047619047605,0
111,6,1123,33.03,10,6,3,0.0053428317008014205,0.0294122885129118,1.6666666666666698,0
112,2,1278,98.78,2,3,1,0.00625978090766823,0.0772926447574335,0.25,0
113,3,1171,90.98,8,1,0,0.013663535439795,0.0776942783945346,0.5,0

@quinnj
Member

quinnj commented Oct 29, 2020

@programmingknowledege, can you share the stacktrace you're seeing from the crash? It can really help to pinpoint which line of code is causing it.

@programmingknowledege
Author

programmingknowledege commented Oct 30, 2020

> @programmingknowledege, can you share the stacktrace you're seeing of the crash? It can be really helpful to try and pinpoint which line of code is causing the crash.

Code:
using DataFrames
using CSV
data = CSV.File(open(read, "write.csv")) |> DataFrame

ERROR: LoadError: OutOfMemoryError()
Stacktrace:
[1] copy(::Array{Float64,1}) at ./array.jl:299
[2] (::getfield(DataFrames, Symbol("##DataFrame#127#128")))(::Bool, ::Type, ::Array{AbstractArray{T,1} where T,1}, ::DataFrames.Index) at /home/kunalbafna/.julia/packages/DataFrames/GtZ1l/src/dataframe/dataframe.jl:148
[3] Type at ./none:0 [inlined]
[4] #fromcolumns#575(::Bool, ::Function, ::CSV.File{false}, ::Array{Symbol,1}) at /home/kunalbafna/.julia/packages/DataFrames/GtZ1l/src/other/tables.jl:28
[5] #DataFrame#576 at ./none:0 [inlined]
[6] DataFrame(::CSV.File{false}) at /home/kunalbafna/.julia/packages/DataFrames/GtZ1l/src/other/tables.jl:34
[7] |>(::CSV.File{false}, ::Type) at ./operators.jl:813
[8] top-level scope at none:0
[9] include at ./boot.jl:317 [inlined]
[10] include_relative(::Module, ::String) at ./loading.jl:1044
[11] include(::Module, ::String) at ./sysimg.jl:29
[12] exec_options(::Base.JLOptions) at ./client.jl:266
[13] _start() at ./client.jl:425
in expression starting at /home/kunalbafna/Downloads/compare.jl:3
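
The top two frames of the trace show the `DataFrame` constructor calling `copy` on a `Float64` column, which is where memory runs out. The behaviour can be sketched on a small vector, assuming a recent DataFrames.jl in which the constructor accepts a `copycols` keyword; `col` is a hypothetical stand-in for one parsed column, not data from the real file.

```julia
using DataFrames

col = rand(1_000)  # hypothetical stand-in for one parsed Float64 column

# Default behaviour: the constructor copies each column, so the original
# vector and its copy are both alive at the same time.
df_copied = DataFrame(value = col)

# With copycols=false the DataFrame reuses the existing vector.
df_shared = DataFrame(value = col; copycols = false)

println(df_copied.value === col)  # false: a separate copy was allocated
println(df_shared.value === col)  # true: same underlying array
```

On a multi-gigabyte file, that extra copy of every column can be the difference between fitting in memory and not.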

@quinnj
Member

quinnj commented Oct 30, 2020

Ok, a couple of thoughts:

  • It looks like this isn't really a crash, just an OutOfMemoryError, which is descriptive: your system doesn't have enough memory to copy the array at that point (frame [1], copy(::Array{Float64,1}), called from the DataFrame constructor).
  • If you use CSV.read("write.csv", DataFrame), things will be optimized in a couple of ways. First, CSV.jl will mmap the file instead of needing to call read on it; it's a subtle difference, but it can affect overall memory pressure. Second, you avoid making copies of the columns, since it looks like you'd like to use the CSV.File columns in the DataFrame directly anyway.
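
The two approaches discussed above can be compared side by side. This is a minimal sketch on a tiny temporary file, assuming CSV.jl 0.8 or later (where `CSV.read` takes a sink argument); the three-row file and the temporary path are hypothetical stand-ins for the real `write.csv`.

```julia
using CSV, DataFrames

# Write a tiny stand-in file (hypothetical data; the real file is over 1 GB).
path = joinpath(mktempdir(), "write.csv")
write(path, "trustLevel,grandTotal,fraud\n5,54.7,0\n3,27.36,0\n1,11.09,1\n")

# Original approach: read the whole file into a byte vector, parse it,
# then let the DataFrame constructor copy every parsed column.
df_copy = CSV.File(open(read, path)) |> DataFrame

# Suggested approach: pass the path so CSV.jl can mmap the file, and hand
# the parsed columns to the DataFrame without the extra copy.
df = CSV.read(path, DataFrame)

println(size(df))        # (3, 3)
println(df == df_copy)   # true: same contents, lower peak memory
```

Both calls produce the same table; the difference is only in how much memory is held while it is being built.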

@programmingknowledege
Author

> Ok, a couple of thoughts:
>
>   • Looks like it's not really a crash? Just an OutOfMemoryError, which is descriptive: looks like your system doesn't have enough memory to copy the array there
>   • If you use CSV.read("write.csv", DataFrame), things will be optimized in a couple of ways: CSV.jl will mmap the file instead of needing to call read on the file, it's a subtle difference, but can affect overall memory pressure; the other benefit is you avoid making copies of the columns, since it looks like you'd like to use the CSV.File columns in the DataFrame directly anyway

Thank you so much, sir! Well explained!
