Poor readtable speed #942

Closed
femtotrader opened this issue Apr 23, 2016 · 8 comments

@femtotrader
Contributor

femtotrader commented Apr 23, 2016

Hello,

I'm trying to read one month of AUD/USD tick data.

Sample data can be found here:
https://drive.google.com/file/d/0B8iUtWjZOTqla3ZZTC1FS0pkZXc/view?usp=sharing
see also pydata/pandas-datareader#153

AUDUSD-2014-01.zip is an 11 MB file containing AUDUSD-2014-01.csv, an 85 MB file, which is not that big!

With Python / Pandas

$ ipython

In [1]: import pandas as pd

In [2]: %time df=pd.read_csv("AUDUSD-2014-01.csv", names=['Symbol', 'Date', 'Bid', 'Ask'])
CPU times: user 3.22 s, sys: 510 ms, total: 3.73 s
Wall time: 4.02 s

With Julia / DataFrames.jl / readtable

julia> @time df=readtable("AUDUSD-2014-01.csv");
 33.026234 seconds (42.83 M allocations: 1.591 GB, 61.37% gc time)

see also JuliaLang/julia#16015

Kind regards

PS: use

julia> @time df=readtable("AUDUSD-2014-01.csv", header = false, names=[:Symb, :Date, :Bid, :Ask]);

to get the column names set correctly

use

julia> @time df[:Date] = DateTime(df[:Date], "yyyymmdd HH:MM:SS.s")

to convert the Date column to DateTime.
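For comparison, the same column naming and timestamp parsing can be done on the pandas side. This is a minimal sketch, assuming the rows look like the two below; those rows are hypothetical, inferred from the "yyyymmdd HH:MM:SS.s" format string above rather than copied from the real file. The format argument is the strptime translation of that pattern.

```python
import io

import pandas as pd

# Two HYPOTHETICAL rows in the layout implied by "yyyymmdd HH:MM:SS.s";
# the values are illustrative, not taken from AUDUSD-2014-01.csv.
sample = io.StringIO(
    "AUD/USD,20140101 21:55:34.378,0.88796,0.88843\n"
    "AUD/USD,20140101 21:55:40.410,0.88795,0.88842\n"
)

# header=None: the file has no header row; names supplies the column labels.
df = pd.read_csv(sample, header=None, names=["Symbol", "Date", "Bid", "Ask"])

# An explicit format avoids per-row format inference; %f accepts the
# 3-digit millisecond field and stores it as microseconds.
df["Date"] = pd.to_datetime(df["Date"], format="%Y%m%d %H:%M:%S.%f")

print(df.dtypes)
```

Passing format explicitly is usually the faster choice on large files, since pandas does not have to guess the layout.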

@KristofferC
Contributor

What happens if you turn off the garbage collector with gc_enable(false) or gc_disable(), depending on your Julia version?

@femtotrader
Contributor Author

It doesn't speed up loading.

@ViralBShah
Contributor

Try readdlm, which seems to be as fast as pandas. You will need to convert the results to a DataFrame.
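The readdlm suggestion is a two-step pattern: parse the file into one homogeneous array, then wrap that array in a DataFrame and restore per-column types. The sketch below shows that pattern in Python purely as an analogue, with numpy.loadtxt standing in for readdlm and pandas standing in for DataFrames.jl; it is not the Julia API, and the sample rows are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# HYPOTHETICAL sample rows; not taken from the real data file.
sample = io.StringIO(
    "AUD/USD,20140101 21:55:34.378,0.88796,0.88843\n"
    "AUD/USD,20140101 21:55:40.410,0.88795,0.88842\n"
)

# Step 1: read everything into a single 2-D array of strings
# (analogous to readdlm returning one array for the whole file).
raw = np.loadtxt(sample, delimiter=",", dtype=str)

# Step 2: wrap the array in a DataFrame with named columns...
df = pd.DataFrame(raw, columns=["Symbol", "Date", "Bid", "Ask"])

# ...and convert each column to its proper type.
for col in ("Bid", "Ask"):
    df[col] = df[col].astype(float)
df["Date"] = pd.to_datetime(df["Date"], format="%Y%m%d %H:%M:%S.%f")

print(df.dtypes)
```

The extra conversion pass is the cost of reading into a single-typed array first; a typed per-column reader avoids it.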

@femtotrader
Contributor Author

Sorry, but I couldn't find what needs to be imported to make this function available.

@nalimilan
Member

readdlm is in Julia Base; it is the same as (but more general than) readcsv. But you noted it was slow in the other issue.

@ViralBShah
Contributor

I was able to load this entire dataset with readcsv in about 4 seconds on master.

@femtotrader
Contributor Author

On my side:

julia> using Base: readdlm
julia> @time dat=readdlm("AUDUSD-2014-01.csv", ',');
 76.044753 seconds (31.17 M allocations: 1.012 GB, 78.82% gc time)

julia> @time dat_no_sep=readdlm("AUDUSD-2014-01.csv");
 15.322299 seconds (15.58 M allocations: 590.372 MB, 60.08% gc time)

julia> @time dat_csv=readcsv("AUDUSD-2014-01.csv");
 16.317755 seconds (31.17 M allocations: 1.012 GB, 47.67% gc time)

@quinnj
Member

quinnj commented Sep 7, 2017

readtable is now deprecated in favor of CSV.jl and TextParse.jl, which are much faster.

@quinnj quinnj closed this as completed Sep 7, 2017
5 participants