parseJson is extremely slow #12152
Did you try packedjson?
@FedericoCeratto use …
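Presumably the elided suggestion was the packedjson package; a minimal sketch of swapping it in, assuming a line-delimited test file (the file name is an invention for illustration):

```nim
# Sketch: packedjson mirrors much of the std/json API but stores the
# parsed document in a packed buffer instead of many small heap nodes.
import pkg/packedjson   # nimble install packedjson

for line in lines("test.json"):    # hypothetical line-delimited JSON file
  let node = parseJson(line)
  discard node["test_version"]     # field access works as with std/json
```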
Update: running a benchmark with only one JSON object (one line of the file), using -d:danger and Nim devel, and picking the best parseJson time out of 10000 loops: …
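The measured numbers were not captured above; a sketch of how such a best-of-N measurement could look (file name and timer choice are assumptions):

```nim
# Sketch: parse a single one-line JSON object 10000 times and keep the
# fastest run, as described above. Compile with -d:danger.
import std/[json, monotimes, times]

let line = readFile("one_object.json")   # hypothetical single-object file
var best = high(int64)
for i in 1 .. 10_000:
  let start = getMonoTime()
  discard parseJson(line)
  let elapsed = (getMonoTime() - start).inNanoseconds
  if elapsed < best: best = elapsed
echo "best parseJson time: ", best, " ns"
```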
I wouldn't say …
@disruptek Only the second chunk of the last test was run against ujson; everything else was against Python's built-in json module.
Can we see your Python? I thought maybe there was some cheating going on whereby data wasn't being unpacked until accessed. I made this native Python to test that theory; it's about twice as slow as Nim:

```python
import json

fh = open("j")
for line in fh.readlines():
    x = json.loads(line)
    if x["test_version"] == "nah":
        break
```
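For a like-for-like comparison, a sketch of the equivalent Nim loop (mirroring the Python above; note that the field comparison needs an explicit getStr in Nim):

```nim
# Sketch: Nim counterpart of the Python loop above.
import std/json

for line in lines("j"):               # same input file as the Python code
  let x = parseJson(line)
  if x["test_version"].getStr == "nah":
    break
```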
@krux02 I did. With -d:release it fails with a RangeError; with -d:danger it takes ~20% longer.
@disruptek The Python you wrote takes 4.6s vs 11s on Nim devel with -d:danger.
@dom96 In terms of performance these options are now equal. The difference between …
None of these options can seriously affect performance, but this list contains the options that can kill performance entirely.
A Nim snippet to pre-load and split the file into lines, timing only the parseJson calls: …
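The snippet itself was not captured above; a sketch of what it could look like, assuming the file name and a monotonic timer:

```nim
# Sketch: read and split the file up front, then time only parseJson.
import std/[json, monotimes, times, strutils]

let jsonLines = readFile("test.json").splitLines()  # done outside timing

let start = getMonoTime()
for line in jsonLines:
  if line.len > 0:
    discard parseJson(line)           # only this is being measured
echo "parseJson total: ", (getMonoTime() - start).inMilliseconds, " ms"
```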
One performance advantage Python (and D) has over Nim is that slicing strings/seqs can be done both safely and without allocating (see https://dlang.org/articles/d-array-article.html). This obviously has serious performance implications in some applications.
No it doesn't, for the simple reason that Python doesn't have O(1) slicing. (It could do it, but it doesn't, at least up to version 3.3 or so; I can't check every Python version.)
For numpy:

```python
import numpy as np

a = np.array([1, 2, 3])
b = a[0:2]   # a view, not a copy: no allocation of the data
a[0] = 10
b            # array([10, 2])
```

For bytearray:

```python
a = bytearray(b"hello world")
a2 = memoryview(a)   # zero-copy view over the bytearray
a2[0] = 105          # ord('i')
a                    # bytearray(b'iello world')
```

As for D, strings are just a special case of dynamic arrays (…
It is planned to extend openArray[T] for this indeed.
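For context, Nim can already pass a slice without copying via toOpenArray, though only directly as a call argument; the planned extension would make such views first-class. A small illustration:

```nim
# Sketch: toOpenArray creates a non-allocating view over a slice,
# but today it is only usable inside a call expression.
proc sumRange(xs: openArray[int]): int =
  for x in xs:
    result += x

let a = @[1, 2, 3, 4, 5]
echo sumRange(a.toOpenArray(1, 3))   # view of elements 1..3, prints 9
```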
That's really good to hear. Is there an issue to track that besides nim-lang/RFCs#88? (Ideally a PR to an RFC markdown file so it can be edited by anyone, like nim-lang/RFCs#167.)
@timotheecour I tested the benchmarks with `discard` and with tricks to prevent optimizing away the JSON parsing, and it was the same.
I've reproduced the JSON timing results from https://embark.status.im/news/2019/11/18/nim-vs-crystal-part-1-performance-interoperability/index.html on a JSON file; Nim is indeed slower than Crystal by a factor of 1.3x to 1.5x. I've tried packedjson as recommended here (#12152 (comment)) and Nim is now 1.08x faster than Crystal:

```nim
import pkg/packedjson

let jobj = parseFile("/tmp/1.json")
let coordinates = jobj["coordinates"]
let len = float(coordinates.len)
var x = 0.0
var y = 0.0
var z = 0.0
for coord in coordinates:
  x += coord["x"].getFloat
  y += coord["y"].getFloat
  z += coord["z"].getFloat
echo x / len
echo y / len
echo z / len
```

So maybe std/json could be improved and this issue re-opened? (Without the 'extremely' part...) Note that parseJsonFragments is not used in the benchmark linked above.
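For context: packedjson gets its speed largely from keeping the parsed document in one packed buffer rather than a tree of individually heap-allocated nodes, which is also the API trade-off mentioned in the next comment.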
That particular speed issue was solved though, and packedjson gets its speed by breaking the API.
JSON parsing using the stdlib is still quite slow, and people use it for comparative benchmarks across languages. IMO we should warn users in the stdlib documentation and keep an open issue.
Related: #3809
parseJson on Nim 0.20.2 on amd64 is 5x slower than Python (!)
Steps to reproduce:
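The original reproduction snippet is not shown in this copy; a minimal sketch of the kind of loop being timed (file name assumed):

```nim
# Sketch: parse a line-delimited JSON file with std/json and time the
# whole program externally (e.g. with `time`).
import std/json

var count = 0
for line in lines("test.json"):   # 2854 one-object lines in the report
  discard parseJson(line)
  inc count
echo "parsed ", count, " objects"
```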
The test file contains 2854 lines; parsing takes 20s with Nim and 4s with Python 3.