parseJson is extremely slow #12152

Closed
FedericoCeratto opened this issue Sep 6, 2019 · 22 comments

@FedericoCeratto
Member

FedericoCeratto commented Sep 6, 2019

parseJson on Nim 0.20.2 on amd64 is 5x slower than Python (!)
Steps to reproduce:

import json, times
let t = epochTime()
for l in lines("j"):  # "j" is the decompressed test file from the curl command below
  discard parseJson(l)
echo epochTime() - t

curl -s https://ooni-data.s3.amazonaws.com/autoclaved/jsonl.tar.lz4/2019-07-20/20190720T180822Z-BR-AS28573-web_connectivity-20190720T180824Z_AS28573_yoUArbw59XNf9gyshmMwTJa6oeODxoRrp7ara6VhTBog0l5Izf-0.2.0-probe.json.lz4 | lz4 -dcfm > j
nim c -d:release --hints:off -r bench.nim 

The test file contains 2854 lines; parsing it takes 20 s with Nim and 4 s with Python 3.

@krux02
Contributor

krux02 commented Sep 6, 2019

Did you try packed json?

@cheatfate
Member

@FedericoCeratto use -d:danger not -d:release, because -d:release is now equal to -d:debug.
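
For the repro above, that would mean compiling with something like:

nim c -d:danger --hints:off -r bench.nim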

@FedericoCeratto
Member Author

FedericoCeratto commented Sep 7, 2019

Update: running a benchmark with only one JSON object (one line of the file), using -d:danger and Nim devel, picking the best parseJson time out of 10000 loops:

line #     len   Python time   Nim time
     1    2717     40.7 usec     131 us
     2  138561     1.12 msec    3431 us
     3  497396     4.39 msec   11519 us
     4   62906      834 usec    1474 us

Same against Python's ujson:
   len   Python time   Nim time
  2717     27.6 usec     132 us
138561      547 usec    3870 us
497396     2.12 msec   11414 us
 62906      320 usec    1477 us
  2869     26.3 usec     130 us

@dom96
Contributor

dom96 commented Sep 7, 2019

@FedericoCeratto use -d:danger not -d:release, because -d:release is now equal to -d:debug.

-d:release is not equal to -d:debug: https://github.com/nim-lang/Nim/blob/devel/config/nim.cfg#L72-L79

@disruptek
Contributor

I wouldn't say json is slow, but rather, that ujson is fast. Also, calling it Python isn't really fair; there's very little actual Python running in a simple repro. Try running Python's json library instead and see how it does. 😉

@FedericoCeratto
Member Author

@disruptek Only the second chunk of the last test is done against ujson, everything else was against Python's json.

@disruptek
Contributor

Can we see your Python? I thought maybe there was some cheating going on whereby data wasn't being unpacked until accessed. I made this native Python to test that theory; it's about twice as slow as Nim:

import json

fh = open("j")
for line in fh.readlines():
    x = json.loads(line)
    if x["test_version"] == "nah":
        break

@Clyybber
Contributor

Clyybber commented Sep 7, 2019

Did you try packed json?

@krux02 I did; with -d:release it fails with a RangeError, and with -d:danger it takes ~20% longer

@FedericoCeratto
Member Author

@disruptek the Python you wrote takes 4.6 s vs 11 s for Nim devel with -d:danger.
Replacing json with ujson in Python brings it down to 2.6 s.

@cheatfate
Member

@dom96 in terms of performance these options are now equal.

The difference between release and debug is:

stacktrace:off
excessiveStackTrace:off
linetrace:off
debugger:off
line_dir:off
opt:speed

None of these options can seriously affect performance.

obj_checks:off
field_checks:off
range_checks:off
bound_checks:off
overflow_checks:off
assertions:off
@if nimHasNilChecks:
  nilchecks:off
@end

But this is the list of options that can kill performance.
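
As an illustration, these checks can also be disabled only around a hot code path with push/pop pragmas instead of compiling everything with -d:danger; a minimal sketch with a made-up proc:

{.push boundChecks: off, rangeChecks: off, overflowChecks: off.}
proc sumBytes(s: string): int =
  # hypothetical hot loop: index and overflow checks are off only inside this push/pop block
  for i in 0 ..< s.len:
    result += ord(s[i])
{.pop.}

echo sumBytes("hello")  # 532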

@FedericoCeratto
Member Author

FedericoCeratto commented Sep 8, 2019

A Nim snippet to pre-load and split the file into lines and time only the parseJson:

import strutils, json, times
let mylines = readFile("j").strip().splitLines()
echo mylines.len
let t = epochTime()
for l in mylines:
  discard parseJson(l)
echo epochTime() - t

@timotheecour
Member

timotheecour commented Sep 9, 2019

pre-load and split the file into lines and time only the parseJson

  • benchmarking code like for l in mylines: needs to be done with caution; depending on how large mylines is, the main bottleneck can become page faults. Ideally the data used by the benchmark fits in the cache
  • also it may be a good idea to ensure code doesn't get optimized away (both in python and nim), with:
var count=0 # dummy counter
for l in mylines:
  count += parseJson(l).len # or whatever is needed to avoid optimizing away
doAssert count != 0

one performance advantage python (and D) has over nim is that slicing strings/seq can be done both safely and without allocating (see https://dlang.org/articles/d-array-article.html). This obviously has serious performance implications in some applications.
At least supporting unsafe (first-class) slices would help for performance critical applications. See also nim-lang/RFCs#88

@Araq
Member

Araq commented Sep 9, 2019

This obviously has serious performance implications in some applications.

No, it doesn't, for the simple reason that Python doesn't have O(1) slicing. (It could do it, but it doesn't, at least up to version 3.3 or so; I can't check every Python version.)

@timotheecour
Member

timotheecour commented Sep 9, 2019

for seqs, O(1) slicing is the raison d'être of numpy in python:

import numpy as np
a=np.array([1,2,3])
b=a[0:2]
a[0]=10
b # array([10,  2])

for strings, admittedly, it's more awkward (and probably not used much in user-facing code), but still doable:

a=bytearray(b"hello world")
a2 = memoryview(a)
a2[0]=105
a # bytearray(b'iello world')

as for D, strings are just a special case of dynamic arrays (immutable(char)[]) and also have safe O(1) slicing (char[] can be used for mutable slices). nim doesn't have immutability, but unsafe slicing would be a good compromise. It would need to be a type other than seq[T] / string, obviously, but it could potentially be openArray[T] under nim-lang/RFCs#88
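
To make the nim side concrete, a minimal sketch (assuming system.toOpenArray is available): s[a .. b] copies the slice into a new string, while toOpenArray passes a non-owning view, with the limitation that such a view can currently only be used as a call argument, not stored:

proc countSpaces(buf: openArray[char]): int =
  # works on any contiguous char buffer without owning or copying it
  for c in buf:
    if c == ' ': inc result

let s = "hello world, hello json"
echo countSpaces(s[6 .. ^1])               # allocates a fresh string for the slice
echo countSpaces(s.toOpenArray(6, s.high)) # no copy; only valid as a call argument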

@Araq
Member

Araq commented Sep 9, 2019

It is indeed planned to extend openArray[T] for this.

@timotheecour
Member

timotheecour commented Sep 9, 2019

that's really good to hear. Is there an issue to track that besides nim-lang/RFCs#88? (ideally a PR to an RFC markdown file so it can be edited by anyone, like nim-lang/RFCs#167)

@FedericoCeratto
Member Author

@timotheecour I tested the benchmarks with discard and with tricks to prevent optimizing away the JSON parsing and it was the same.


@timotheecour
Member

timotheecour commented Dec 27, 2019

I've reproduced the JSON timing results from https://embark.status.im/news/2019/11/18/nim-vs-crystal-part-1-performance-interoperability/index.html on its JSON test file; nim is indeed slower than crystal by a factor of 1.3x to 1.5x, and parseJson is the slow part

I've tried packedjson as recommended here: #12152 (comment) and nim is now 1.08x faster than crystal:

  import pkg/packedjson
  let jobj = parseFile("/tmp/1.json")
  let coordinates = jobj["coordinates"]
  let len = float(coordinates.len)
  var x = 0.0
  var y = 0.0
  var z = 0.0

  for coord in coordinates:
    x += coord["x"].getFloat
    y += coord["y"].getFloat
    z += coord["z"].getFloat

  echo x / len
  echo y / len
  echo z / len

so maybe std/json could be improved and this issue re-opened? (without the 'extremely' part...)

note that parseJsonFragments is not used in https://embark.status.im/news/2019/11/18/nim-vs-crystal-part-1-performance-interoperability/index.html
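
For reference, a rough sketch of what the newline-delimited benchmark from this issue could look like with parseJsonFragments over a stream instead of calling parseJson per line (untested, just to show the shape):

import json, streams

var s = newFileStream("j", fmRead)   # the decompressed test file from the original report
var count = 0
for node in parseJsonFragments(s):   # parses consecutive JSON values straight from the stream
  count += node.len                  # touch the result so the work can't be optimized away
close(s)
echo count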

@Araq
Member

Araq commented Dec 30, 2019

That particular speed issue was solved, though, and packedjson gets its speed by breaking the API.

@FedericoCeratto
Member Author

JSON parsing using the stdlib is still quite slow, and people use it for comparative benchmarks across languages. IMO we should warn users in the stdlib documentation and keep an open issue.

@FedericoCeratto
Member Author

Related: #3809
