
X Experimental Benchmarks

pkoppstein edited this page Jul 15, 2021 · 24 revisions

Experimental Benchmarks Page

This page describes some benchmarks and gives representative timings and "maxrss" (maximum resident set size) statistics.

Each "test" consists of a combination of a task, often given as a jq program, and some input data (possibly null). The first test, however, uses the md5 program, both to record the MD5 value of a particular JSON file and to provide a reference point for comparison.

Each combination of task and input data is assigned a number, given in the form (N); for example, the first "test" is:

(1) md5 jeopardy.json

This page is organized as follows:

  • the SOURCES section has one subsection each for DATA and for PROGRAMS;

  • the RESULTS section is organized into GROUPS so that the timings within each group are roughly comparable. Groups are identified by a string such as "Mac OS X (High Sierra) 3GHz 16GB RAM"

In the RESULTS section, the version of jq should be specified according to its tag, e.g. jq-1.5, jq-1.6rc1

SOURCES

SOURCES: DATA

"jeopardy.json" (aka JEOPARDY_QUESTIONS1.json) [54MB]

Description: https://www.reddit.com/r/datasets/comments/1uyd0t/200000_jeopardy_questions_in_a_json_file

"citylots.json" [181MB]

Description: https://github.com/zemirco/sf-city-lots-json

SOURCES: PROGRAMS

"schema.jq"

Note: in the tests, the last line of "schema.jq" has been uncommented, but see footnote [*1] below for alternatives.

"testzip.jq"

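# Build an object by pairing each header with the corresponding element of the input array.
def zip(headers):
  headers as $headers
  | [$headers, .] | transpose | map({(.[0]): .[1]}) | add ;

# Zip the row [0, 1, ..., n-1] with the string forms of its own elements.
def testzip(n):
  [range(0;n)] as $row
  | $row | zip( $row|map(tostring) ) ;

# The resulting object has one key per element of the row.
testzip(1000000) | length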

RESULTS

GROUP: "Mac OS X (High Sierra) 3GHz 16GB RAM"

(1) md5 jeopardy.json

MD5 (jeopardy.json) = 2075398fa049b1c00223b2279ca5281d
user	0m0.126s
sys	0m0.025s
maxrss  11341824
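
The page does not say how the maxrss figures were collected. As a rough sketch only, assuming the BSD/macOS `/usr/bin/time` (whose `-l` flag prints resource usage, including "maximum resident set size" in bytes, on stderr), a figure of this kind could be obtained as follows. The helper names `parse_maxrss` and `maxrss_of` are hypothetical and not part of the original benchmarks.

```shell
#!/usr/bin/env bash
# Sketch: report the peak resident set size of a command.
# ASSUMPTION: BSD/macOS /usr/bin/time, where -l prints rusage details on stderr.

# parse_maxrss (hypothetical helper): read `time -l` output on stdin and
# extract the number from the line "<bytes>  maximum resident set size".
parse_maxrss() {
  awk '/maximum resident set size/ { print "maxrss", $1 }'
}

# maxrss_of CMD [ARGS...] (hypothetical helper): run CMD, discard its
# stdout, and print its maxrss as reported by /usr/bin/time -l.
maxrss_of() {
  /usr/bin/time -l "$@" 2>&1 >/dev/null | parse_maxrss
}

# Example (assumes a jq-1.5 binary on PATH):
#   maxrss_of jq-1.5 length jeopardy.json
```

GNU time on Linux uses `-v` instead of `-l` and reports the value in kilobytes on a line beginning "Maximum resident set size", so the pattern above would need adjusting there.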

(2) length jeopardy.json

jq-1.5 length jeopardy.json
216930
user	0m1.144s
sys	0m0.112s
maxrss  223440896

(2 rq) length jeopardy.json

rq 'map(s)=>{s.length}' < jeopardy.json
216930
user	4.76s
sys	0.27s
maxrss  372486144 

(2 gojq) length jeopardy.json

216930
user    1.04
sys     0.16
maxrss  235171840

(3) schema.jq jeopardy.json

jq-1.5 -f schema.jq jeopardy.json > jeopardy.schema.json
user 7.10s
sys  0.13s
maxrss 223457280

jq-1.6 -f schema.jq jeopardy.json > jeopardy.schema.json

user        10.87
sys          0.14
maxrss  223322112

gojq -f schema.jq jeopardy.json > jeopardy.schema.json

user        25.92
sys          2.32
maxrss 3643469824 

(4) null testzip.jq

jq-1.5 -n -f testzip.jq
1000000
user 6.11s
sys  0.35s
maxrss 711286784

(5) . jeopardy.json

jq-1.5 . jeopardy.json | wc -l
1952372
user        4.69s
sys         0.12s
maxrss 223350784

(5 rq) . jeopardy.json

rq --format readable id < jeopardy.json | wc -l
1952372

user   21.38s
sys     2.13s
maxrss 381214720

(6) 'select(length==2)' jeopardy.json # --stream

jq-1.5 --stream 'select(length==2)' jeopardy.json | wc -l
10629570
user	0m8.901s
sys	0m0.087s
maxrss 1359872

(7) null 0

jq-1.5 -n 0
user   0.002924s
sys    0.001339s
maxrss 1187840

Times are based on 1000 iterations using a bash loop, after adjusting for the times of the looping itself.

jq-1.6rc1 -n 0
user:  0.030609s
sys :  0.001838s
maxrss 2076672

Times are based on 1000 iterations using a bash loop, after adjusting for the times of the looping itself.
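
The loop-based timing used for test (7) might look something like the sketch below. It assumes that the "times of the looping itself" are estimated by timing the same loop with an empty body (the `:` builtin); the page does not say exactly how the adjustment was made. The function names `loop_cpu` and `per_run` are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: estimate per-invocation user and sys CPU time of a command by
# timing N iterations and subtracting the cost of an empty loop.

# loop_cpu N CMD [ARGS...]: print "user sys" (in seconds) for N runs of CMD.
loop_cpu() {
  local n=$1 i
  shift
  local TIMEFORMAT='%3U %3S'   # format used by the bash `time` keyword
  { time for ((i = 0; i < n; i++)); do "$@" >/dev/null 2>&1; done; } 2>&1
}

# per_run N CMD [ARGS...]: print the adjusted per-iteration user and sys times.
per_run() {
  local n=$1 base run
  shift
  base=$(loop_cpu "$n" :)      # ASSUMPTION: empty body models the loop overhead
  run=$(loop_cpu "$n" "$@")
  echo "$base $run" | awk -v n="$n" \
    '{ printf "user %.6fs\nsys  %.6fs\n", ($3 - $1) / n, ($4 - $2) / n }'
}

# Example (assumes a jq-1.5 binary on PATH):
#   per_run 1000 jq-1.5 -n 0
```

Because the empty loop does not fork a process, the adjusted figures still include process start-up cost, which is the quantity a `jq -n 0` test is meant to expose.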

(8) md5 citylots.json

md5 citylots.json
MD5 (citylots.json) = 158346af5a90253d8b4390bd671eb5c5
user 0.43s
sys  0.06s 
maxrss  11333632

(9) length citylots.json

jq-1.5 length citylots.json
2
user	0m6.887s
sys	0m0.772s
maxrss 1375858688

(10) '.features|length' citylots.json

jq-1.5 '.features|length' citylots.json
206560
user 6.23s
sys  0.78s 
maxrss 1375899648

(11) schema.jq citylots.json

jq-1.5 -f schema.jq citylots.json > citylots.schema.json
user       67.05s
sys         1.10s
maxrss 1375961088

(12) .features[10000].properties.LOT_NUM citylots.json

jq-1.5 '.features[10000].properties.LOT_NUM' citylots.json
"091"
user   6.44s
sys    0.97s
maxrss 1371561984

jq-1.6rc1 '.features[10000].properties.LOT_NUM' citylots.json
"091"
user   5.46
sys    0.73 
maxrss 1375936512

jq-1.5 -n --stream 'first(inputs | select(.[0] == ["features",10000,"properties","LOT_NUM"])) | .[1]' citylots.json
"091"
user   0.60s
sys    0.00s
maxrss 2084864 
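
The streaming run in test (12) uses far less time and memory than the two non-streaming runs. A plausible reading is that `--stream` delivers the input as a sequence of `[path, leaf]` events rather than one in-memory document, and `first(...)` stops consuming events once the wanted path has been seen. The sketch below wraps the same filter in a hypothetical shell helper, `lot_num_at`, with the array index passed in via `--argjson`; it assumes jq 1.5 or later on PATH.

```shell
#!/usr/bin/env bash
# Sketch: look up one leaf value with the jq streaming parser instead of
# loading the whole document. ASSUMPTION: jq 1.5 or later is on PATH.

# lot_num_at FILE INDEX (hypothetical helper):
# print .features[INDEX].properties.LOT_NUM from FILE.
lot_num_at() {
  local file=$1 index=$2
  jq -n --stream --argjson i "$index" '
    # Each streamed event that carries a value has the form [path, leaf].
    # first(...) ends the search at the first event whose path matches.
    first(inputs | select(.[0] == ["features", $i, "properties", "LOT_NUM"]))
    | .[1]
  ' "$file"
}

# Example:
#   lot_num_at citylots.json 10000
```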

APPENDIX 1: Output

"jeopardy.schema.json"

{
  "air_date": "string",
  "answer": "string",
  "category": "string",
  "question": "string",
  "round": "string",
  "show_number": "string",
  "value": "string"
}

FOOTNOTES

[*1] If you prefer to use schema.jq as it exists on the web, here are two alternative methods that can be considered:

(a) jq -f <(cat schema.jq; echo schema) ...

(b) jq 'include "schema"; schema' ...

For further details about using include, see the jq documentation.
