jsonl Parser
(Since 1.6.0) This is a convenience component for parsing large JSONL (AKA NDJSON) files. It is a `Transform` stream, which consumes text and produces a stream of JavaScript objects. It is always the first in a pipe chain, fed directly with text from a file, a socket, the standard input, or any other text stream.
Its `Writable` part operates in a buffer/text mode, while its `Readable` part operates in `objectMode`.
Functionally, `jsonl/Parser` replaces a combination of `Parser` with `jsonStreaming` set to `true`, immediately followed by `StreamValues`. The only reason for its existence is improved performance.
Just like `StreamValues`, it assumes that the input represents subsequent values and streams them out one by one as JavaScript objects:
```js
// From JSONL:
// 1
// "a"
// []
// {}
// true
// It produces:
{key: 0, value: 1}
{key: 1, value: 'a'}
{key: 2, value: []}
{key: 3, value: {}}
{key: 4, value: true}
```
A simple example (streaming from a file):

```js
const {parser: jsonlParser} = require('stream-json/jsonl/Parser');
const fs = require('fs');

const pipeline = fs.createReadStream('sample.json').pipe(jsonlParser());
let objectCounter = 0;
// each data event carries one parsed JSONL line as {key, value}
pipeline.on('data', () => ++objectCounter);
pipeline.on('end', () => console.log(`Found ${objectCounter} objects.`));
```
An alternative example:

```js
const JsonlParser = require('stream-json/jsonl/Parser');
const fs = require('fs');

const jsonlParser = new JsonlParser();
const pipeline = fs.createReadStream('sample.json').pipe(jsonlParser);
let objectCounter = 0;
pipeline.on('data', () => ++objectCounter);
pipeline.on('end', () => console.log(`Found ${objectCounter} objects.`));
```
Both of them are functionally equivalent to:

```js
const {parser} = require('stream-json/Parser');
const {streamValues} = require('stream-json/streamers/StreamValues');
const fs = require('fs');

const pipeline = fs.createReadStream('sample.json')
  .pipe(parser({jsonStreaming: true}))
  .pipe(streamValues());
let objectCounter = 0;
pipeline.on('data', () => ++objectCounter);
pipeline.on('end', () => console.log(`Found ${objectCounter} objects.`));
```
The module returns the constructor of `jsonl/Parser`. Being a stream, `jsonl/Parser` doesn't have any special interfaces. The only thing required is to configure it during construction.
In many real cases, while files are huge, individual data items can fit in memory. It is better to work with them as a whole, so they can be inspected. `jsonl/Parser` leverages the JSONL format and returns a stream of JavaScript objects exactly like `StreamValues`.
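Because every data event carries a complete, already-parsed item, it can be inspected as a whole. A minimal sketch (the `status` field is a hypothetical example):

```js
const {parser} = require('stream-json/jsonl/Parser');
const fs = require('fs');

// each data event is a whole parsed line, so it can be inspected at once
fs.createReadStream('sample.json')
  .pipe(parser())
  .on('data', ({value}) => {
    if (value && value.status === 'error') console.log(value);
  });
```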
The constructor accepts `options`, an optional object described in detail in Node.js' stream documentation. Additionally, the following custom flags are recognized:
- `reviver` is an optional function, which takes two arguments and returns a value. See `JSON.parse()` for more details.
- (Since 1.7.2) `checkErrors` is an optional boolean value. If it is truthy, every call to `JSON.parse()` is checked for an exception, which is passed to a callback. Otherwise, `JSON.parse()` errors are ignored for performance reasons. Default: `false`.
- (Since 1.8.0) `errorIndicator` is an optional value. If it is specified, it supersedes `checkErrors`. When it is present, every call to `JSON.parse()` is checked for an exception and processed like this (see the sketch after this list):
  - If `errorIndicator` is `undefined`, the error is completely suppressed. No value is produced and the global `key` is not advanced.
  - If `errorIndicator` is a function, it is called with an error object. Its result is used this way:
    - If it is `undefined` ⇒ skip as above.
    - Any other value is returned as a `value`.
  - Any other value of `errorIndicator` is returned as a `value`.

  Default: none.
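The following is a minimal sketch of these options, assuming a file whose items may contain a `date` field and may include malformed lines: the `reviver` converts ISO date strings to `Date` objects, while the `errorIndicator` function substitutes `null` for lines that fail to parse:

```js
const JsonlParser = require('stream-json/jsonl/Parser');
const fs = require('fs');

const jsonlParser = new JsonlParser({
  // reviver has the same signature as the second argument of JSON.parse()
  reviver: (key, value) => (key === 'date' ? new Date(value) : value),
  // called with the JSON.parse() error; returning anything other than
  // undefined streams that value out in place of the failed line
  errorIndicator: error => null
});

const pipeline = fs.createReadStream('sample.json').pipe(jsonlParser);
pipeline.on('data', ({key, value}) => console.log(key, value));
```

With `checkErrors: true` and no `errorIndicator`, a malformed line would instead surface as a stream error event.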
`make()` and `parser()` are two aliases of the factory function. It takes `options` described above and returns a new instance of `jsonl/Parser`. `parser()` helps to reduce boilerplate when creating data processing pipelines:
```js
const {chain} = require('stream-chain');
const {parser} = require('stream-json/jsonl/Parser');
const fs = require('fs');

const pipeline = chain([
  fs.createReadStream('sample.json'),
  parser()
]);
let objectCounter = 0;
pipeline.on('data', () => ++objectCounter);
pipeline.on('end', () => console.log(`Found ${objectCounter} objects.`));
```
The `Constructor` property of `make()` (and `parser()`) is set to `jsonl/Parser`. It can be used for the indirect creation of parsers or metaprogramming if needed.
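A minimal sketch of the indirect creation:

```js
const {parser} = require('stream-json/jsonl/Parser');

// parser.Constructor refers to the jsonl/Parser class itself,
// so an instance can be created without importing the class directly
const jsonlParser = new parser.Constructor({checkErrors: true});
```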