Install globally for the command line tools:
npm install -g datakit
Use --help with any command to display help for that command on the command line.
For example:
map --help
- batch
- concat
- distinct
- filter
- flatten
- format-table
- format-tree
- from-csv
- from-yaml
- group
- intersect
- length
- map
- omit
- orderBy
- pick
- reduce
- run
- skip
- take
- to-csv
- to-object
- to-yaml
- transform
Outputs the input dataset as an array of arrays, each sub-array containing the specified number of records.
batch <batch-size> [<input-file>] [<output-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- JSON file
- YAML file
- JSON formatted data on standard output.
- batch-size - The number of records to put in each batch.
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
- output-file - The name of a file (json, csv or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
Reads JSON data from standard input, batches the records into groups of 5 and writes to standard output
command-that-produces-json | batch 5
batch 5 input-file.csv
batch 5 input-file.csv output-file.csv
command-that-produces-json | batch 5 - output-file.csv
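The batching described above can be pictured with plain JavaScript. The sketch below is a minimal illustration of the semantics, not Datakit's own code; the `batch` helper is hypothetical, and it assumes the final batch simply holds whatever records are left over.

```javascript
// Hypothetical helper illustrating the batch semantics: split an array of
// records into sub-arrays of at most `batchSize` records each.
function batch(records, batchSize) {
    if (!Number.isInteger(batchSize) || batchSize <= 0) {
        throw new Error("batchSize must be a positive integer");
    }
    const batches = [];
    for (let i = 0; i < records.length; i += batchSize) {
        // slice() clamps at the end of the array, so the last batch may be shorter.
        batches.push(records.slice(i, i + batchSize));
    }
    return batches;
}

const records = [{ id: 1 }, { id: 2 }, { id: 3 }, { id: 4 }, { id: 5 }];
console.log(JSON.stringify(batch(records, 2)));
// [[{"id":1},{"id":2}],[{"id":3},{"id":4}],[{"id":5}]]
```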
Creates an output dataset by concatenating multiple input datasets. Works like array.concat in JavaScript.
concat ...<input-file>
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- JSON formatted data on standard output
- ...input-file - One or more input file names (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input.
- This command (unlike most others in Datakit) can't write directly to a file. Use shell redirection to write the output to a file, as in the `> output-file.json` example.
concat input-file1.json input-file2.json input-file3.json
Reads JSON data from standard input, concatenates it with a file and writes the result to standard output
command-that-produces-json | concat - input-file.json
concat input-file1.json input-file2.json input-file3.json > output-file.json
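Since `concat` is described as working like `array.concat`, a rough JavaScript equivalent is shown below. The three arrays are hypothetical stand-ins for parsed input datasets; this illustrates the semantics only and is not how Datakit reads its inputs.

```javascript
// Each array stands in for one parsed input dataset (file or stdin).
const datasetA = [{ id: 1 }, { id: 2 }];
const datasetB = [{ id: 3 }];
const datasetC = [{ id: 4 }, { id: 5 }];

// concat joins the datasets end to end, preserving record order.
const combined = datasetA.concat(datasetB, datasetC);
console.log(combined.map(r => r.id)); // [ 1, 2, 3, 4, 5 ]
```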
Removes duplicate values from the input dataset, outputting only the distinct values.
distinct [<input-file>] [<output-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted data on standard output.
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
- output-file - The name of a file (json, csv or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
command-that-produces-json | distinct -
distinct input-file.csv
distinct input-file.csv output-file.csv
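A minimal sketch of the idea for primitive values, assuming the equality used by a JavaScript `Set`. How Datakit compares object records (by reference or by content) is not stated here, so this hypothetical `distinct` helper should not be taken as its exact behaviour.

```javascript
// Hypothetical illustration of distinct for primitive values.
// A Set keeps only the first occurrence of each value (SameValueZero equality).
function distinct(values) {
    return [...new Set(values)];
}

console.log(distinct(["a", "b", "a", "c", "b"])); // [ 'a', 'b', 'c' ]
```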
Creates an output dataset by filtering the input dataset through the predicate function. Works just like array.filter in JavaScript.
filter <predicate-fn> [<input-file>] [<output-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted data on standard output.
- predicate-fn - A JavaScript predicate function that is passed each record in the dataset and returns true/truthy to keep the record or false/falsy to remove the record. Specifying a file name will load the JavaScript code from the file.
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
- output-file - The name of a file (json, csv or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
command-that-produces-json | filter "record => record.id === '1234'"
filter "record => record.id === '1234'" input-file.csv
filter "record => record.id === '1234'" input-file.csv output-file.csv
command-that-produces-json | filter "record => record.id === '1234'" - output-file.csv
filter --file my-filter.js input-file.csv output-file.csv
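Because `filter` is described as working just like `array.filter`, the first command-line example roughly corresponds to the JavaScript below. The sample records are made up for illustration.

```javascript
const records = [
    { id: "1234", name: "Alice" },
    { id: "5678", name: "Bob" },
];

// Same predicate as in the command-line examples above.
const predicate = record => record.id === '1234';
const kept = records.filter(predicate);
console.log(kept); // [ { id: '1234', name: 'Alice' } ]
```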
Flattens a nested dataset by 1 level. Works just like array.flat in JavaScript with an argument of 1, or the flatten function in Lodash.
flatten [<input-file>] [<output-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted data on standard output.
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
- output-file - The name of a file (json, csv or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
- There is no reason to use this command with CSV data, because CSV data can't be nested. Use this command with JSON and YAML data.
command-that-produces-json | flatten
flatten input-file.json
flatten input-file.json output-file.json
command-that-produces-json | flatten - output-file.json
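The one-level flattening can be checked directly with `Array.prototype.flat(1)`; the sample data is hypothetical. Note that deeper nesting survives a single pass.

```javascript
const nested = [[1, 2], [3, [4, 5]], 6];

// flat(1) removes exactly one level of nesting; deeper arrays stay nested.
const flattened = nested.flat(1);
console.log(JSON.stringify(flattened)); // [1,2,3,[4,5],6]
```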
Formats the input dataset as a table rendered in ASCII.
format-table [<input-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- A table rendered in ASCII, written to standard output.
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
command-that-produces-json | format-table -
format-table input-file.json
format-table input-file.yaml
Formats the input dataset as a tree rendered in ASCII.
format-tree [<input-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- A tree rendered in ASCII, written to standard output.
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
command-that-produces-json | format-tree -
format-tree input-file.json
format-tree input-file.yaml
Converts data from the CSV (comma separated values) data format to the JSON data format.
from-csv [<csv-input-file>] [<output-file>]
Input can be one of the following:
- CSV file
- CSV formatted data on standard input
Output can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted data on standard output.
- csv-input-file - Can be an input file name (must be a CSV file) or a hyphen to indicate reading CSV data from standard input.
- output-file - The name of a file (json, csv or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
command-that-produces-csv | from-csv -
from-csv input-file.csv
from-csv input-file.csv output-file.yaml
from-csv input-file.csv output-file.json
Converts data from the YAML data format to the JSON data format.
from-yaml [<yaml-input-file>] [<output-file>]
Input can be one of the following:
- YAML file
- YAML formatted data on standard input
Output can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted data on standard output.
- yaml-input-file - Can be an input file name (must be a YAML file) or a hyphen to indicate reading YAML data from standard input.
- output-file - The name of a file (json, csv or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
command-that-produces-yaml | from-yaml -
from-yaml input-file.yaml
from-yaml input-file.yaml output-file.csv
from-yaml input-file.yaml output-file.json
Organises records from an input dataset into groups based on a key.
group <key-selector-fn> [<input-file>] [<output-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted data on standard output.
- key-selector-fn - A JavaScript function to select the grouping key for each record of the input dataset. Specifying a file name will load the JavaScript code from the file.
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
- output-file - The name of a file (json, csv or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
Reads JSON data from standard input, groups by "department" and writes the groups to standard output
command-that-produces-json | group "record => record.department"
group "record => record.department" input-file.csv
group "r => r.department" input-file.csv | map "g => ({ department: g.key, totalSales: g.records.length })" - output-file.csv
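A sketch of how grouping might work, assuming (from the `g.key` and `g.records` fields used in the last example above) that each group is an object of the form `{ key, records }`. The `group` helper, the sample data and the first-seen group order are assumptions for illustration, not Datakit's documented behaviour.

```javascript
// Hypothetical sketch: groups are assumed to look like { key, records },
// matching the g.key / g.records fields used in the example above.
function group(records, keySelector) {
    const groups = new Map(); // a Map preserves first-seen key order
    for (const record of records) {
        const key = keySelector(record);
        if (!groups.has(key)) {
            groups.set(key, { key, records: [] });
        }
        groups.get(key).records.push(record);
    }
    return [...groups.values()];
}

const sales = [
    { department: "toys", amount: 10 },
    { department: "books", amount: 5 },
    { department: "toys", amount: 7 },
];
const summary = group(sales, r => r.department)
    .map(g => ({ department: g.key, totalSales: g.records.length }));
console.log(JSON.stringify(summary));
// [{"department":"toys","totalSales":2},{"department":"books","totalSales":1}]
```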
Joins two datasets on a common key, similar to an SQL join.
intersect <left-key-selector-fn> <left-input-file> <right-key-selector-fn> <right-input-file> <merge-fn> [<output-file>]
Input is two datasets (left and right), each of which can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted data on standard output.
- left-key-selector-fn - A JavaScript function to select the join key for each record of the left dataset. Specifying a file name will load the JavaScript code from the file.
- left-input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input.
- right-key-selector-fn - A JavaScript function to select the join key for each record of the right dataset. Specifying a file name will load the JavaScript code from the file.
- right-input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input.
- merge-fn - A JavaScript function to merge records from left and right datasets. Specifying a file name will load the JavaScript code from the file.
- output-file - The name of a file (json, csv or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
- Only one of the left or right datasets can be read from standard input.
Reads two JSON files, merges the datasets based on the "email" field and writes the output to a JSON file
intersect "r => r.email" left-input.json "r => r.email" right-input.json "(left, right) => ({ ...left, ...right })" output.json
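One way to picture `intersect` is as a hash join, sketched below. This assumes inner-join semantics (records with no match on the other side are dropped) and one merged output record per matching left/right pair; neither is confirmed by the description above, and the `intersect` helper is hypothetical.

```javascript
// Hypothetical inner-join sketch: emits merge(left, right) for every pair of
// records whose keys match. Unmatched records are dropped (an assumption).
function intersect(leftKey, left, rightKey, right, merge) {
    // Index the right dataset by key so each lookup is a Map access.
    const index = new Map();
    for (const r of right) {
        const key = rightKey(r);
        if (!index.has(key)) index.set(key, []);
        index.get(key).push(r);
    }
    const output = [];
    for (const l of left) {
        for (const r of index.get(leftKey(l)) ?? []) {
            output.push(merge(l, r));
        }
    }
    return output;
}

const people = [{ email: "a@x.com", name: "Ann" }, { email: "b@x.com", name: "Ben" }];
const orders = [{ email: "a@x.com", total: 42 }];
const joined = intersect(r => r.email, people, r => r.email, orders,
    (left, right) => ({ ...left, ...right }));
console.log(JSON.stringify(joined));
// [{"email":"a@x.com","name":"Ann","total":42}]
```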
Gets the number of records in a dataset. Works just like array.length in JavaScript.
length [<input-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- Prints the number of records in the input dataset.
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input.
command-that-produces-json | length -
length input-file.csv
Creates an output dataset by calling the transformer function on every record of the input dataset. Works just like array.map in JavaScript.
map <transformer-fn> [<input-file>] [<output-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted data on standard output.
- transformer-fn - A JavaScript function to transform each record of the input dataset. Specifying a file name will load the JavaScript code from the file.
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
- output-file - The name of a file (json, csv or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
command-that-produces-json | map "record => record.x"
map "record => record.x" input-file.csv
map "record => record.x" input-file.csv output-file.csv
command-that-produces-json | map "record => record.x" - output-file.csv
map --file my-transformation.js input-file.csv output-file.csv
Creates a new dataset by omitting the specified fields from the input dataset.
omit <fields> [<input-file>] [<output-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted data on standard output.
- fields - Comma-separated list of field names to omit from the input data
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
- output-file - The name of a file (json, csv or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
- The input data can be an array of objects or a single object.
command-that-produces-json | omit ColumnA,ColumnB,ColumnC
omit ColumnA,ColumnB,ColumnC input-file.csv
omit ColumnA,ColumnB,ColumnC input-file.csv output-file.csv
command-that-produces-json | omit ColumnA,ColumnB,ColumnC - output-file.csv
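A hypothetical sketch of the omit semantics in plain JavaScript, covering both input shapes mentioned above (an array of objects or a single object). It is an illustration only, not Datakit's implementation.

```javascript
// Hypothetical sketch of omit: drops the listed fields from one object or
// from every object in an array.
function omit(fieldList, data) {
    const fields = fieldList.split(",");
    const omitOne = record => Object.fromEntries(
        Object.entries(record).filter(([name]) => !fields.includes(name))
    );
    return Array.isArray(data) ? data.map(omitOne) : omitOne(data);
}

console.log(omit("ColumnB,ColumnC", { ColumnA: 1, ColumnB: 2, ColumnC: 3, ColumnD: 4 }));
// { ColumnA: 1, ColumnD: 4 }
```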
Sorts the input dataset by the requested criteria and outputs the sorted dataset. Works a bit like array.sort in JavaScript, but supports multiple sort keys, each with its own sort direction.
orderBy (<sort-fn> [<sort-direction>])+ [<input-file>] [<output-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted data on standard output.
- sort-fn - A JavaScript function to select the sort key from each record of the input dataset. Specifying a file name will load the JavaScript code from the file.
- sort-direction - Optional sort direction that may be "ascending" or "descending". Defaults to "ascending".
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
- output-file - The name of a file (json, csv or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
- The sort function and sort direction can be stacked up to create nested levels of sorting.
command-that-produces-json | orderBy "record => record.email"
orderBy "record => record.email" input-file.csv
orderBy "record => record.email" input-file.csv output-file.csv
orderBy --file my-sort-fn.js input-file.csv
Reads JSON data from standard input, sorts by email and then by age (a nested sort) and writes to a CSV file
command-that-produces-json | orderBy "r => r.email" "r => r.age" - output-file.csv
Reads JSON data from standard input, sorts by age (oldest to youngest) and writes to a CSV file
command-that-produces-json | orderBy "r => r.age" descending - output-file.csv
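Stacked sort criteria behave like a comparator that only consults the next sort key when the previous keys tie. The hypothetical `orderBy` helper below sketches that idea, assuming keys are compared with JavaScript's `<` and `>` operators; Datakit's actual comparison rules may differ.

```javascript
// Hypothetical sketch of a stacked sort: each criterion is a key selector plus
// a direction; later criteria only break ties left by earlier ones.
function orderBy(records, criteria) {
    const compare = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
    // Copy first so the input array is not reordered in place.
    return [...records].sort((x, y) => {
        for (const { key, direction = "ascending" } of criteria) {
            const result = compare(key(x), key(y));
            if (result !== 0) {
                return direction === "descending" ? -result : result;
            }
        }
        return 0;
    });
}

const people = [
    { email: "b@x.com", age: 30 },
    { email: "a@x.com", age: 40 },
    { email: "a@x.com", age: 25 },
];
const sorted = orderBy(people, [
    { key: r => r.email },
    { key: r => r.age, direction: "descending" },
]);
console.log(sorted.map(r => `${r.email}/${r.age}`));
// [ 'a@x.com/40', 'a@x.com/25', 'b@x.com/30' ]
```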
Creates a new dataset by picking the specified fields from the input dataset.
pick <fields> [<input-file>] [<output-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted data on standard output.
- fields - Comma-separated list of field names to pick from the input data
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
- output-file - The name of a file (json, csv or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
- The input data can be an array of objects or a single object.
command-that-produces-json | pick ColumnA,ColumnB,ColumnC
pick ColumnA,ColumnB,ColumnC input-file.csv
pick ColumnA,ColumnB,ColumnC input-file.csv output-file.csv
command-that-produces-json | pick ColumnA,ColumnB,ColumnC - output-file.csv
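The counterpart sketch for `pick`, again hypothetical and handling both an array of objects and a single object. It assumes that a listed field missing from a record is simply left out of the output rather than set to `undefined`.

```javascript
// Hypothetical sketch of pick: keeps only the listed fields, for one object
// or for every object in an array.
function pick(fieldList, data) {
    const fields = fieldList.split(",");
    const pickOne = record => Object.fromEntries(
        fields
            .filter(name => Object.prototype.hasOwnProperty.call(record, name))
            .map(name => [name, record[name]])
    );
    return Array.isArray(data) ? data.map(pickOne) : pickOne(data);
}

console.log(pick("ColumnA,ColumnC", [{ ColumnA: 1, ColumnB: 2, ColumnC: 3 }]));
// [ { ColumnA: 1, ColumnC: 3 } ]
```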
Reduces or aggregates an input dataset to a single output value by repeatedly calling the reducer function on every record of the input. Works just like array.reduce in JavaScript.
reduce <reducer-fn> <seed-value> [<input-file>] [<output-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted data on standard output.
- reducer-fn - A JavaScript "reducer" function called for each record of the input dataset. Specifying a file name will load the JavaScript code from the file.
- seed-value - JSON value that is used as the initial accumulator value for the reduction.
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
- output-file - The name of a file (json, csv or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
Reads JSON data from standard input, applies the reduction to compute total sales and writes to standard output
command-that-produces-json | reduce "(a, r) => a + r.sales" 0
reduce "(a, r) => a + r.sales" 0 input-file.json
reduce "(a, r) => a + r.sales" 0 input-file.csv output-file.csv
command-that-produces-json | reduce "(a, r) => a + r.sales" 0 - output-file.csv
reduce my-reducer.js 0 input-file.yaml
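The first `reduce` example corresponds roughly to the JavaScript below (sample records made up). Since the seed value is parsed as JSON, `0` becomes the number zero. If your input yields numeric fields as strings, which may happen with CSV, `+` would concatenate them instead of adding, so converting with `Number(r.sales)` in the reducer may be needed.

```javascript
const records = [{ sales: 100 }, { sales: 250 }, { sales: 50 }];

// Same reducer and seed value (0) as the command-line examples above.
const total = records.reduce((a, r) => a + r.sales, 0);
console.log(total); // 400
```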
Executes a command for each record in the input dataset.
run <cmd-selector-fn> [<input-file>] [<output-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted data on standard output.
- cmd-selector-fn - A JavaScript function that transforms each record of the input dataset into a command to execute.
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
- output-file - The name of a file (json, csv or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
command-that-produces-json | run "record => `echo ${record.x}`"
run "record => `echo ${record.x}`" input-file.csv
Reads data from a file, creates a command and executes the command for each record, writing the output to another file.
run "record => `echo ${record.x}`" input-file.csv output-file.csv
Reads JSON data from standard input, creates a command and executes the command for each record, writing the output to a file.
command-that-produces-json | run "record => `echo ${record.x}`" - output-file.csv
run --file my-transformation.js input-file.csv output-file.csv
Skips the first X records of the input dataset and writes the remaining records to the output dataset.
skip <skip-number> [<input-file>] [<output-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted data on standard output.
- skip-number - The number of records to skip.
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
- output-file - The name of a file (json, csv or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
Reads JSON data from standard input, skips 3 records and writes remaining records to standard output
command-that-produces-json | skip 3
skip 3 input-file.csv
skip 3 input-file.csv output-file.csv
command-that-produces-json | skip 3 - output-file.csv
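In JavaScript terms, skipping records is a `slice` from the given index; the sample data below is hypothetical.

```javascript
const records = ["r1", "r2", "r3", "r4", "r5"];

// skip 3: drop the first three records and keep the rest.
const remaining = records.slice(3);
console.log(remaining); // [ 'r4', 'r5' ]
```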
Takes the first X records of the input dataset and writes them to the output dataset.
take <take-number> [<input-file>] [<output-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted data on standard output.
- take-number - The number of records to take.
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
- output-file - The name of a file (json, csv or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
command-that-produces-json | take 3
take 3 input-file.csv
take 3 input-file.csv output-file.csv
command-that-produces-json | take 3 - output-file.csv
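Taking records is the complementary `slice(0, n)`. Combining the two gives simple paging, the same idea as piping `skip 2 input-file.json | take 2` on the command line. The `page` helper is a hypothetical illustration.

```javascript
// Hypothetical paging helper: skip the earlier pages, then take one page,
// like piping `skip` into `take`.
function page(records, pageIndex, pageSize) {
    const afterSkip = records.slice(pageIndex * pageSize); // skip
    return afterSkip.slice(0, pageSize);                   // take
}

console.log(page(["r1", "r2", "r3", "r4", "r5"], 1, 2)); // [ 'r3', 'r4' ]
```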
Converts data from the JSON data format to the CSV data format.
to-csv [options] [<input-file>] [<csv-output-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- CSV file
- CSV formatted data on standard output
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
- csv-output-file - The name of a file (must be a CSV file) to output the resulting dataset to. Omitting this causes CSV data to be written to standard output.
- --columns, --c=<column-names>
- Sets the columns (and their order) that will be included in the output CSV
- Example: --columns=ColumnA,ColumnB,ColumnC
command-that-produces-json | to-csv -
command-that-produces-json | to-csv - output-file.csv
to-csv input-file.json output-file.csv
to-csv input-file.yaml output-file.csv
to-csv input-file.json output-file.csv --columns ColumnA,ColumnB,ColumnC
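A rough sketch of what `--columns` controls: only the listed columns are written, in the listed order. The hypothetical `toCsv` helper below quotes fields using the common RFC 4180 conventions and joins rows with a newline; Datakit's exact quoting and line endings are not specified above.

```javascript
// Hypothetical sketch: only the listed columns are written, in the listed
// order. Fields containing commas, quotes or newlines are quoted, and
// embedded quotes are doubled.
function toCsv(records, columns) {
    const escape = value => {
        const text = value === undefined || value === null ? "" : String(value);
        return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
    };
    const lines = [columns.map(escape).join(",")];
    for (const record of records) {
        lines.push(columns.map(column => escape(record[column])).join(","));
    }
    return lines.join("\n");
}

const rows = [{ ColumnB: "x, y", ColumnA: 1, Extra: true }];
console.log(toCsv(rows, ["ColumnA", "ColumnB"]));
// ColumnA,ColumnB
// 1,"x, y"
```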
Creates a JSON object from key/value pairs extracted from the input dataset.
to-object <key-selector-fn> <value-selector-fn> [<input-file>] [<output-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- JSON file
- YAML file
- JSON formatted data on standard output.
- key-selector-fn - A JavaScript function to select the key from each record of the input dataset. Specifying a file name will load the JavaScript code from the file.
- value-selector-fn - A JavaScript function to select the value from each record of the input dataset. Specifying a file name will load the JavaScript code from the file.
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
- output-file - The name of a file (json or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
- Unlike many other Datakit commands, the to-object command cannot output to the CSV format.
command-that-produces-json | to-object "r => r.key" "r => r.value"
to-object "r => r.key" "r => r.value" input-file.csv
to-object "r => r.key" "r => r.value" input-file.csv output-file.json
to-object --file my-key-selector.js --file my-value-selector.js input-file.csv
command-that-produces-json | to-object "r => r.key" "r => r.value" - output-file.json
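The `to-object` examples correspond roughly to building an object with `Object.fromEntries`, sketched below with made-up records. In this sketch a repeated key keeps the last value; whether Datakit behaves the same way is an assumption.

```javascript
const records = [
    { key: "host", value: "localhost" },
    { key: "port", value: 8080 },
];

// Same selectors as the examples above: r => r.key and r => r.value.
const keySelector = r => r.key;
const valueSelector = r => r.value;
const result = Object.fromEntries(records.map(r => [keySelector(r), valueSelector(r)]));
console.log(result); // { host: 'localhost', port: 8080 }
```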
Converts data from the JSON data format to the YAML data format.
to-yaml [<input-file>] [<yaml-output-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- YAML file
- YAML formatted data on standard output
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
- yaml-output-file - The name of a file (must be a YAML file) to output the resulting dataset to. Omitting this causes YAML data to be written to standard output.
command-that-produces-json | to-yaml
command-that-produces-json | to-yaml - output-file.yaml
to-yaml input-file.json output-file.yaml
to-yaml input-file.csv output-file.yaml
Transforms an entire dataset through a user-defined function. Unlike map, the function is called once with the whole dataset rather than once per record.
transform <transformer-fn> [<input-file>] [<output-file>]
Input can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted array on standard input.
Output can be one of the following:
- JSON file
- CSV file
- YAML file
- JSON formatted data on standard output.
- transformer-fn - A JavaScript function to transform the input dataset. Specifying a file name will load the JavaScript code from the file.
- input-file - Can be an input file name (json, csv or yaml) or a hyphen to indicate reading JSON data from standard input. Can be omitted if there are no further arguments.
- output-file - The name of a file (json, csv or yaml) to output the resulting dataset to. Omitting this causes JSON output to be written to standard output.
command-that-produces-json | transform "dataset => transform(dataset)"
transform "dataset => transform(dataset)" input-file.csv
transform "dataset => transform(dataset)" input-file.csv output-file.csv
command-that-produces-json | transform "dataset => transform(dataset)" - output-file.csv
transform --file my-transformation.js input-file.csv
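The difference between `map` and `transform` can be sketched in plain JavaScript: `map` calls its function once per record, whereas `transform` (judging by the `dataset => ...` examples above) passes the whole dataset to a single call. The `summarise` function is a hypothetical example of a whole-dataset transformation.

```javascript
const records = [{ x: 3 }, { x: 1 }, { x: 2 }];

// map calls its function once per record...
const doubled = records.map(record => ({ x: record.x * 2 }));

// ...whereas a transform function receives the whole dataset at once, so it
// can do things a per-record function cannot, such as summarising.
const summarise = dataset => ({
    count: dataset.length,
    max: Math.max(...dataset.map(r => r.x)),
});
const summary = summarise(records);

console.log(doubled); // [ { x: 6 }, { x: 2 }, { x: 4 } ]
console.log(summary); // { count: 3, max: 3 }
```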