A test data generator that produces dummy data for data quality engineering projects.
From the terminal:
- Clone this repo
- `cd test-data-generator`
- `npm install`
- `npm -s start` (the `-s` prevents annoying warnings from being output)
- Answer the questions
- Test data will be in the `output/` directory
Here are instructions for using the generator to create test data that conforms to your own specifications.
The generator generates TWO tables in `.csv` format: a `SOURCE` and a `TARGET` table. You have control over:
- The number of records generated in the `SOURCE` file
- Whether or not to include optional columns
- The modifications made to the `TARGET` dataset:
  - How many more or fewer rows are in `TARGET` vs. `SOURCE`
  - Should column order be randomized?
  - Should column names use similar but different formats between the tables?
  - Should small amounts be added to floats in some columns?
  - Should dates be modified to include some different or invalid values?
  - Should lat/lon values be modified to include some invalid values?
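As a rough illustration of two of the TARGET modifications listed above (randomized column order and small amounts added to floats), here is a minimal sketch in plain JavaScript. The function names `shuffleColumns` and `nudgeFloats`, the `rate` and `maxDelta` options, and the row format (an array of plain objects) are assumptions made for this example, not the project's actual implementation.

```javascript
// Hypothetical helpers; names, options, and row format are illustrative only.

// Return a copy of `columns` in random order (Fisher-Yates shuffle).
function shuffleColumns(columns, rng = Math.random) {
  const out = [...columns];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// For roughly `rate` of the rows, add a small amount (less than `maxDelta`)
// to the float stored in `column`. The input rows are not mutated.
function nudgeFloats(rows, column, { rate = 0.1, maxDelta = 0.01 } = {}, rng = Math.random) {
  return rows.map((row) =>
    rng() < rate ? { ...row, [column]: row[column] + rng() * maxDelta } : row
  );
}

// Usage: a tiny SOURCE table and a modified TARGET derived from it.
const sourceRows = [
  { id: 100001, "Txn Amount": 250.5 },
  { id: 100002, "Txn Amount": 1999.99 },
];
const targetRows = nudgeFloats(sourceRows, "Txn Amount", { rate: 1 });
const targetColumns = shuffleColumns(Object.keys(sourceRows[0]));
console.log(targetColumns, targetRows);
```

Passing the random number source in as a parameter (`rng`) is a design choice for the sketch: it keeps the helpers easy to exercise with a fixed sequence when predictable output is needed.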
The structure of the generated tables is determined by a file called `colspec.json` that is stored in the root directory of the project. (There's another `colspec.json` in the `test/` directory, but ignore that one.) This file is an array of JSON objects, where each object represents one column in the generated tables. Here's what a column specification looks like:
[
  {
    "name": "Txn Amount",   // default name of the column
    "variants": [           // alternative names for the column
      "Txn Amt",
      "Amount of Txn"
    ],
    "cat": "finance",       // category of data generator from the @faker-js/faker library
    "type": "amount",       // @faker-js/faker generator function name
    "convert": true,        // `true` if the generated value needs to be converted to a number,
                            // otherwise this can be omitted
    "min": 100,             // parameter values to be passed to the faker function
    "max": 99999,
    "dec": 2,
    "optional": true        // columns marked as "optional" don't have to be generated
  },
  {
    "name": "id",
    "cat": "datatype",
    "type": "number",
    "opts": true,           // some faker functions require parameters to be submitted as an
                            // `options` object. setting `"opts": true` indicates this
    "unique": true,         // setting `unique` to `true` makes the generated values
                            // unique within the scope of the table
    "min": 100001,
    "max": 999999,
    "precision": 1
  },
  // ...other column specs here
]
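To make the `optional` and `variants` fields above more concrete, here is a minimal sketch of how a column spec array might be turned into SOURCE and TARGET header rows. The helpers `selectColumns` and `targetName` are hypothetical names for this example; the project's own code may handle these fields differently.

```javascript
// Illustrative specs using the same field names as the example above.
const exampleSpecs = [
  { name: "Txn Amount", variants: ["Txn Amt", "Amount of Txn"], optional: true },
  { name: "id" },
];

// Always keep required columns; keep optional ones only when requested.
function selectColumns(specs, includeOptional) {
  return specs.filter((spec) => includeOptional || !spec.optional);
}

// Pick a TARGET header: one of the variants when name variation is switched on
// and the spec has variants, otherwise the default name.
function targetName(spec, useVariants, rng = Math.random) {
  if (!useVariants || !spec.variants || spec.variants.length === 0) {
    return spec.name;
  }
  return spec.variants[Math.floor(rng() * spec.variants.length)];
}

const sourceHeader = selectColumns(exampleSpecs, true).map((s) => s.name);
const targetHeader = selectColumns(exampleSpecs, true).map((s) => targetName(s, true));

console.log(sourceHeader); // [ 'Txn Amount', 'id' ]
console.log(targetHeader); // first entry is one of the variants, second is 'id'
console.log(selectColumns(exampleSpecs, false).map((s) => s.name)); // [ 'id' ]
```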
`@faker-js/faker` is a JavaScript library for generating fake data. It has quite excellent documentation. This project is basically just a CLI wrapper around Faker. In theory, you can use any of the Faker functions in your column specifications by specifying the appropriate category and function name in the `cat` and `type` fields for the column specs. In practice, only the generators used in the provided `colspec.json` have been tested.
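The `cat`/`type` lookup described above can be sketched as follows. This is a guess at the general shape, not the project's actual code: `generateValue` and `CONTROL_KEYS` are hypothetical names, and `standInLib` is a tiny deterministic stand-in so the example runs without installing anything. With the real library you would pass the `faker` object exported by `@faker-js/faker` instead; which parameters each function accepts depends on the Faker version, so check its documentation.

```javascript
// Spec keys that configure the generator itself rather than the faker call.
// (Assumed list, based on the fields shown in the column spec example.)
const CONTROL_KEYS = new Set([
  "name", "variants", "cat", "type", "convert", "opts", "unique", "optional",
]);

// Look up lib[cat][type] and call it with the remaining spec fields.
function generateValue(lib, spec) {
  const category = lib[spec.cat];
  if (!category || typeof category[spec.type] !== "function") {
    throw new Error(`No generator found for ${spec.cat}.${spec.type}`);
  }
  // Everything that is not a control key is treated as a parameter.
  const params = Object.fromEntries(
    Object.entries(spec).filter(([key]) => !CONTROL_KEYS.has(key))
  );
  // `opts: true` -> a single options object; otherwise positional arguments
  // in the order they appear in the spec.
  const raw = spec.opts
    ? category[spec.type](params)
    : category[spec.type](...Object.values(params));
  // `convert: true` -> coerce a string result to a number.
  return spec.convert ? Number(raw) : raw;
}

// Stand-in for the real library (hypothetical; always returns the midpoint).
const standInLib = {
  finance: { amount: (min, max, dec) => ((min + max) / 2).toFixed(dec) },
  datatype: { number: ({ min, max }) => Math.floor((min + max) / 2) },
};

const amount = generateValue(standInLib, {
  name: "Txn Amount", cat: "finance", type: "amount",
  convert: true, min: 100, max: 99999, dec: 2,
});
const id = generateValue(standInLib, {
  name: "id", cat: "datatype", type: "number",
  opts: true, min: 100001, max: 999999, precision: 1,
});
console.log(amount); // 50049.5
console.log(id);     // 550000
```

The sketch leaves out `unique`; one way to handle it would be to re-draw a value until it has not been seen before in that column, but that is also an assumption about how the project works.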
This project is fully unit tested. e2e tests have been planned but not implemented yet (see issues: #1 and #2). Tests can be run from the command line using `npm test`. Running `npm run coverage` will run the tests and produce a coverage report: both a summary in the terminal and a full analysis in `coverage/lcov-report/index.html`.
| :Warning: |
|---|
| Because some of the functions being tested are based on random numbers, there MAY be some failures when you run the tests due to expected outputs being slightly out of range. Generally, re-running the tests will allow them to pass. If you run them four or five times in a row and still get an error, there may be something that needs to be fixed. |
Questions about this library should be directed to [email protected], or if you work with me, to my work email.