Issue #108 - Create CLI for csvqb #157

robons · 2021-08-27T09:00:52Z

Satisfies issue #108.

Command Line Interface

Usage: csvqb [OPTIONS] COMMAND [ARGS]...
  CSVqb - a tool to generate qb-flavoured CSV-W cubes.
Options:
  -h, --help  Show this message and exit.
Commands:
  build  Build a qb-flavoured CSV-W from a tidy CSV.

Build

Build the qb-flavoured CSV-Ws from an info.json V1 file.

Usage: csvqb build [OPTIONS] TIDY_CSV_PATH
  Build a qb-flavoured CSV-W from a tidy CSV.
Options:
  -c, --config CONFIG_PATH        Location of the info.json file containing
                                  the QB column mapping definitions.
                                  [required]
  -o, --out OUT_DIR               Location of the CSV-W outputs.  [default:
                                  out]
  --fail-when-validation-error / --ignore-validation-errors
                                  Fail when validation errors occur or ignore
                                  validation errors and continue generating a
                                  CSV-W.  [default: fail-when-validation-
                                  error]
  --validation-errors-to-file     Save validation errors to an `validation-
                                  errors.json` file in the output directory.
                                  [default: False]
  -h, --help                      Show this message and exit.

example:

csvqb build --config info.json data.csv

Outputs resulting CSV-Ws into the ./out directory (which is created if it does not already exist).
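For readers who want to see how the options above fit together, here is a rough sketch of the same argument surface using the standard library's `argparse`. This is not the code in this PR: the parser layout, the `build_parser` name and the `dest` names are assumptions made purely for illustration.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in that mirrors the options in the help text above."""
    parser = argparse.ArgumentParser(prog="csvqb")
    subcommands = parser.add_subparsers(dest="command", required=True)

    build = subcommands.add_parser(
        "build", help="Build a qb-flavoured CSV-W from a tidy CSV."
    )
    build.add_argument("tidy_csv_path", type=Path)
    build.add_argument("-c", "--config", type=Path, required=True)
    build.add_argument("-o", "--out", type=Path, default=Path("out"))
    # Two flags writing to one destination emulate the on/off switch pair.
    build.add_argument(
        "--fail-when-validation-error",
        dest="fail_when_validation_error",
        action="store_true",
        default=True,
    )
    build.add_argument(
        "--ignore-validation-errors",
        dest="fail_when_validation_error",
        action="store_false",
    )
    build.add_argument("--validation-errors-to-file", action="store_true")
    return parser


# Equivalent of: csvqb build --config info.json data.csv
args = build_parser().parse_args(["build", "--config", "info.json", "data.csv"])
print(args.out, args.fail_when_validation_error)
```

With no `--out` given, the output location falls back to `out`, and validation failures stop the build unless `--ignore-validation-errors` is passed.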

santhosh-thangavel · 2021-08-27T15:06:35Z

csvqb/csvqb/README.md

+                                  [required]
+  -o, --out OUT_DIR               Location of the CSV-W outputs.  [default:
+                                  out]
+  --fail-when-validation-error / --ignore-validation-errors


Why do we need to ignore validation errors?

Because at some point someone might decide that they know better than the validation errors. It's usually a good idea to let humans override machines.
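As a rough illustration of how the two validation flags could interact, here is a minimal sketch. The function name `handle_validation_errors`, its signature and the JSON layout of the errors file are all hypothetical and are not taken from this PR; only the flag names and the `validation-errors.json` file name come from the help text.

```python
import json
from pathlib import Path
from typing import List


def handle_validation_errors(
    errors: List[str],
    out_dir: Path,
    fail_when_validation_error: bool = True,
    validation_errors_to_file: bool = False,
) -> bool:
    """Return True if CSV-W generation should go ahead (hypothetical helper)."""
    if not errors:
        return True
    if validation_errors_to_file:
        # Assumed format: a plain JSON list of error messages.
        out_dir.mkdir(parents=True, exist_ok=True)
        errors_file = out_dir / "validation-errors.json"
        errors_file.write_text(json.dumps(errors, indent=4))
    # The human override: carry on only if the user asked to ignore errors.
    return not fail_when_validation_error
```

The default stays on the safe side (fail), and continuing past errors is an explicit opt-in.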

santhosh-thangavel · 2021-08-27T15:07:26Z

csvqb/csvqb/cli/build.py

+    print(f"{Style.DIM}CSV: {csv_path.absolute()}")
+    print(f"{Style.DIM}info.json: {info_json.absolute()}")
+    data = pd.read_csv(csv_path)
+    assert isinstance(data, pd.DataFrame)


Does this mean that we just need a pd.DataFrame of tidy data, with no Cube class, at the end of the data transformation?

That's the hope, we should get there eventually with this.

santhosh-thangavel · 2021-08-27T15:08:47Z

csvqb/csvqb/configloaders/infojson.py

        maybe_config = column_mappings.get(col_title)
        defined_columns.append(
            _get_column_for_metadata_config(col_title, maybe_config, data[col_title])
        )

+    columns_missing_in_data = set(column_mappings.keys()) - set(column_titles_in_data)


What is column_mappings.keys() and column_titles_in_data? And why do we need a set difference between those two?

column_mappings.keys() are the names of the columns with mappings defined in the info.json file. We take this set difference to find column mappings which are provided in the info.json but don't exist in the actual dataframe.

This highlights that the user has made some kind of error and has probably missed out a column from their CSV.
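A small worked example of that set difference, using made-up column names ("Period", "Region" and "Value" are placeholders, not columns from any real info.json):

```python
# Hypothetical mappings as they might be keyed in an info.json file.
column_mappings = {"Period": {}, "Region": {}, "Value": {}}
# Hypothetical column titles found in the CSV; "Region" has been left out.
column_titles_in_data = ["Period", "Value"]

# Mappings that are defined in the config but have no matching CSV column.
columns_missing_in_data = set(column_mappings.keys()) - set(column_titles_in_data)
print(sorted(columns_missing_in_data))
```

Here the result contains only "Region", which is the column the user forgot to include in their CSV.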

santhosh-thangavel · 2021-08-27T15:26:18Z

@robons Thanks for clarifying the doubts.

robons · 2021-08-31T07:49:03Z

> @robons Thanks for clarifying the doubts.

Are you happy enough to approve the PR, @santhosh-thangavel?

santhosh-thangavel

Thanks

Rob Barry added 7 commits August 26, 2021 17:24

Issue #108 - Initial CLI taking an info.json file with a tidy CSV, ge…

26f103f

…nerating CSV-W qb-flavoured outputs.

Making pyright happier.

1554590

Issue #108 - Added some behave tests for the CLI.

794bbbd

Issue #108 - Adding a basic readme file describing CLI usage.

3b9618b

Fix the test I broke.

25c2482

Ensuring colorama is actually installed.

a87716d

Forgot to sync with setup.py!

9f005dd

robons changed the title ~~Robons 108~~ Issue #108 - Create CLI for csvqb Aug 27, 2021

robons marked this pull request as ready for review August 27, 2021 09:59

santhosh-thangavel reviewed Aug 27, 2021

santhosh-thangavel approved these changes Aug 31, 2021

robons merged commit 0b51d55 into main Aug 31, 2021

robons deleted the robons-108 branch August 31, 2021 13:03
