Issue #108 - Create CLI for csvqb #157

Merged
merged 7 commits into from
Aug 31, 2021

@robons robons commented Aug 27, 2021

Satisfies issue #108.

Command Line Interface

Usage: csvqb [OPTIONS] COMMAND [ARGS]...
  CSVqb - a tool to generate qb-flavoured CSV-W cubes.
Options:
  -h, --help  Show this message and exit.
Commands:
  build  Build a qb-flavoured CSV-W from a tidy CSV.

Build

Build the qb-flavoured CSV-Ws from an info.json V1 file.

Usage: csvqb build [OPTIONS] TIDY_CSV_PATH
  Build a qb-flavoured CSV-W from a tidy CSV.
Options:
  -c, --config CONFIG_PATH        Location of the info.json file containing
                                  the QB column mapping definitions.
                                  [required]
  -o, --out OUT_DIR               Location of the CSV-W outputs.  [default:
                                  out]
  --fail-when-validation-error / --ignore-validation-errors
                                  Fail when validation errors occur or ignore
                                  validation errors and continue generating a
                                  CSV-W.  [default: fail-when-validation-
                                  error]
  --validation-errors-to-file     Save validation errors to an `validation-
                                  errors.json` file in the output directory.
                                  [default: False]
  -h, --help                      Show this message and exit.

Example:

csvqb build --config info.json data.csv

Outputs resulting CSV-Ws into the ./out directory (which is created if it does not already exist).
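
For readers who want to see how the options in the help text above fit together, here is a minimal sketch of an equivalent parser. It uses Python's standard-library `argparse` purely for illustration, which is an assumption and not necessarily the library csvqb itself uses; `build_parser` and the destination names are hypothetical.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options listed in the help text."""
    parser = argparse.ArgumentParser(
        prog="csvqb",
        description="CSVqb - a tool to generate qb-flavoured CSV-W cubes.",
    )
    subparsers = parser.add_subparsers(dest="command", required=True)

    build = subparsers.add_parser(
        "build", help="Build a qb-flavoured CSV-W from a tidy CSV."
    )
    build.add_argument("tidy_csv_path", type=Path, metavar="TIDY_CSV_PATH")
    build.add_argument(
        "-c", "--config", type=Path, required=True, metavar="CONFIG_PATH"
    )
    build.add_argument(
        "-o", "--out", type=Path, default=Path("out"), metavar="OUT_DIR"
    )
    # The two flags share one destination, so the last one given wins.
    # Failing on validation errors is the default, as in the help text.
    build.add_argument(
        "--fail-when-validation-error",
        dest="fail_when_validation_error",
        action="store_true",
        default=True,
    )
    build.add_argument(
        "--ignore-validation-errors",
        dest="fail_when_validation_error",
        action="store_false",
    )
    build.add_argument("--validation-errors-to-file", action="store_true")
    return parser


# Equivalent to: csvqb build --config info.json data.csv
args = build_parser().parse_args(["build", "--config", "info.json", "data.csv"])
print(args.out, args.fail_when_validation_error)
```

With no `--out` given, `args.out` falls back to `out`, matching the default shown in the help text.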

@robons robons changed the title Robons 108 Issue #108 - Create CLI for csvqb Aug 27, 2021
@robons robons marked this pull request as ready for review August 27, 2021 09:59
[required]
-o, --out OUT_DIR Location of the CSV-W outputs. [default:
out]
--fail-when-validation-error / --ignore-validation-errors
Contributor

Why do we need to ignore validation errors?

Contributor Author

Because at some point someone may decide that they know better than the validation errors. It's usually a good idea to let humans override machines.
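
One way the fail/ignore choice could be wired up is sketched below. This is not the code in this PR: `handle_validation_errors` is a hypothetical helper, and representing errors as plain strings is an assumption made to keep the example self-contained.

```python
import json
from pathlib import Path
from typing import List


def handle_validation_errors(
    errors: List[str],
    fail_when_validation_error: bool,
    validation_errors_to_file: bool,
    out_dir: Path,
) -> bool:
    """Return True if CSV-W generation should carry on, False if it should stop."""
    if not errors:
        return True

    if validation_errors_to_file:
        # Record the errors alongside the other outputs.
        out_dir.mkdir(parents=True, exist_ok=True)
        errors_file = out_dir / "validation-errors.json"
        errors_file.write_text(json.dumps(errors, indent=4))

    # The human's explicit choice decides whether errors are fatal.
    return not fail_when_validation_error
```

With `--ignore-validation-errors`, the function returns `True` even when errors are present, so the user's decision takes precedence over the validator.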

print(f"{Style.DIM}CSV: {csv_path.absolute()}")
print(f"{Style.DIM}info.json: {info_json.absolute()}")
data = pd.read_csv(csv_path)
assert isinstance(data, pd.DataFrame)
Contributor

Does this mean that we just need a pd.DataFrame of tidy data, with no Cube class, at the end of the data transformation?

Contributor Author

That's the hope; we should get there eventually with this.

maybe_config = column_mappings.get(col_title)
defined_columns.append(
    _get_column_for_metadata_config(col_title, maybe_config, data[col_title])
)

columns_missing_in_data = set(column_mappings.keys()) - set(column_titles_in_data)
Contributor

What are column_mappings.keys() and column_titles_in_data? And why do we need a set difference between the two?

Contributor Author

column_mappings.keys() are the names of the columns with mappings defined in the info.json file. We take this set difference to find column mappings which are provided in the info.json but don't exist in the actual dataframe.

This highlights that the user has made some kind of error and has probably missed out a column from their CSV.
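
The check described above can be illustrated with a small standalone example. The column names and the empty mapping values are made up for illustration, and `find_columns_missing_in_data` is a hypothetical wrapper around the one-line set difference from the diff.

```python
from typing import Dict, List, Set


def find_columns_missing_in_data(
    column_mappings: Dict[str, dict], column_titles_in_data: List[str]
) -> Set[str]:
    """Columns mapped in info.json that do not appear in the CSV."""
    return set(column_mappings.keys()) - set(column_titles_in_data)


# Hypothetical mappings: the keys are column titles, the values are placeholders.
column_mappings = {"Period": {}, "Region": {}, "Value": {}}
# Hypothetical CSV header: "Region" is absent and "Marker" has no mapping.
column_titles_in_data = ["Period", "Value", "Marker"]

missing = find_columns_missing_in_data(column_mappings, column_titles_in_data)
print(missing)  # {'Region'}
```

The difference is one-directional: `Marker` is in the data but has no mapping, so it is not reported here. Only mapped columns that the CSV lacks are flagged.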

@santhosh-thangavel
Contributor

@robons Thanks for clarifying the doubts.

@robons
Contributor Author

robons commented Aug 31, 2021

> @robons Thanks for clarifying the doubts.

Are you happy enough to approve the PR, @santhosh-thangavel?

@santhosh-thangavel santhosh-thangavel left a comment

Thanks

@robons robons merged commit 0b51d55 into main Aug 31, 2021
@robons robons deleted the robons-108 branch August 31, 2021 13:03
2 participants