Issue #108 - Create CLI for csvqb #157
…nerating CSV-W qb-flavoured outputs.
[required]
-o, --out OUT_DIR  Location of the CSV-W outputs. [default: out]
--fail-when-validation-error / --ignore-validation-errors
Why do we need to ignore validation errors?
Because someone might need to make a decision that they know better than the validation errors at some point. It's usually a good idea to let humans override machines.
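For illustration, a paired on/off flag like the one in the diff above can be sketched with the standard library's argparse. This is only a minimal sketch, not the PR's implementation: the program name `csvqb-sketch`, the destination name `fail_on_error`, and the choice of failing by default are all assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser with a paired fail/ignore validation flag."""
    parser = argparse.ArgumentParser(prog="csvqb-sketch")  # hypothetical name
    parser.add_argument(
        "-o", "--out", dest="out_dir", default="out",
        help="Location of the CSV-W outputs.",
    )
    # Both flags write to the same destination, so the last one given wins
    # and the human can always override the validator.
    # Assumption: failing on validation errors is the default behaviour.
    parser.add_argument(
        "--fail-when-validation-error", dest="fail_on_error",
        action="store_true", default=True,
    )
    parser.add_argument(
        "--ignore-validation-errors", dest="fail_on_error",
        action="store_false", default=True,
    )
    return parser


default_args = build_parser().parse_args([])
override_args = build_parser().parse_args(["--ignore-validation-errors", "-o", "build"])
```

With this shape, validation still runs in both modes; the flag only decides whether a reported error stops the build.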
print(f"{Style.DIM}CSV: {csv_path.absolute()}")
print(f"{Style.DIM}info.json: {info_json.absolute()}")
data = pd.read_csv(csv_path)
assert isinstance(data, pd.DataFrame)
Does this mean that we only need a pd.DataFrame of tidy data, with no Cube class, at the end of the data transformation?
That's the hope, we should get there eventually with this.
maybe_config = column_mappings.get(col_title)
defined_columns.append(
    _get_column_for_metadata_config(col_title, maybe_config, data[col_title])
)

columns_missing_in_data = set(column_mappings.keys()) - set(column_titles_in_data)
What are column_mappings.keys() and column_titles_in_data? And why do we need the set difference between the two?
column_mappings.keys() are the names of the columns with mappings defined in the info.json file. We take this set difference to find column mappings which are provided in the info.json but don't exist in the actual dataframe.
This highlights that the user has made some kind of error and has probably missed out a column from their CSV.
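A small standalone sketch of that check, using made-up data: the column names and mapping contents below are hypothetical, and `find_missing_columns` is a helper invented for this example rather than a function from the PR.

```python
from typing import Dict, List, Set


def find_missing_columns(
    column_mappings: Dict[str, dict], column_titles_in_data: List[str]
) -> Set[str]:
    """Return mapped column names that do not appear in the data."""
    # Set difference: everything configured, minus everything actually present.
    return set(column_mappings.keys()) - set(column_titles_in_data)


# Hypothetical info.json column mappings and CSV header.
column_mappings = {
    "Period": {"type": "dimension"},
    "Value": {"type": "observations"},
    "Marker": {"type": "attribute"},
}
column_titles_in_data = ["Period", "Value", "Region"]

missing = find_missing_columns(column_mappings, column_titles_in_data)
```

Here `missing` contains only "Marker": it is configured but absent from the CSV. "Region" is in the CSV without a mapping, so the difference in this direction does not flag it.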
@robons Thanks for clarifying the doubts.
Are you happy enough to approve the PR, @santhosh-thangavel?
Thanks
Satisfies issue #108.
Command Line Interface
Build
Build the qb-flavoured CSV-Ws from an info.json V1 file.
example:
Outputs resulting CSV-Ws into the ./out directory (which is created if it does not already exist).
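The "created if it does not already exist" behaviour can be sketched with pathlib. This is an assumption about how such a step might look, not the PR's code; `ensure_out_dir` is a hypothetical helper, and a temporary directory stands in for the working directory.

```python
import tempfile
from pathlib import Path


def ensure_out_dir(out_dir: Path) -> Path:
    """Create the output directory if needed and return it."""
    # parents=True creates intermediate directories;
    # exist_ok=True makes repeated calls harmless.
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "out"
    ensure_out_dir(target)
    ensure_out_dir(target)  # second call does nothing and raises no error
    created = target.is_dir()
```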