-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix 1611 incorrect errors when validating data with a missiing required column #1620
Fix 1611 incorrect errors when validating data with a missiing required column #1620
Conversation
…detector option in validation of a table resource: test fails
…in creation of row_stream: test passes
…ctor option with two missing required header: test passes
… before processing Row creation
…ta-with-a-missiing-required-column
…ta-with-a-missiing-required-column
Thanks @amelie-rondot ! Can you please |
@roll thanks a lot for reviewing! |
…hen-validating-data-with-a-missiing-required-column
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
I added a comment
frictionless/resources/table.py
Outdated
@@ -310,7 +310,18 @@ def row_stream(): | |||
for row_number, cells in enumerated_content_stream: | |||
self.stats.rows += 1 | |||
|
|||
# Create row | |||
# NB: missing required labels are not included in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to move it from the row iteration cycle? Currently, this code will be executed for every row in a table
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Indeed, you're right, it's possible to move this block before initializing row_stream
TableResource
attribute.
I've just committed it. I also added some new test cases to insure that resulting behaviour corresponds to the expected one.
…pen_run_stream() TableResource method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot!
This PR fixes incorrect errors which occur when validating data with missing required column, for the specific case of
schema_sync
option.After investigating, the better way to fix these incorrect errors we found is to remove missing columns from
field_info
Row.__init__()
parameter at the creation of theRow
inrow_stream
() local method.Test cases have been added to traduce expected behavior:
Validation of tabular data with schema with one or many required fields, and data with one or many missing required columns and
schema_sync
option is set totrue
Other test cases have not been broken.
Expected behavior could also be observed with command line:
with:
Only one error missing-label in the header's field "A" occurs in the validation report, as expected.