Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 1633 unexpected key error with primary key in schema and 1635 unexpected missing label error when ignoring header case #1641

Merged
merged 62 commits into from
Aug 30, 2024

Conversation

amelie-rondot
Copy link
Contributor

@amelie-rondot amelie-rondot commented Feb 12, 2024


This PR fixes unexpected missing-label error in validation report which occurs when validating tabular data in this specific case (#1635): The data contains a named column corresponding to a required field in the schema used for validation, written without respecting case-sensitivity.
For example:

data = [["aa", "BB"], ["a", "b"]]
schema = {
        "$schema": "https://frictionlessdata.io/schemas/table-schema.json",
        "fields": [
            {"name": "AA", "constraints": {"required": True}},
            {"name": "bb", "constraints": {"required": True}}
        ]
    }

It adds some test cases:

  • one test case to ensure that validating this tabular data with this schema and using schema_sync=True and dialect=Dialect(header_case=False) options, returns a valid report.
  • one test case to ensure that if validation is case-sensitive, if a name column does not respect case a missing-label occurs in validation report.
  • another test case to ensure that validating with missing required column "AA" with this schema and using schema_sync=True and dialect=Dialect(header_case=False) options, returns an invalid report with missing-label error related to field "AA".

This PR also fixes unexpected KeyError raised when a primary key used in shema is missing in the tabular data to validate(#1633).
For example:

data = [["B"], ["b"]]
schema = {
        "$schema": "https://frictionlessdata.io/schemas/table-schema.json",
        "fields": [
            {"name": "A"},
            {"name": "B"}
        ],
        "primaryKey": ["A"]
    }

It adds some test cases:

  • two test cases to ensure that when a label related to the schema primary key is missing, the validation report is invalid with single missing-label error:
    • a test case with the schema containing the primary key field not specified as 'required'
    • a test case with the schema containing the primary key field specified as 'required'
  • a test case to deal with insensitivity case in the data label corresponding to the primary key schema field: the validation report is valid

Finally, I suggest some refactoring in this PR:

  • refactoring removing missing required label from field info refactoring removing missing required label from field info
  • refactoring creating schema fields among labels part when using schema_sync option in detect_schema() Detector method
  • refactoring to introduce intermediary TableResource methods to create cells related to primary key schema.

@amelie-rondot amelie-rondot changed the title Fix 1633 and 1635 Fix 1633 unexpected key error with primary key in schema and 1635 unexpected missing label error when ignoring header case Feb 12, 2024
@amelie-rondot amelie-rondot marked this pull request as ready for review February 13, 2024 08:16
pierrecamilleri

This comment was marked as resolved.

@pierrecamilleri
Copy link
Collaborator

Hi @roll,

This PR is ready for your review. Some things somewhat related to this PR, detailed below bother me a little bit. However, the bugs that are fixed by this PR are blocking for our migration from v4 to v5 on Validata. Would you mind reviewing it as is, and may I transform these concerns into issues and propose associated pull requests?

Please don't hesitate if something is bothering you, if I am wrong in any of my points or missed the obvious. Thanks.

Details :

  • supporting "header_case" is currently a headache as you have to check each time you use a label or schema field or primary key that you covered the case with "header_case = False". I would imagine that if "header_case" is ignored (False), then you could store all data in lowercase once and for all and not bother anymore.

  • Due to the way schema sync is implemented, you have to first include the missing required fields (in order to have the related errors), but then remove them for the row_stream to be correct. This feels overly complex, but I have not yet a clear alternative in mind.

  • A primary key is implicitly required. However, field.required is not "True" for those fields if they are not explicitly set in the schema (if I judge by this line). I would suggest (without having looked much at the code) that "required" should be a @property that returns "True" for a primary key.

@roll
Copy link
Member

roll commented Apr 29, 2024

Hi @pierrecamilleri,

Thanks a lot! Can you please resolve the conflicts as the dev/release process update just been merged #1652

@pierrecamilleri
Copy link
Collaborator

pierrecamilleri commented Apr 29, 2024

Hi @pierrecamilleri,

Thanks a lot! Can you please resolve the conflicts as the dev/release process update just been merged #1652

Done

EDIT: Oops tests no longer pass, I spoke too fast. I look into it

@pierrecamilleri
Copy link
Collaborator

All set !

@pdelboca pdelboca self-requested a review August 20, 2024 06:25
@pdelboca
Copy link
Contributor

@pierrecamilleri we shoudn't be using pytest.lazy_fixture in our codebase since it has been deprecated: TvoroG/pytest-lazy-fixture#16

Could we instead call from pytest_lazyfixture import lazy_fixture as suggested in TvoroG/pytest-lazy-fixture#22 (comment).

It is not part of your PR but would be nice to fix it.

@pdelboca
Copy link
Contributor

@pierrecamilleri wait, I will fix this in a new PR instead and then we can merge the fix into this two PRs.

@pdelboca
Copy link
Contributor

@pierrecamilleri I just merged #1674 . Could you please rebase and then merge when test are passing?

amelie-rondot and others added 27 commits August 30, 2024 13:02
@pierrecamilleri pierrecamilleri merged commit 7861f0e into frictionlessdata:main Aug 30, 2024
9 checks passed
@pierrecamilleri pierrecamilleri deleted the Fix-1633-and-1635 branch August 30, 2024 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unexpected "missing-label" error with option header_case = False KeyError when a "primaryKey" is missing
4 participants