Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EDTF date validation fails on values containing unspecified digits and qualifiers #669

Closed
joshdentremont opened this issue Aug 9, 2023 · 15 comments
Assignees
Labels
documentation Improvements or additions to documentation limitation of dependency A known limitation resulting from another library/application

Comments

@joshdentremont
Copy link
Contributor

The EDTF python module says that 198X? is invalid, but it also produces that same format when parsing dates: ixc/python-edtf#38

Drupal allows this type of date in an EDTF field, but there was discussion in Slack about whether it is valid EDTF.

I have reached out to LOC for clarification and will reply here when I hear back.

@mjordan mjordan self-assigned this Aug 9, 2023
@mjordan mjordan added documentation Improvements or additions to documentation limitation of dependency A known limitation resulting from another library/application labels Aug 9, 2023
@mjordan
Copy link
Owner

mjordan commented Aug 9, 2023

198X? is valid EDTF so all we can do is document that this type of value in Workbench CSV won't validate and needs to be added to Drupal entities manually.

@joshdentremont
Copy link
Contributor Author

Response from LoC:

Yes, that’s perfectly fine for EDTF.

So you can have one or more X’s to indicate the level of imprecision, and then a ? to indicate you’re not even sure of that date expression as a whole. 198X? would mean “sometime in the 80’s, I think”.
I could easily see that formulation applying for older time frames.

@mjordan
Copy link
Owner

mjordan commented Aug 10, 2023

Thanks @joshdentremont. workbench_utils.py has a validate_edtf_date(date) function that wraps the EDTF library's validation function. We could add validation of that pattern to that wrapper function, at least for now until the library can validate it. I'll propose that over in slack.

@mjordan mjordan changed the title Unsure whether unspecified digits and qualifiers should be allowed in same date EDTF date validation fails on values containing unspecified digits and qualifiers Aug 10, 2023
@mjordan
Copy link
Owner

mjordan commented Aug 10, 2023

I'm going to set up a test suite to determine exactly which patterns aren't passing validation, so our temporary fix can scoped as small as possible and not duplicate what the library does. So far we have:

If anyone can verify other patterns, please drop them here.

@joshdentremont
Copy link
Contributor Author

All of the following cause errors:
1XXX?
11XX?
111X?
1XXX~
11XX~
111X~
1XXX%
11XX%
111X%

These also cause errors, but I'm not sure they would ever come up in actual use, and they may not be valid EDTF
XXXX?
XXXX~
XXXX%

@seth-shaw-asu
Copy link
Contributor

These also cause errors, but I'm not sure they would ever come up in actual use, and they may not be valid EDTF

  • XXXX?
  • XXXX~
  • XXXX%

I agree, they are unlikely to be used, but they are valid. These are just some of the oddities that EDTF allows.

@mjordan
Copy link
Owner

mjordan commented Aug 14, 2023

Do we have to validate that that date is not in the future, e.g. 206X%?

gist of validation against the patterns above is at https://gist.github.com/mjordan/3339b6d1c94ede805b1337184de07a8a if anyone wants to give it a run (provide the date string as an argument to the script, e.g, edtf_validate.py 202X?).

@seth-shaw-asu
Copy link
Contributor

We do not have to validate that the date is current or past. Nothing in the spec indicates we can't specify future dates.

mjordan added a commit that referenced this issue Aug 16, 2023
@mjordan
Copy link
Owner

mjordan commented Aug 16, 2023

I've pushed up the issue_669 branch incorporating the extended validation patterns. @joshdentremont can you test please? If you would like to contribute more test values for inclusion in https://github.com/mjordan/islandora_workbench/blob/issue_669/tests/unit_tests.py#L319 let me know.

@seth-shaw-asu
Copy link
Contributor

seth-shaw-asu commented Aug 16, 2023

I think these "bad" values are technically acceptable: https://github.com/mjordan/islandora_workbench/blob/issue_669/tests/unit_tests.py#L358-L359

The year component doesn't need to be four digits. (Controlled Access Terms' EDTF validator was recently updated to remove this erroneous assumption: Islandora/controlled_access_terms@e7779e4)

@mjordan
Copy link
Owner

mjordan commented Aug 16, 2023

OK, thanks. What's the minimum number of years we should be validating for?

@seth-shaw-asu
Copy link
Contributor

I am mistaken, it does require four years.

I went back to review the spec and was reminded that EDTF is an extension of ISO 8601, which means that the base rules of that spec still apply. Although EDTF only specifies a required number of year digits for significant digits notation, the base rules for ISO 8601 do require four digits.

So, mea cupla, those three-digit years are invalid.

@mjordan
Copy link
Owner

mjordan commented Aug 16, 2023

Ah, good to know, thanks for checking.

@joshdentremont
Copy link
Contributor Author

joshdentremont commented Aug 17, 2023

@mjordan Looks good. The values of YYYX? that were previously failing are no longer throwing errors.

Thanks

@mjordan
Copy link
Owner

mjordan commented Aug 17, 2023

OK, thanks, I've merged the issue_669 branch with 3b1a448 so am closing this issue.

@mjordan mjordan closed this as completed Aug 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation limitation of dependency A known limitation resulting from another library/application
Projects
None yet
Development

No branches or pull requests

3 participants