-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check for quantity in validator #500
Check for quantity in validator #500
Conversation
* Fix hdmf-dev#197 * Includes new validator tests * Refactor GroupValidator for readability
Please let me know if this looks like it covers the expected edge cases, and I'm very open to feedback and suggestions here. It took me a while to build up my mental model of what is going on in the validator and how it relates to specs and builders, so I very well could be missing things. I'm also very open to any feedback in terms of how I've refactored the code and whether or not it fits with the expected style of the rest of the library. |
There were a few things I discovered while going through the validator code, and am not sure if this is something the schema shouldn't allow or if the validator doesn't work in these cases. They seemed to be best handled outside of the scope of this PR, so I thought I'd bring them up here to discuss and we can create separate issues if needed.
It seems like perhaps the code in I'm still putting all the pieces together in my head, so maybe I'm missing a few things here, but hopefully this is helpful |
src/hdmf/validate/validator.py
Outdated
for _, value in builder.items(): | ||
v_builder = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for _, value in builder.items(): | |
v_builder = value | |
for v_builder in builder.values(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion to just iterate over .values()
, I included that change.
Removing the assignment on line 449 also looks reasonable, but on closer inspection, it would end up changing the logic for one particular edge case: if value
is a LinkBuilder
which links to something inheriting from BaseBuilder
, and that BaseBuilder
has a data_type
, then it would store the BaseBuilder
in the data_types
map, as opposed to the current behavior which is to store the LinkBuilder
in the map.
excerpt for reference:
v_builder = value
if isinstance(v_builder, LinkBuilder):
v_builder = v_builder.builder
if isinstance(v_builder, BaseBuilder):
dt = v_builder.attributes.get(self.spec.type_key())
if dt is not None:
data_types[dt].append(value)
The implementation here hasn't changed from the previous version, it was just extracted into a separate function. Looking at the usage of the return value of __group_children_by_data_type
after it is passed to __validate_child_data_type
, there is another check if the builder is a LinkedBuilder
(line 467), so it seems to me like the current implementation is consistent. However, by making the change you suggested, all unit tests still pass, so it's not obvious that making that change would be incorrect.
I think there will likely be some large changes to validation logic in the solution for the new issue: #507 so my suggestion is to verify that the solution for that issue verifies correct behavior here. Do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. How you have it now, keeping the assignment on line 449, is correct. We want to pass on the fact that a LinkBuilder
is associated with the data type. Good catch.
The refactoring looks great. Thanks! |
That is not correct/intended behavior. Each child should be validated. Can you create a new issue about this?
I wrote privately to you that this should not be allowed by the schema language; however, on further thought, I think this case is OK, and the current code that validates only the link spec and not the group/dataset spec is incorrect/bugged.
Great question! I would say the answer is no. If a group specifies two unnamed child groups of different types, then having a single child means one of the required specs is not fulfilled. Now what if the file contains two children of type B? Then the most restrictive spec (type B) should be matched first, and then move to less restrictive requirements, which would match the second child with the type A spec. I don't know if the validator handles this correctly yet.
Yes. If a group has multiple named instances of the same type, e.g., a DynamicTable with three differently named columns of type VectorData, then it looks like this code validates only the last instance. Also if there is an included VectorData dataset and a linked VectorData, regardless of name, then only the link is validated. These are not correct behaviors. Can you create a new issue about this?
Very helpful! This is great attention to detail and the different edge cases here. |
This PR looks good to me. I suggested some minor code changes. Feel free to accept/reject. If you want, you can address some of the issues you raised above within this PR, but it could also be done in a separate PR another day. |
Thanks for the review @rly! I created an issue to cover the additional incorrect behavior discussed above: #507 I think they are related and could be resolved together by implementing new logic for how builders are matched against specs in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rly I made a pair of small changes you suggested, and created a new issue for the remainder. Please take a look if you think that covers everything, and then I think the PR is ready.
src/hdmf/validate/validator.py
Outdated
for _, value in builder.items(): | ||
v_builder = value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion to just iterate over .values()
, I included that change.
Removing the assignment on line 449 also looks reasonable, but on closer inspection, it would end up changing the logic for one particular edge case: if value
is a LinkBuilder
which links to something inheriting from BaseBuilder
, and that BaseBuilder
has a data_type
, then it would store the BaseBuilder
in the data_types
map, as opposed to the current behavior which is to store the LinkBuilder
in the map.
excerpt for reference:
v_builder = value
if isinstance(v_builder, LinkBuilder):
v_builder = v_builder.builder
if isinstance(v_builder, BaseBuilder):
dt = v_builder.attributes.get(self.spec.type_key())
if dt is not None:
data_types[dt].append(value)
The implementation here hasn't changed from the previous version, it was just extracted into a separate function. Looking at the usage of the return value of __group_children_by_data_type
after it is passed to __validate_child_data_type
, there is another check if the builder is a LinkedBuilder
(line 467), so it seems to me like the current implementation is consistent. However, by making the change you suggested, all unit tests still pass, so it's not obvious that making that change would be incorrect.
I think there will likely be some large changes to validation logic in the solution for the new issue: #507 so my suggestion is to verify that the solution for that issue verifies correct behavior here. Do you agree?
* dev: Fix building of Data with no dtype spec and value DataIO wrapping DCI (hdmf-dev#513) Allow np.bool_ as a valid type when bool is specified in spec (hdmf-dev#505) Check for quantity in validator (hdmf-dev#500) added driver option for ros3 (hdmf-dev#506) Use dockerhub authentication (hdmf-dev#432)
* upstream/dev: Fix building of Data with no dtype spec and value DataIO wrapping DCI (hdmf-dev#513) Allow np.bool_ as a valid type when bool is specified in spec (hdmf-dev#505) Check for quantity in validator (hdmf-dev#500) added driver option for ros3 (hdmf-dev#506) Use dockerhub authentication (hdmf-dev#432)
Motivation
Check that the quantity of child groups/datasets/links of a group match what is defined in the spec for the group.
IncorrectQuantityError
sGroupValidator
so that it is easier to improve in the futureHow to test the behavior?
Checklist
flake8
from the source directory.