Fix caching configuration #3785
Change the 'identifier' to always refer to the process_type. Use the 'fnmatch' stdlib to allow globbing in the configuration. This is technically a breaking change, because the interface of 'get_use_cache', 'enable_caching', 'disable_caching' no longer allows the 'node_class' interface. However, the previous version silently did nothing when this argument is passed, so this just makes the issue more apparent.
When using globbing in the caching configuration, we want the ability to override a general glob (e.g. aiida.calculations:*) with a more specific one (e.g. aiida.calculations:aiida-diff.*). This commit implements the logic and tests for this behavior. To check which identifier pattern is more specific, the patterns are fnmatch'ed against each other. If all other patterns match a given pattern, it is the most specific.
Thinking about this some more, the logic for checking which pattern is "most specific" is incorrect in corner cases. The criterion for being the most specific pattern should be that the set of identifiers it matches is a strict subset of the set matched by every other pattern. Here's a counterexample for the current logic, with the patterns
pattern1 = 'a????c?'
pattern2 = 'a[ba]c*'
A = 'afcdecf'
B = 'abccccd'
C = 'abccccde'
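The counterexample above can be checked directly with the standard library. The following is a minimal sketch, not the code from this PR; in particular, the pattern-against-pattern check at the end is only one reading of the "current logic" described in the commit message, and the variable names are illustrative.

```python
from fnmatch import fnmatchcase

pattern1 = 'a????c?'
pattern2 = 'a[ba]c*'
strings = {'A': 'afcdecf', 'B': 'abccccd', 'C': 'abccccde'}

# For each test string, record (matches pattern1, matches pattern2).
matches = {
    label: (fnmatchcase(value, pattern1), fnmatchcase(value, pattern2))
    for label, value in strings.items()
}

# Assumed form of the "current logic": treat one pattern as a plain string
# and match it against the other pattern.
pattern1_matches_pattern2 = fnmatchcase(pattern2, pattern1)
pattern2_matches_pattern1 = fnmatchcase(pattern1, pattern2)

for label, result in matches.items():
    print(label, result)
print(pattern1_matches_pattern2, pattern2_matches_pattern1)
```

A matches only pattern1, C matches only pattern2, and B matches both, so neither match set is a subset of the other. Even so, pattern1 matches pattern2 when pattern2 is read as a plain string, so a pattern-against-pattern check would rank pattern2 as the more specific one.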
The root of this problem lies with the Should we just restrict the patterns so that they can not contain Besides this issue, we still need to update the documentation about caching configuration.
Also, we should be using
The 'fnmatch' implementation allows using '?' (any single character) or '[abc]' (any of the characters in brackets) matches. These are problematic when trying to determine the 'most specific' match to a given identifier. To avoid this, we instead implement matching of _only_ '*' wildcards with 're'. This is similar to how 'fnmatch' is implemented, except for removing the extra logic needed for '?' and '[]' matches.
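As a rough illustration of matching only `*` wildcards with `re`, here is a minimal sketch. The function name `match_wildcard` and its signature are assumptions made for this example and are not taken from the PR's code.

```python
import re


def match_wildcard(string, pattern):
    """Return True if `string` matches `pattern`, where only '*' is a wildcard.

    Hypothetical helper: every other character, including '?' and '[',
    is matched literally.
    """
    # Escape the literal pieces and join them with '.*' in place of each '*'.
    regex = '.*'.join(re.escape(part) for part in pattern.split('*'))
    return re.fullmatch(regex, string, flags=re.DOTALL) is not None


general = 'aiida.calculations:*'
specific = 'aiida.calculations:diff.*'

print(match_wildcard('aiida.calculations:diff.diff', general))
print(match_wildcard('aiida.calculations:diff.diff', specific))
# With only '*' wildcards, a pattern can itself be matched as a string:
print(match_wildcard(specific, general))
print(match_wildcard(general, specific))
```

In this sketch the general pattern matches the specific one but not the other way around, which is the property a "most specific" comparison can rely on once `?` and `[]` are excluded.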
I decided to go with the The documentation is now also updated, so this is ready for review.
Thanks @greschd, looking good. I agree with your assessment that only supporting `*` is the better option. It should give more than enough flexibility and keeps the implementation from becoming overly complex. Just one question: should we add a little more validation of the values in the configuration file?
try:
    type_check(config[ConfigKeys.DEFAULT.value], bool)
    type_check(config[ConfigKeys.ENABLED.value], list)
    type_check(config[ConfigKeys.DISABLED.value], list)
Should we add an explicit check on the actual values in the disabled and enabled lists? Each value can only be a valid process type string, optionally with a wildcard. Valid process types are:
- Either `aiida.plugins.entry_point.is_valid_entry_point_string` returns `True`
- Or it is a Python module path. It should be possible to write a validation function for this. I think it is essentially `[A-Za-z0-9_\.]`
Yes, I agree it would be good to at least validate that it could be a valid process type string. Because of the possible wildcards, we probably have to implement the check separately and can't reuse `is_valid_entry_point_string`.
For the entry points, as far as I can see the valid form is `<group_name>:<something>`, where
- `group_name` is one of `aiida.data`, `aiida.calculations`, ..., as defined by the `entry_point_group_to_module_path_map`
- `something` can be literally anything, except that it cannot contain another colon
For the fully qualified Python name, each dot-separated part must fulfill
name.isidentifier() and not keyword.iskeyword(name)
The only two restrictions I'm aware of (compared to the simple regex) are that a part can't start with a number and can't be a keyword (like `in`, `for`, `else`, ...).
All this is a bit tricky to check for due to the wildcards, but I think I can come up with a reasonable solution.
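The check for fully qualified Python names described in this comment could look roughly like the sketch below. It deliberately ignores wildcards, and the function name `is_valid_dotted_name` is hypothetical; it is not the `_validate_identifier_pattern` function added in this PR.

```python
import keyword


def is_valid_dotted_name(name):
    """Return True if every dot-separated part is an identifier and not a keyword.

    Hypothetical helper for illustration; wildcards are not handled here.
    """
    return all(
        part.isidentifier() and not keyword.iskeyword(part)
        for part in name.split('.')
    )


print(is_valid_dotted_name('aiida.calculations.diff'))
print(is_valid_dotted_name('aiida.1calculations'))  # a part starts with a number
print(is_valid_dotted_name('aiida.for.diff'))       # a part is a keyword
```

An empty part, as in `a..b`, is also rejected, because the empty string is not an identifier.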
Alright, had a crack at this in `_validate_identifier_pattern`.
Two more minor changes. The logic of the identifier validator seems OK; I just have two questions there.
aiida/manage/caching.py
        'the `node_class` argument is deprecated and will be removed in `v2.0.0`. '
        'Use the `identifier` argument instead', AiidaDeprecationWarning
    )
type_check(identifier, (type(None), str))
allow_none=True
Co-Authored-By: Sebastiaan Huber <[email protected]>
Fixes #3673
This changes the meaning of the `identifier` used in the caching configuration from being an entry point string to being the `process_type` of the node.
Since this change removes the ability to select process classes via subclassing (isinstance check), we instead allow the configuration to contain globbing patterns, as implemented in the `fnmatch` standard library.
Multiple globbing patterns can apply to a single process type, as long as one of them is uniquely the most specific.
Since the new configuration allows arbitrary strings, the validation of the caching configuration on load is now less strict: we cannot check whether the process type exists.