Prototype parsing rules for dataset factories #2508
Comments
I did some investigation into how Ruby/Django/React handle this with their routers. They all differ slightly, but each follows a clear declaration order, i.e. for Django you have a

I am in favour of a simple solution at this stage, which is simply following the declaration order. We could add more complicated things like "local scope" or namespaces later when needed.
One heuristic for "most specific pattern should match first" would be to count the number of

@noklam Following declaration order would be a great way to do this, but the problem is that we merge together multiple config files, so the pattern that you're matching against could come from a different file. In that case it's not clear what declaration order means. Ideally we would e.g. give priority to a matching pattern defined in the same file before looking in other files, but I don't think that will be easily possible given the way we merge config files together, since they're all treated equally. And what if you match against a pattern in two different files? Then we'd need to work out a rule for e.g. giving preference to a file based on alphabetical order. Besides, the information about which file the definition comes from is long gone by the time we would need it (when instantiating the dataset rather than when doing config loading). So doing this based on declaration order would be great and very simple, but I suspect there just isn't a good way to get it to work in our case. Hence the need to define our own rules (like most explicit pattern matches first, followed by alphabetical). This is my main reservation against this whole approach, but unfortunately I don't see a way round it.
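As a rough illustration of what "most explicit pattern matches first, followed by alphabetical" could mean in practice, here is a minimal sketch; the `specificity` helper and the idea of counting literal (non-placeholder) characters are assumptions, not a settled rule:

```python
# Hypothetical candidate patterns that could both match "france.companies@spark".
PATTERNS = ["{namespace}.{dataset_name}@spark", "{namespace}.companies@spark"]


def specificity(pattern: str) -> int:
    """Count literal characters outside {} placeholders (one possible measure)."""
    in_placeholder = False
    count = 0
    for char in pattern:
        if char == "{":
            in_placeholder = True
        elif char == "}":
            in_placeholder = False
        elif not in_placeholder:
            count += 1
    return count


# Most specific pattern first; alphabetical order breaks ties.
ranked = sorted(PATTERNS, key=lambda p: (-specificity(p), p))
print(ranked)
# -> ['{namespace}.companies@spark', '{namespace}.{dataset_name}@spark']
```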
The prototype has been finished and will be used in the full implementation for #2423
Description
Subtask of #2423
Context
For the dataset factories solution we'll need a way to parse the pattern syntax that datasets are matched against.
Parse (https://github.com/r1chardj0n3s/parse) is a library with a pattern-matching syntax that is essentially the reverse of the Python f-string/format() syntax.
Example of what our syntax could be, and the function that would create the dataset entry:
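A minimal sketch of how this could look with the parse library (the pattern, the `spark.SparkDataSet` config and the `create_dataset_entry` helper are illustrative assumptions, not the agreed syntax):

```python
from typing import Optional

import parse

# Hypothetical pattern for catalog entries such as "france.companies@spark".
PATTERN = "{namespace}.{dataset_name}@spark"


def create_dataset_entry(dataset_name: str) -> Optional[dict]:
    """Return a catalog entry dict if `dataset_name` matches PATTERN, else None."""
    result = parse.parse(PATTERN, dataset_name)
    if result is None:
        return None
    # Substitute the named placeholders into a templated config.
    return {
        "type": "spark.SparkDataSet",
        "filepath": f"data/{result['namespace']}/{result['dataset_name']}.parquet",
    }


print(create_dataset_entry("france.companies@spark"))
# -> {'type': 'spark.SparkDataSet', 'filepath': 'data/france/companies.parquet'}
```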
Things to investigate
The goal of this ticket is to experiment with that library and come up with rules on how the matching should work.
It's especially important to determine what needs to happen when two patterns would match a dataset, e.g.
france.companies@spark
against:

Pattern matching is a common problem in URL matching, so take inspiration from web frameworks that solve this problem, e.g. Ruby/React.
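As an illustration of the ambiguity (the two candidate patterns below are invented for this sketch, not taken from the issue), the parse library will happily match both a more explicit and a more general pattern against the same dataset name, so a tie-breaking rule is needed:

```python
import parse

dataset = "france.companies@spark"

# Hypothetical patterns: one explicit, one general; both match the dataset above.
candidate_patterns = [
    "{namespace}.companies@spark",
    "{namespace}.{dataset_name}@spark",
]

matches = [p for p in candidate_patterns if parse.parse(p, dataset) is not None]
print(matches)
# Both patterns match, so the parsing rules must decide which one wins.
```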
Initial thoughts on rules