Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dataset factories #2635
Dataset factories #2635
Changes from 12 commits
ec66a12
5e6c15d
0fca72c
06ed1a4
b0e3fb9
0833af2
8fc80f9
091f794
8c192ee
3e2642c
c2635d0
d310486
573c67f
9d80de4
549823f
e052ae6
96c219f
c2634e5
eee606a
635510a
394f37b
a1c602d
978d0a5
b4fe7a7
2782dca
85d3df1
8904ce3
bdc953d
fa6c256
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be refactor as a method and have its own unit tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the cache that add matched pattern into catalog? If so why do we need to keep it separately instead of just creating the entry in catalog?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fn just adds the dataset name -> pattern in the
self._pattern_matches_cache
does not create an entry in the catalog. This is because this function is also called in the runner to check for existence of datasets from the pipeline in the catalog. The actual adding of a dataset to the catalog happens inself._get_dataset()
This is to avoid doing the pattern matching multiple times when it's called inside a Session run but the actual resolving and adding is still at
_get_dataset
time to allow for DataCatalog standalone usageThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a good name for a public method, it's way too descriptive. I would suggest that you change it to
def __contains__(self, dataset_name):
. This will allow us to do a test like this:if dataset_name in catalog:...
, which is way more intuitive and nicer thancatalog.exists_in_catalog_config(dataset_name)
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we got rid of the overly verbose method name
self.match_name_against_patterns
and simply replace it withresolve
, which could do all that's needed without leakage of abstraction, all this could be justreturn self.resolve(dataset_name) is not None
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason we need this separately from directly resolving upon check of existence? Could we just call this
resolve
and do all of the matching and calling the resolve helper here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using comprehension here makes it look more complicated than it is. Could we keep the original code here where we are using simple set difference?
This reads in English as "give me all inputs minus all the registered ones".