User testing for dataset factories syntax #2602
Comments
Summary
Interviewed 10 users:

Personal favourite user quotes:
"It gets rid of the need to do Jinja and it also gets rid of the need to like define a global list that you use in your catalog." 🔥
"I love it definitely love it. I feel it's actually, it's more descriptive than, than we had previously." ❤️
"It seems super simple. It's actually, I wish this was a month ago that was already an option." 🤩
"Decreasing the size of the catalog and potentially even having just one generic catalog that you only need this one file and it will work for all of your different pipelines because the pipeline will change the catalog could be extremely powerful." 💪

About

Insights from user interviews

1. Users would use dataset factories
9 users explicitly said they would use this feature.
"Yeah, absolutely.[...] it gets rid of the funky, I guess it replaces the funky Jinja with a funky wild carding but I think this is much more compact and like makes a lot more sense [...] it is more compact than using a big for loop and like having the structure of your code be changed."
"Awesome. Like, yeah, to be honest, I super like it right now. Yeah. It's it's super good. It's super descriptive."
"It would be super simple to use. Yeah, it's, I will be a, a big fan of it."
"So I guess, you know, refactoring huge catalog to small one is always a, a good place because you, you decrease the surface of errors."
However, several users did mention they were cautious about using it by default:

2. There should be a warning about the catch-all pattern that replaces default dataset creation
5 users mentioned this explicitly:
"if you did a dataset that replaced all the memory dataset, would we get like a warning or an error or something?"
"Maybe it'll be nice to, for users if they define something like that, to give them like warning or even to give them a prompt that they made or ask them if it's, if it's expected. Because I guess that that will be the, the first mistake everyone makes when they start working with the feature."

3. Users need clear guidelines/explanation about how the pattern matching works
8 users
"Only my problem [...] I do not really understand what will happen with Layer here."
"I guess maybe that's the question. Is the layer it has to be in the dataset name or does it get like injected if I add it to the node definition?"
"I guess point of ambiguity for me, like companies underscore csv, how does it know companies isn't a layer and CSV isn't a data set name if it matches that. Yeah, like something underscore something that feels very unclear."
"what happens in the previous example if you have multiple namespaces added to [the dataset], that's possible if I remember correctly. So here you have only one namespace, but you can define a nested structure of name spaces. So what happens with that example?"
"You might want to have maybe more examples in the documentation so that people can, can go there and, and try other, other more specific things."
"I think it's having just some more examples would kind of be helpful. I think, well now doing this [together] [...] makes it super clear but I think to have some more documentation on that."
Users want to know how dataset factories work with existing solutions
On how to use this with
Users need clarity about which patterns match which datasets
There is some confusion about the meaning of characters in patterns and if it conflicts with e.g. transcoding

4. Dataset factories make dataset names in the pipeline very verbose
3 users mentioned this:
"if we're happy to put sort of quite verbose and and descriptive names in the pipeline, then that's fine, but it just [...] you're moving a problem from one place to another and maybe it's better in that place than the other"
"that's my fear that it'll get convoluted like really fast and yeah, that, that may be a thing"
"I mean the fact that you had to create some suffix is really problematic, especially when you move between environments and for example, the idea behind the catalog was that you don't have to give a specific names."

5. Users like having an explicit catalog
3 users:
"my general instinct is I like to be super explicit about things and one of the things that to me is that like I'm not gonna have any clue what the catalog is actually gonna look like until I run the thing."
"I'm generally of the belief that like we shouldn't over-engineer the catalog. Like Jinja is a great example of that. I think Jinja overall decreases readability, maintainability, even though it, you know, decreases the number of like keys you need to hit."

6. Dataset factories will enforce structure and naming conventions
3 users:
"So [for] bigger projects it's more valuable to enforce that structure. I think maybe that's one of the benefits of this is it really pushes people into a corner. They've gotta meet a standard and all the names are going to match each other, which is fantastic."
"it's [...] very prescriptive way to, instead of just using generic names or [...] you can use very descriptive, non like nonfunctional name, which is awesome I feel."

7. Dataset factories replace current solutions
3 users:
"The only time I've ever used template a config loader was for loops. So this is basically a complete replacement. Like if you, as long as you can have the, the factory for a name space that completely removes the need to do a for loop in my opinion."
"What we are seeing right now on the screen for the use cases with namespaces, that actually simplifies everything for them. And I mean it's much simpler and much more logical to define it once like it's shown here instead of, you know, having those copy pasting stuff done in, in catalog yaml."

8. Users find
Description
We need to test that the proposed dataset factories implementation for #2423 actually works for users.
Exercise
kedro catalog resolve
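Several interviewees asked which pattern a dataset name such as companies_csv would resolve to, and whether a catch-all pattern would be flagged. The sketch below is not Kedro's implementation; it is a minimal, standard-library illustration of one plausible way to resolve names against `{placeholder}` patterns. The function names (`pattern_to_regex`, `resolve`, `find_catch_all`), the example patterns, and the "most literal characters wins" ordering are all assumptions made for illustration.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Convert 'x_{name}' into a regex with one named group per placeholder.

    Assumes each placeholder name appears at most once in a pattern.
    """
    # re.split with a capture group alternates literal text and placeholder names.
    parts = PLACEHOLDER.split(pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)          # literal text
        else:
            regex += f"(?P<{part}>.+?)"       # placeholder, shortest match first
    return re.compile(f"^{regex}$")


def specificity(pattern: str) -> int:
    """Number of literal (non-placeholder) characters in the pattern."""
    return len(PLACEHOLDER.sub("", pattern))


def resolve(name: str, patterns: list[str]):
    """Return (pattern, captured values) for the first matching pattern.

    Assumed ordering: more literal characters first, ties broken alphabetically.
    """
    for pattern in sorted(patterns, key=lambda p: (-specificity(p), p)):
        match = pattern_to_regex(pattern).match(name)
        if match:
            return pattern, match.groupdict()
    return None


def find_catch_all(patterns: list[str]) -> list[str]:
    """Patterns with no literal characters match every dataset name."""
    return [p for p in patterns if specificity(p) == 0]


# Hypothetical patterns, chosen to mirror the questions raised in the interviews.
PATTERNS = ["{layer}_{name}", "{name}_csv", "{default}"]
```

Under these assumptions, `resolve("companies_csv", PATTERNS)` picks `{name}_csv` rather than `{layer}_{name}` because it has more literal characters, and `find_catch_all(PATTERNS)` returns `["{default}"]`, which is the kind of pattern a warning could be attached to.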
Metrics that define success of the feature:
Open questions: