Make kedro-datasets a dependency of kedro? #1776
Comments
What stuff moves in
@AntonyMilneQB quick question, how will this work if
How will it work in terms of Python packaging? I'm not sure 😅 We should definitely check that!
Yes, it seems a weird thing to have a shared namespace with different distributions. I suspect it may still work since pip probably just copies the files into the same folder, but we should check.
Yeah, definitely. Especially given this warning:
If we do have problems with this then I think there would probably be a way around it using some sort of aliasing so that importing
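One way to check the shared-namespace question from the comments above is a small local experiment. The sketch below is only a simulation: two temporary directories on `sys.path` stand in for two installed distributions, and the names `demo_ns`, `core` and `extras` are hypothetical. It relies on implicit namespace packages (PEP 420), so neither directory contains an `__init__.py` for the shared namespace.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Tuple


def make_portion(root: Path, namespace: str, module: str, body: str) -> None:
    """Create <root>/<namespace>/<module>.py, deliberately without an __init__.py."""
    package_dir = root / namespace
    package_dir.mkdir(parents=True)
    (package_dir / f"{module}.py").write_text(body)


def load_from_two_portions() -> Tuple[str, str, int]:
    """Import two modules of one namespace that live in two separate directories."""
    with tempfile.TemporaryDirectory() as tmp:
        # Two directories standing in for two separately installed distributions.
        dist_a = Path(tmp) / "dist_a"
        dist_b = Path(tmp) / "dist_b"
        make_portion(dist_a, "demo_ns", "core", "WHO = 'dist_a'\n")
        make_portion(dist_b, "demo_ns", "extras", "WHO = 'dist_b'\n")

        added = [str(dist_a), str(dist_b)]
        sys.path[:0] = added
        try:
            importlib.invalidate_caches()
            core = importlib.import_module("demo_ns.core")
            extras = importlib.import_module("demo_ns.extras")
            # The namespace package records one path entry per contributing directory.
            portions = len(sys.modules["demo_ns"].__path__)
            return core.WHO, extras.WHO, portions
        finally:
            # Leave the interpreter state as we found it.
            for entry in added:
                sys.path.remove(entry)
            for name in ("demo_ns.core", "demo_ns.extras", "demo_ns"):
                sys.modules.pop(name, None)


if __name__ == "__main__":
    print(load_from_two_portions())
```

Note the limits of this experiment. If any directory on the path contains `demo_ns/__init__.py`, Python treats that directory as a regular package and ignores the other portions. Since `kedro` itself ships an `__init__.py`, the real case would have to be tested with actual installs rather than inferred from this sketch.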
We discussed this issue in a Technical Design session:
Following a discussion with @idanov, we are not going with this proposal. All of The solution will instead be to make
My latest proposal in the ongoing discussion of kedro-datasets. See #1758 for context.
Proposal
- `kedro-datasets` is becoming a separate namespace package, as we are currently doing in "[PAUSED] Namespace kedro-datasets as kedro.datasets" (kedro-plugins#49). We move `kedro.io` to `kedro-datasets` also. This would be another namespace package, so import paths would remain the same as `kedro.io`.
- `kedro-datasets` becomes a core dependency of `kedro`, i.e. `pip install kedro` also does `pip install kedro-datasets`.
- Extras such as `pandas.CSVDataSet` remain as they are defined now. We implement `extras_require` in kedro so that `pip install kedro[pandas.CSVDataSet]` also does `pip install kedro-datasets[pandas.CSVDataSet]`.
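The extras forwarding in the last point could look roughly like the sketch below. It only builds the mapping that would be passed to `setup(extras_require=...)`. The helper name `forward_extras`, the second extra and the `~=1.0` pin are assumptions for illustration, not anything kedro defines.

```python
from typing import Dict, Iterable, List


def forward_extras(
    target_dist: str, extras: Iterable[str], version_spec: str = ""
) -> Dict[str, List[str]]:
    """Map each extra to a requirement on the same extra of another distribution."""
    return {
        extra: [f"{target_dist}[{extra}]{version_spec}"]
        for extra in extras
    }


# Extras that kedro would expose; the second name is purely illustrative.
DATASET_EXTRAS = ["pandas.CSVDataSet", "pandas.ExcelDataSet"]

# Unpinned: kedro[pandas.CSVDataSet] -> kedro-datasets[pandas.CSVDataSet]
extras_require = forward_extras("kedro-datasets", DATASET_EXTRAS)

# Pinned to a hypothetical "officially supported" release line.
pinned_extras_require = forward_extras("kedro-datasets", DATASET_EXTRAS, "~=1.0")

if __name__ == "__main__":
    for extra, requirements in pinned_extras_require.items():
        print(extra, "->", requirements)
```

With a mapping like this, kedro never lists the pandas dependencies itself. It only names the matching extra on `kedro-datasets`, so the real dependency lists live in one place.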
To be clear, the structure of `kedro-datasets` would be:
Pros
- Users just `pip install kedro[pandas.CSVDataSet]`. They don't need to worry about the existence of `kedro-datasets` unless they want to manually update to a version beyond the "officially supported" one specified in the kedro requirements.
- Users can `pip install kedro-datasets` and use the datasets by themselves (probably). In practice this might not happen much, but having a self-contained, fully functioning package feels like a clean way to split things up.
- `AbstractDataSet` etc. live in the same place as the implementations, e.g. if we change `AbstractDataSet` (which I think we should in the not too distant future: see Re-design io.core and io.data_catalog #1778) then this is much easier to handle.
- `kedro-datasets` does not depend on `kedro`.
- … `kedro-datasets`? #1651 becomes simpler, since all we need to check is which version of `kedro-datasets` to fetch based on kedro's requirements.txt.
Cons
- Things like `AbstractDataSet` are closer to core components (e.g. runner, pipeline) than they are to dataset implementations, which are really the optional extra thing that should be split out.
Key questions
- Is `kedro.io` supposed to remain in `kedro` rather than move to `kedro-datasets`? How important is that?
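For the #1651 point under Pros, "check which version of `kedro-datasets` to fetch based on kedro's requirements.txt" could be sketched as below. This is a simplified parser written for illustration: the function name `find_requirement` and the sample requirements text are assumptions, and it ignores URLs, `-r` includes and other pip-specific syntax. In real code, `packaging.requirements.Requirement` would be a more robust choice.

```python
import re
from typing import Optional

# name, optional [extras], then whatever follows (version specifier, markers)
_REQUIREMENT = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[[^\]]*\])?\s*(.*)$")


def _normalize(name: str) -> str:
    """Normalise a distribution name so that '-', '_' and '.' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def find_requirement(requirements_text: str, dist_name: str) -> Optional[str]:
    """Return the version specifier for dist_name ('' if unpinned, None if absent)."""
    wanted = _normalize(dist_name)
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments (simplification)
        if not line:
            continue
        match = _REQUIREMENT.match(line)
        if match and _normalize(match.group(1)) == wanted:
            # Drop any environment marker after ';'.
            return match.group(3).split(";", 1)[0].strip()
    return None


# Hypothetical contents of kedro's requirements.txt.
SAMPLE_REQUIREMENTS = """\
pandas>=1.3
kedro-datasets[pandas.CSVDataSet]~=1.0  # pinned by kedro
"""

if __name__ == "__main__":
    print(find_requirement(SAMPLE_REQUIREMENTS, "kedro-datasets"))
```

The returned specifier is what a tool would then use to decide which `kedro-datasets` release to fetch.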