Functions/structs to move out of DataFrames to DataAPI (DataFrames syntax for DTable) #3075
Comments
We can do it relatively easily, but I think it should be a separate interface package, not DataAPI.jl. @nalimilan, what do you think? |
Why not DataAPI.jl? It seems like it would be fine there. |
Because these are not just signatures but also fairly complex logic (probably ~1000 LOC in total, though I have not counted). I thought we wanted to keep DataAPI.jl lightweight so that it does not add compilation latency to the packages that depend on it. |
Yes, DataAPI should remain really lightweight. We could include empty definitions (or basic constructors) there, but not big implementations. These sound better suited to something like TableTransforms, but I'm not sure they are generic enough for that. Does the Dagger implementation support any kind of Tables.jl object that may be wrapped by a …? Otherwise, we can create another package. We talked about having DataFramesBase in the past (#1764). The drawback, though, is that it could make developing DataFrames more difficult: we would have to define the API more precisely and try to keep it stable. |
There is no benefit in that. The point is to have a shared implementation so that both packages process the operation specification language in the same way.
They are not
As commented above, the functionality that @krynju proposes to extract is narrower in scope than a full implementation of common data frame operations. It is just the pre-processing of the operation specification language (and this is something essentially independent from |
Yes, this is basically all about parsing the … Let's take this piece of code as an example:

```julia
# `manipulate` and `broadcast_pair` are internal DataFrames.jl helpers
select(df::DTable, @nospecialize(args...); copycols::Bool=true, renamecols::Bool=true) =
    manipulate(df, map(x -> broadcast_pair(df, x), args)...,
               copycols=copycols, keeprows=true, renamecols=renamecols)
```

I think the ideal setup for now would be to:
Even if this |
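To illustrate what "pre-processing of the operation specification language" means in practice, here is a hypothetical sketch. It is not the code in DataFrames.jl or Dagger.jl: `normalize_spec` and `normalize_specs` are made-up names, and only four shorthand forms (`:a`, `:a => :b`, `:a => f`, `:a => f => :b`) are handled. Each form is rewritten into a canonical `source => function => target` triple, so a table type would only need to consume the canonical form.

```julia
# Hypothetical sketch, not DataFrames.jl's actual implementation.

# `:a` alone keeps column `a` unchanged.
normalize_spec(col::Symbol) = col => identity => col

# `:a => :b` renames column `a` to `b`.
normalize_spec(p::Pair{Symbol,Symbol}) = p.first => identity => p.second

# `:a => f` applies `f` and auto-generates the target name (e.g. `a_sum`),
# mimicking the default `renamecols=true` naming for named functions.
normalize_spec(p::Pair{Symbol,<:Function}) =
    p.first => p.second => Symbol(p.first, "_", nameof(p.second))

# `:a => f => :b` is already in canonical form.
normalize_spec(p::Pair{Symbol,<:Pair{<:Function,Symbol}}) = p

# Shared front end: normalize every positional argument (returns a tuple).
normalize_specs(args...) = map(normalize_spec, args)
```

A backend such as `DTable` would then implement the actual transformation only for the canonical triples, while the parsing rules stay in one shared place.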
This all makes sense to me; thanks for the explanations. Sounds like a great plan. |
OK. What should we call it, then? If it's not specific to DataFrames, the name should be relatively general, as this kind of syntax could be useful for other table types. |
I am marking this for the 1.5 release milestone (maybe it will be OK to include it in a 1.4.x patch release, as it should be internal). |
@krynju, I am clearing the milestone from this PR. Given our earlier discussions, is this PR needed in the end? |
I'm wrapping up JuliaParallel/Dagger.jl#344, and I managed to narrow down the list of things I need to import from DataFrames to get DataFrames-style `select` somewhat working.
Here's the list:
With these in a common package, I could drop the DataFrames dependency altogether and still get the `select` syntax working. I think that with some more work, `broadcast_pair` could be moved out as well, and then one could technically have the DataFrames `select` syntax working just by implementing these two functions for their own structure: