Standardizing working with multiple columns #2016
Comments
Thank you for writing this down. I have one comment though before we move forward.
Just write [...]. And in general [...]. There is one small difference between what you propose and [...]. In summary - the question is - could you please clarify what [...] does?
Thank you for your response. I think a [...]. Nonetheless, I wonder if for any function that currently takes in a [...]. Whatever choice we make, we should clarify this as a contract to the user and comb through the code base to add [...]
Yes, all functions that allow specifying columns should probably accept any of the selector types. I wonder whether we really need people to write [...]
My proposal is that we should allow the user to input a Vector of all selector types, then immediately call [...]
We could make [...]. The only reservation against these proposals is that [...]. Let us decide what we want and I can implement the changes.
That raises an interesting question: what should e.g. a [...]
Stata de-duplicates in cases like these, so I support de-duplicating and keeping the order of first appearance.
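To make the de-duplication rule concrete, here is a minimal sketch in plain Julia, without DataFrames. The helper name `merge_selections` is hypothetical and not part of any package. It only illustrates merging several column selections while keeping the order of first appearance, relying on the fact that Base's `unique` keeps the first occurrence of each element.

```julia
# Hypothetical helper, NOT part of DataFrames.jl: merge several column
# selections into one list, dropping repeats but keeping first-appearance order.
function merge_selections(selections::AbstractVector{<:AbstractVector{Symbol}})
    merged = Symbol[]
    for sel in selections
        append!(merged, sel)   # concatenate in the order the user wrote them
    end
    return unique(merged)      # `unique` keeps the first occurrence of each name
end

# :a and :b each appear twice; only their first positions survive.
cols = merge_selections([[:a, :b], [:b, :c], [:a, :d]])
```

Whether this is the right rule for every selector is exactly what the thread is debating; the sketch only shows what "de-duplicate, first appearance wins" would mean.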
I am OK with the general idea of de-duplicating. Avoid de-duplicating things like [...]. Apart from this small glitch, which we should settle, there are two things to do: [...]
Here the question is whether you want me to do both (I am OK with doing this) or whether someone else is willing to make a PR with this change.
My suggestion was to keep [...]
It is crucial that we are precise here (as what you propose means that we cannot simply call [...]). So I understand the rule you propose is (this is not something we have currently and it is actually much more flexible - I am not sure if we really want to allow it): [...]
Now the question is how often such mixing would be needed, and whether it is worth adding this complexity, as currently we have [...]. Just to be clear - I am not strongly against it. The only thing is that I feel this is not a super common pattern and it duplicates functionality we already have [...]. Of course - we still should allow [...]
Just to give an example:
[...] would be a valid form and would be equivalent to selecting the first column just with [...]
Thank you for your detailed comment. Your summary is exactly the behavior I was imagining. You are correct that this logic does result in some odd behavior, like [...]. This week is tough for me (Thanksgiving in the US and classes ramping up). But I would like to find time to, as you say, make this as precise as possible in a PR that just explores documentation. I agree that it would be nice to have more feedback from users. My expectation is that there are many users who miss Stata's seamless handling of multiple types of column selection. In addition, this issue is partly motivated by ease of development. If we had a more binding (and consistent) contract for what inputs could be considered, I believe it would make argument handling more streamlined (similar for [...]). Perhaps the week after next I will be able to write a PR.
I think it is less effort just to write down the specification you propose here before making a PR, when you have time for this. Just as an additional comment [...]
So it has to be [...]. So in summary the discussion is how heterogeneous [...]
I wouldn't say it's high priority, but that behavior would make sense. Processing inputs recursively may not be necessary though.
Fortunately this whole functionality is non-breaking, so it can be added at any time.
It would be a side effect. So I am just clarifying that this is what will happen when we implement this.
This is what I assume. I want to have one set of rules (like we did for indexing) and then make sure we consistently implement it.
Curious about this, to what extent is something [...]
This feature is almost there (just need to finish [...])
(there is a small amount of syntax noise in comparison to what you have written, but with the benefit that we can "strictly" handle dispatch). Then "for free" we will get broadcasting like this in split-apply-combine when [...]
Wonderful! This is great to hear.
We should probably allow [...]
We can allow for this in the future. I try to make the design of [...]
@pdeffebach - everything you wanted seems to be done on master. Do you need anything else added, or can we close this?
@pdeffebach - can this be closed?
Let's keep this up until [...]
OK. We have tracked it in #2171, so I leave it up to you to keep it open or close it.
I propose adding utility functions for more flexibility in working with multiple column arguments in DataFrames.
The motivation for this stems from the Stata functionality for `keep`. In Stata, one can write [...]
What's more, we can write [...]
We are close to being able to do this in DataFrames, but not there yet. Currently we can write [...]
This is pretty good for the user. However, it makes it hard to write code. For instance, what if I want to drop those variables instead of keeping them? I think that would be hard to do with the current system while being very easy to do in Stata.
Our current system is also inconsistent with splatting arguments. Take the function `dropmissing`, for example. [...]
This function only allows `cols` to be one of `Not`, `Between`, etc. rather than a vector. And it doesn't splat, meaning you couldn't do `dropmissing(df, Between(:a, :d), Between(:f, :m))`.
It would require a lot of work to support these kinds of function calls on a case-by-case basis. Rather, I propose we standardize this kind of argument with a function of the form [...]
Additionally, given the growing consensus about the `Pair`s interface design, I propose a similar form of sanitization for inputs of the form [...]
Working with these inputs was a hard thing to do in #1620, though I regret abandoning the PR. Now that we allow `regex` in `combine`, this makes working with `Vector`s of inputs harder. It would be nice to have something which [...] `Vector`-like arguments (`Between`, `Regex`, etc.) to a single named tuple and `Pair` arguments and the keyword arguments into a single named tuple [...]
Given the likely addition of a `transform` function as well as the already-existing `insertcols!` function, I think this would be a valuable tool-set to build and would make it easier to standardize the interface.
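As an illustration of the kind of argument standardization this issue asks for, here is a minimal sketch in plain Julia that resolves a mix of selectors against a list of column names. It is not the function proposed above and not the DataFrames.jl API: `resolve` and `normalize_cols` are hypothetical names, the selectors are limited to `Symbol`, `Regex`, integer positions and vectors of those, and the de-duplication rule (first appearance wins) is an assumption taken from the discussion.

```julia
# Hypothetical sketch, NOT the DataFrames.jl API: turn any supported selector
# into a vector of column positions, dispatching on the selector type.
function resolve(colnames::Vector{Symbol}, s::Symbol)
    i = findfirst(==(s), colnames)
    i === nothing && throw(ArgumentError("column $s not found"))
    return [i]
end

# A regex selects every column whose name matches it.
resolve(colnames::Vector{Symbol}, r::Regex) =
    findall(n -> occursin(r, String(n)), colnames)

# An integer is already a position.
resolve(colnames::Vector{Symbol}, idx::Integer) = [Int(idx)]

# A vector (or range) of selectors is resolved element by element.
resolve(colnames::Vector{Symbol}, v::AbstractVector) =
    reduce(vcat, (resolve(colnames, x) for x in v); init=Int[])

# Accept any number of selectors (splatting), merge them, and de-duplicate
# while keeping the order of first appearance.
normalize_cols(colnames::Vector{Symbol}, selectors...) =
    unique(reduce(vcat, (resolve(colnames, s) for s in selectors); init=Int[]))

colnames = [:id, :x1, :x2, :y]
positions = normalize_cols(colnames, :y, r"^x", [:id, :x1])
```

With one such entry point, a function taking columns would only need to call it once on its arguments instead of handling each selector type separately.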