-
Notifications
You must be signed in to change notification settings - Fork 367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert to using only symbols for column names. #509
Conversation
I'm on board with this conceptually. Let me review the actual code and we'll try to merge this soon. I'm not so worried about breaking things as long as we don't update METADATA without a few day's notice. |
@@ -306,4 +306,5 @@ function data(rc::RComplex) | |||
BitArray(imag(rc.data) .== R_NA_FLOAT64)) | |||
end | |||
|
|||
DataFrame(rl::RList) = DataFrame(map(x->data(x), rl.data), rl.attr["names"].data) | |||
DataFrame(rl::RList) = DataFrame(map(x->data(x), rl.data), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're reading in files from RDA format, we probably need to do column name cleaning before we can make symbols since there will almost certainly be columns with a .
in their names.
Going to take me a bit more time to fully review, but passing tests makes me think most things will work. We do need to be cautious about the RDA files, which will be full of invalid symbols. |
That's a good point. Note that this pull request does not force symbols to be valid Julia identifiers. For example, you can do
Do you know if Julia has a name-cleansing function? |
I don't believe that Julia has a name-cleansing function. We have a simple one in |
Wrote some code to check and standardize identifiers: https://gist.github.com/johnmyleswhite/8681455 |
This all looks good to me. If you're comfortable with it, I'm good to go. |
I guess the main reason for this change is to improve performance, since symbols are more efficient to handle than strings? Just to be sure I understand. This indeed sounds like a good idea. |
John, I'm good with it. @nalimilan, symbols have the following advantages:
Strings have some advantages:
|
@tshort Thanks for the explanation! |
So, I'd like to merge this as soon as possible. But it would be good to let @garborg see this change first, so that he can handle the way it interacts with his changes to |
For future reference, the thing I like about symbols is that they make it harder to generate column names at the REPL which aren't valid Julia identifiers. This will help cut down on the number of cases in which somebody creates a column name that they subsequently find it to hard to work with. Also, we currently sometimes accepted symbols and sometimes accepted strings as column names in a few functions. Forcing more consistency will make things easier. |
I don't see any issues with rebasing the readtable PR. |
Ok. Then, @tshort, pull the switch whenever you're ready. Let the breakage begin! |
@johnmyleswhite, I don't think I can pull the switch to commit. Either I don't have permissions, or Github has changed, and I can't find the right button to push. |
You have permissions now. |
Convert to using only symbols for column names.
Question: should we not show the |
I was wondering that, too. I left it as is, thinking that it might On Wed, Jan 29, 2014 at 12:51 PM, John Myles White <[email protected]
|
Since we were showing strings without any quotes, I'd be onboard with dropping the initial colon. But let's hear what a few other folks think before making a decision. FYI: I really like working with symbols now that I'm running this patch. No more pesky quote matching. 😄 |
Just to chime in. I like this change -- I think it'll be better for users. On Wed, Jan 29, 2014 at 2:10 PM, John Myles White
|
I'm onboard with metadata, but if we're going to add it we need to agree for sure on what we unambiguously intend to support. I'd really like us to put out a spec for DataFrames by March 1st that contains stuff we promise to support indefinitely into the future and code that passes tests based on that spec. That way we can direct users to 0.3 for a long time to come while we make changes without fear of screwing anybody over. |
Supporting column meta-data, in particular labels, is indeed the logical complement of restricting column names to valid symbol names. Regarding the printing, I also think it would be better to remove the |
Let's move discussion to #35. |
Convert to using only symbols for column names.
Obviously, this is a big change to the user interface.
I've only changed the tests and source code. If this is acceptable, I'll update the documentation.