Feature Request - Normalize Names #272
Alright, here is how I think we should do this:
First, we add a function like
With that function alone, one should already be able to normalize names a la
And then we simply add a new macro to here that just translates
I think you would start with the first part, get that all sorted out, and then in a second step we can add the macro. If you haven't written a generated function before, let me know, and I'll give more tips on how to do that. You can also take a look at some of the other functions in that file for patterns.

I took a look at the NamedTupleUtilities module and there is quite a bit of syntax that looks foreign to me 😨. I have a lot of researching/learning to do before I'll be able to write a function that can accomplish the task at hand. Also, I checked out the CSV.jl source code and there is quite a bit going on there that I also don't understand (it's much more than simply replacing spaces; they are replacing other kinds of characters as well). I didn't get far at all before getting completely stuck 😳:
function normalize_names(a::NamedTuple)
    normalize(name) = replace(strip(string(name)), " " => "_")
    names_normalized = normalize.(keys(a))
    return names_normalized
end
mynames = (Symbol(" queryverse rocks"), :b, :c)
myvalues = [1, 2.0, "hello world"]
my_namedtuple = NamedTuple{mynames}(myvalues)
normalize_names(my_namedtuple)
Output:
("queryverse_rocks", "b", "c")
This function works, but it obviously doesn't do anything close to what we want it to : ) I haven't written generated functions before and I actually haven't written any code that deals with NamedTuples. In looking at the
@generated function select(a::NamedTuple{an}, ::Val{bn}) where {an, bn}
    names = ((i for i in an if i == bn)...,)
    types = Tuple{(fieldtype(a, n) for n in names)...}
    vals = Expr[:(getfield(a, $(QuoteNode(n)))) for n in names]
    return :(NamedTuple{$names,$types}(($(vals...),)))
end
I don't really understand what
My (mis)interpretation of the
The
Lastly, the function returns a NamedTuple of just the names/values that the user wanted, as specified by ::Val{bn}. Unfortunately, it all appears very cryptic to a novice like myself!! I'll keep researching and tinkering with my code to see how much closer I can get to the desired outcome.

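For anyone else puzzling over the same pattern, here is a minimal annotated sketch of how a generated function like the one quoted above operates. The name `keep_fields` and the idea of passing a tuple of names through `Val` are hypothetical and used only for illustration; they are not part of QueryOperators. The point to notice is that inside the body of a `@generated` function the argument `a` is bound to the argument's type, not its value, and the body returns an expression that Julia then compiles for that type.

```julia
# Hypothetical helper, for illustration only (not part of QueryOperators):
# keep the fields of a NamedTuple whose names appear in the tuple KEEP.
@generated function keep_fields(a::NamedTuple{an}, ::Val{KEEP}) where {an, KEEP}
    # Here `a` is the NamedTuple *type*, and `an` is its tuple of field names.
    # Everything in this body runs once per combination of argument types.
    names = ((n for n in an if n in KEEP)...,)

    # Look up the field types from the type `a`, in the same order as `names`.
    types = Tuple{(fieldtype(a, n) for n in names)...}

    # Build one `getfield(a, :name)` expression per kept field. In the returned
    # expression, `a` refers to the runtime value, not the type.
    vals = Expr[:(getfield(a, $(QuoteNode(n)))) for n in names]

    # The returned expression becomes the method body for these argument types.
    return :(NamedTuple{$names,$types}(($(vals...),)))
end

nt = (a = 1, b = 2.0, c = "three")
keep_fields(nt, Val((:a, :c)))  # (a = 1, c = "three")
```

Wrapping the names in `Val` moves them into the type domain, which is what lets the generated body see them while it is building the expression.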
Yeah, this stuff is not easy :) The following code might be a useful template:
function normalize_name(x)
    return uppercase(x)
end
@generated function normalize_names(x::NamedTuple{NAMES,TYPES}) where {NAMES, TYPES}
    new_names = (Symbol(normalize_name(string(i))) for i in NAMES)
    return :(NamedTuple{$(tuple(new_names...))}(values(x)))
end
The
If CSV does a lot more in terms of normalizing the actual name than just replacing spaces, maybe we can just copy the code from there? I can also handle the macro; that is more boilerplate code that is tricky to get right if you are unfamiliar with it, but should be really easy for me. But if you can write the actual normalization code (and tests and docs etc.), that is already a huge help!

Thanks, David! The below seems to be working fine for me (I copied the code from CSV.jl and merged it with your example above):
using Unicode
const RESERVED = Set(["local", "global", "export", "let",
    "for", "struct", "while", "const", "continue", "import",
    "function", "if", "else", "try", "begin", "break", "catch",
    "return", "using", "baremodule", "macro", "finally",
    "module", "elseif", "end", "quote", "do"])
function normalize_name(name::String)::Symbol
    uname = strip(Unicode.normalize(name))
    id = Base.isidentifier(uname) ? uname : map(c->Base.is_id_char(c) ? c : '_', uname)
    cleansed = string((isempty(id) || !Base.is_id_start_char(id[1]) || id in RESERVED) ? "_" : "", id)
    return Symbol(replace(cleansed, r"(_)\1+"=>"_"))
end
@generated function normalize_names(x::NamedTuple{NAMES,TYPES}) where {NAMES, TYPES}
    new_names = (Symbol(normalize_name(string(i))) for i in NAMES)
    return :(NamedTuple{$(tuple(new_names...))}(values(x)))
end
With this, I can do the following:
mynames = (Symbol(" queryverse rocks"), :b, :c) # note that " queryverse rocks" is not normalized
myvalues = [1, 2.0, "hello world"]
my_namedtuple = NamedTuple{mynames}(myvalues)
normalize_names(my_namedtuple)
Output:
(queryverse_rocks = 1, b = 2.0, c = "hello world") # yay! it's normalized!
I will attempt to write a test for the above and then get back to you in a couple of days. I can definitely handle writing the documentation so I will also take care of that and get it to you in the next couple of days.

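To make the behavior of the normalization rule above more concrete, here is a small sketch that repeats the `normalize_name` definition from this thread (so the block runs on its own) and applies it to a few edge cases. The sample inputs are made up for illustration. Note that `Base.isidentifier`, `Base.is_id_char`, and `Base.is_id_start_char` are unexported Base functions, so this assumes they keep their current behavior.

```julia
using Unicode

# Reserved words, as listed earlier in this thread.
const RESERVED = Set(["local", "global", "export", "let",
    "for", "struct", "while", "const", "continue", "import",
    "function", "if", "else", "try", "begin", "break", "catch",
    "return", "using", "baremodule", "macro", "finally",
    "module", "elseif", "end", "quote", "do"])

function normalize_name(name::String)::Symbol
    # Trim surrounding whitespace after Unicode normalization.
    uname = strip(Unicode.normalize(name))
    # Replace every character that cannot appear in an identifier with '_'.
    id = Base.isidentifier(uname) ? uname : map(c -> Base.is_id_char(c) ? c : '_', uname)
    # Prefix '_' if the name is empty, starts with an invalid character, or is reserved.
    cleansed = string((isempty(id) || !Base.is_id_start_char(id[1]) || id in RESERVED) ? "_" : "", id)
    # Collapse runs of underscores into a single underscore.
    return Symbol(replace(cleansed, r"(_)\1+" => "_"))
end

normalize_name(" queryverse rocks")  # :queryverse_rocks (whitespace stripped, space replaced)
normalize_name("for")                # :_for (reserved word gets a leading underscore)
normalize_name("2nd col")            # :_2nd_col (cannot start with a digit)
normalize_name("a   b")              # :a_b (repeated underscores collapsed)
normalize_name("")                   # :_ (empty name)
```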
Cool! I think we’ll need tests in QueryOperators, but no docs there. But then we’ll need docs in Query for the macro (and tests there as well).

Here's an attempt at the documentation:
The @normalize_names command
The
Example
using Query
names = (Symbol(" queryverse rocks"), Symbol("¡column #2!"), :c)
values = [1, 2.0, "hello world"]
source = NamedTuple{names}(values)
q = source |> @normalize_names() |> collect
println(q)
# output
(queryverse_rocks = 1, _column_2! = 2.0, c = "hello world")

I'm not sure exactly how to go about the unit tests, but here's something that works:
names = (Symbol(" queryverse rocks"), Symbol("¡column #2!"), :c)
values = [1, 2.0, "hello world"]
source = NamedTuple{names}(values)
@test QueryOperators.NamedTupleUtilities.normalize_names(source) == (queryverse_rocks = 1, _column_2! = 2.0, c = "hello world")
@inferred QueryOperators.NamedTupleUtilities.normalize_names(source)

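One possible way to organize the checks above is a `@testset` from the Test standard library. The sketch below is self-contained, so it repeats the definitions from earlier in the thread instead of calling `QueryOperators.NamedTupleUtilities.normalize_names`. The test set name and the variable names `colnames`, `source`, and `expected` are placeholders, and the layout is only a suggestion, not the project's actual test file.

```julia
using Test
using Unicode

# Definitions repeated from earlier in the thread so the block runs on its own.
const RESERVED = Set(["local", "global", "export", "let",
    "for", "struct", "while", "const", "continue", "import",
    "function", "if", "else", "try", "begin", "break", "catch",
    "return", "using", "baremodule", "macro", "finally",
    "module", "elseif", "end", "quote", "do"])

function normalize_name(name::String)::Symbol
    uname = strip(Unicode.normalize(name))
    id = Base.isidentifier(uname) ? uname : map(c -> Base.is_id_char(c) ? c : '_', uname)
    cleansed = string((isempty(id) || !Base.is_id_start_char(id[1]) || id in RESERVED) ? "_" : "", id)
    return Symbol(replace(cleansed, r"(_)\1+" => "_"))
end

@generated function normalize_names(x::NamedTuple{NAMES,TYPES}) where {NAMES, TYPES}
    new_names = (normalize_name(string(i)) for i in NAMES)
    return :(NamedTuple{$(tuple(new_names...))}(values(x)))
end

@testset "normalize_names" begin
    colnames = (Symbol(" queryverse rocks"), Symbol("¡column #2!"), :c)
    source = NamedTuple{colnames}((1, 2.0, "hello world"))
    expected = (queryverse_rocks = 1, _column_2! = 2.0, c = "hello world")

    # Names are normalized and values are carried over unchanged.
    @test normalize_names(source) == expected
    @test keys(normalize_names(source)) == (:queryverse_rocks, :_column_2!, :c)

    # @inferred throws if the return type cannot be inferred.
    @test (@inferred normalize_names(source)) == expected

    # Names that are already valid identifiers pass through untouched.
    @test normalize_names((a = 1, b = 2)) == (a = 1, b = 2)
end
```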
CSV.jl has a normalizenames argument that can be set to true when reading a file. This option replaces invalid identifier characters (such as spaces) with underscores. I think it would be nice to add this sort of feature to Query, but I would take it a step further and remove all leading/trailing whitespace (rather than replacing it with underscores).
From the CSV.jl documentation:
"When a column name is not a single atom Julia identifier, this is inconvenient, because f.column one is not valid, so I would have to manually call getproperty(f, Symbol("column one"))"
Julia's built-in strip and replace functions should do the job. I'd love to make an attempt to write this myself if you can provide a basic roadmap for me to get started.
Thanks!!