FR: split %like% into %like% and %rlike% #3333

Closed
MichaelChirico opened this issue Jan 30, 2019 · 13 comments · Fixed by #3552

@MichaelChirico
Member

This would be mimicking the syntax of Spark SQL (like/rlike) which I quite like.

%like% would begin to use fixed = TRUE (potentially a breaking change) and %rlike% would be more like the current %like%. The idea is to provide a fixed = TRUE option since this is more efficient.

Do a lot of people rely currently on %like% accepting regex?

Would also be possible to go the opposite way by offering e.g. %flike% as the fixed version of %like%, at the expense of being somewhat confusing for frequent users of Spark SQL & data.table (such as myself)
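For illustration, a minimal base R sketch of the two matching modes under discussion. The sample vector and variable names are hypothetical, and no timings are claimed here; `system.time()` is included only so the difference can be measured locally.

```r
# Hypothetical sample data, not taken from the thread
set.seed(1)
x <- sample(c("alpha", "beta", "gamma.delta"), 1e5, replace = TRUE)

# Regex matching, which is what grepl() does by default
regex_hits <- grepl("gamma.delta", x)

# Literal matching, which is what a fixed = TRUE variant would do
fixed_hits <- grepl("gamma.delta", x, fixed = TRUE)

# Both modes agree on this data because no other value fits the regex
same_result <- identical(regex_hits, fixed_hits)

# Compare the cost of each mode on your own machine
system.time(grepl("gamma.delta", x))
system.time(grepl("gamma.delta", x, fixed = TRUE))
```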

@jangorecki
Member

jangorecki commented Jan 30, 2019

Should be resolved together with #2519.
And how does %like% behave in postgres? Is there a %rlike% in postgres as well?

@jangorecki jangorecki added this to the 1.12.4 milestone Jan 30, 2019
@MichaelChirico
Member Author

MichaelChirico commented Jan 30, 2019

https://www.postgresql.org/docs/9.3/functions-matching.html

Not too familiar with postgres myself, but I see SIMILAR TO and some operators like ~ that seem to serve those purposes

@jangorecki
Member

I suggest not looking at pg versions older than 9.3. I believe postgres is more of a standard than spark. Anyway, it is best to add features to the like() function and then just redirect the different operators (ilike, rlike, flike, etc.) to like(...).

@MichaelChirico
Member Author

Agreed... though to be fair, the fully robust version of %like% is just grep after all 😄

@andschar

andschar commented Feb 1, 2019

I guess breaking %like% would be annoying for a lot of people.

@HughParsonage
Member

HughParsonage commented Feb 2, 2019

No objection to offering variants of %like% but strongly opposed to changing the behaviour of %like% for a pretty small benefit. Not only would it be a breaking change, it would be a silently breaking change: DT[x %like% "1.0"] would suddenly return a much smaller subset of DT than it did.
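The silent change can be sketched in base R with `grepl()`, which is what `like()` calls. The vector below is hypothetical and stands in for the column `x`:

```r
# Hypothetical values for column x
x <- c("1.0", "110", "1a0", "2.0")

# Current behaviour: "." is a regex wildcard, so "1.0" matches three values
regex_match <- grepl("1.0", x)

# Proposed behaviour with fixed = TRUE: "." is literal, so only "1.0" matches
fixed_match <- grepl("1.0", x, fixed = TRUE)

x[regex_match]  # "1.0" "110" "1a0"
x[fixed_match]  # "1.0"
```

No error or warning is raised in either case, so existing code would keep running and simply return fewer rows.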

@andschar

andschar commented Feb 4, 2019

I'm about to create a PR on this matter. I have one question: is it better to create one function with varying operators, or several functions? My idea was to change the function as follows:

like <- function(vector, pattern, ignore.case = FALSE, fixed = FALSE)
{
  # Intended for use with a data.table 'where'
  # Don't use * or % like SQL's like.  Uses regexpr syntax - more powerful.
  if (is.factor(vector)) {
    as.integer(vector) %in% grep(pattern, levels(vector), ignore.case = ignore.case, fixed = fixed)
  } else {
    # most usually character, but integer and numerics will be silently coerced by grepl
    grepl(pattern, vector, ignore.case = ignore.case, fixed = fixed)
  }
  # returns 'logical' so can be combined with other where clauses.
}

"%like%" = like
"%ilike%" = function(vector, pattern) like(vector, pattern, ignore.case = TRUE)
"%flike%" = function(vector, pattern) like(vector, pattern, fixed = TRUE)

However, using ilike() or flike() as functions would then not be possible. Right?

@jangorecki
Member

It will be possible by calling "%ilike%"(). I don't have any specific preference about that, but we can always define those operators and functions at once:

"%ilike%" = ilike =function(...
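As a rough sketch of that idea, the snippet below defines each operator and its plain-function twin in one chained assignment. The `like()` here is a simplified stand-in (it omits the factor branch from the proposal above), and the sample vector is hypothetical; this is not the code that was eventually merged.

```r
# Simplified stand-in for like(): no special handling of factors
like <- function(vector, pattern, ignore.case = FALSE, fixed = FALSE) {
  grepl(pattern, vector, ignore.case = ignore.case, fixed = fixed)
}

# Chained assignment binds the same function to both names
`%ilike%` <- ilike <- function(vector, pattern) like(vector, pattern, ignore.case = TRUE)
`%flike%` <- flike <- function(vector, pattern) like(vector, pattern, fixed = TRUE)

# Hypothetical sample data
x <- c("Apple", "apple.pie", "banana")

op_result   <- x %ilike% "APPLE"          # infix form
fn_result   <- ilike(x, "APPLE")          # plain function form
call_result <- `%ilike%`(x, "APPLE")      # operator called as a function
dot_result  <- x %flike% "."              # literal dot, not a wildcard
```

All three `ilike` forms return `TRUE TRUE FALSE` here, and `x %flike% "."` returns `FALSE TRUE FALSE` because only "apple.pie" contains a literal dot.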

@MichaelChirico
Member Author

@andreasLD what is the difference between the proposed like function and base::grep?

@arunsrinivasan
Member

The gain seems too small to justify everyone sifting through their code and fixing it when it breaks. Why not add a %flike% or %fixlike% instead?

@MichaelChirico
Member Author

I like %flike% and %ilike% (especially since they're mutually exclusive -- fixed = TRUE overrides ignore.case = TRUE in grep), and agree with @HughParsonage re:breaking behavior (time machine please 😬)...

But still don't see the point of a like function (AFAICT it's just grep)
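The point about `fixed = TRUE` and `ignore.case = TRUE` being mutually exclusive can be checked in base R. A small sketch with hypothetical inputs:

```r
# With both flags set, grepl() warns that ignore.case is ignored
# and falls back to case-sensitive literal matching
both_flags <- suppressWarnings(
  grepl("abc", "ABC", ignore.case = TRUE, fixed = TRUE)
)

# ignore.case on its own matches regardless of case
icase_only <- grepl("abc", "ABC", ignore.case = TRUE)

both_flags  # FALSE
icase_only  # TRUE
```

Since the two options cannot be combined in `grepl()`, separate `%ilike%` and `%flike%` operators do not leave any useful combination uncovered.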

@jangorecki
Member

At least it feels more natural for SQL users

@MichaelChirico
Member Author

Oh... just noticed we have like internally already. So the above is just adding to it 👍
