Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add like sub-string filter #1091

Merged
merged 6 commits into from
Feb 15, 2023

Conversation

fredcarle
Copy link
Collaborator

Relevant issue(s)

Resolves #982

Description

This PR adds _like and _nlike filters to match on a sub-string basis. The % character can be used before and/or after the string to specify how the string should be matched.

  • "String to match" will try to match the string exactly. It is equivalent to using the _eq filter.
  • "%String to match" will match to a string ending with String to match.
  • "String to match%" will match to a string starting with String to match.
  • "%String to match%" will match to a string that contains String to match.

Tasks

  • I made sure the code is well commented, particularly hard-to-understand areas.
  • I made sure the repository-held documentation is changed accordingly.
  • I made sure the pull request title adheres to the conventional commit style (the subset used in the project can be found in tools/configs/chglog/config.yml).
  • I made sure to discuss its limitations such as threats to validity, vulnerability to mistake and misuse, robustness to invalidation of assumptions, resource requirements, ...

unit and integration tests

(replace) Describe the tests performed to verify the changes. Provide instructions to reproduce them.

Specify the platform(s) on which this was tested:

  • MacOS

@fredcarle fredcarle added feature New feature or request area/query Related to the query component action/no-benchmark Skips the action that runs the benchmark. labels Feb 12, 2023
@fredcarle fredcarle added this to the DefraDB v0.5 milestone Feb 12, 2023
@fredcarle fredcarle requested a review from a team February 12, 2023 00:05
@fredcarle fredcarle self-assigned this Feb 12, 2023
@codecov
Copy link

codecov bot commented Feb 12, 2023

Codecov Report

Merging #1091 (cc9191c) into develop (467cbe8) will increase coverage by 0.11%.
The diff coverage is 81.81%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1091      +/-   ##
===========================================
+ Coverage    67.39%   67.51%   +0.11%     
===========================================
  Files          178      180       +2     
  Lines        16635    16690      +55     
===========================================
+ Hits         11211    11268      +57     
+ Misses        4476     4475       -1     
+ Partials       948      947       -1     
Impacted Files Coverage Δ
connor/nlike.go 57.14% <57.14%> (ø)
connor/like.go 84.09% <84.09%> (ø)
connor/connor.go 93.33% <100.00%> (+1.02%) ⬆️
net/peer.go 47.87% <0.00%> (+1.82%) ⬆️
net/client.go 85.71% <0.00%> (+7.14%) ⬆️

Copy link
Member

@shahzadlone shahzadlone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this clean implementation, was a nice review.

The "like" matching pattern I am assuming was motivated by SQL string matching operator that works with WHERE? In that case is there any planned future work to fully support all the feature set of the like operator or do we want to do it within this PR?

The like operator has these functionalities:

  1. str% = match any value that starts with "str".
  2. %str = match any value that ends with "str".
  3. %str% = match any value that has "str" in any position.
  4. s%tr = match any value that starts with "s" and ends with "tr".
  5. _s% = match any value that have "s" in the second position.
  6. s_% = match any value that starts "s" and is at least 2 characters in length.
  7. s__% = match any value that starts "s" and is at least 3 characters in length (and so on).

This PR implements [1-3] only, IMO (4) might be worth having too.

IMO we should in future still implement regex search similar to how mongodb does it, as % is essentially just the .* pattern and _ is the ? pattern. https://www.mongodb.com/docs/manual/reference/operator/query/regex/

I would change the PR title to something indicating that it is a like substring filtering.
i.e. feat: Add ability to filter sub-strings using 'like'

connor/like.go Outdated Show resolved Hide resolved
connor/like.go Outdated Show resolved Hide resolved
connor/like.go Outdated Show resolved Hide resolved
connor/like.go Show resolved Hide resolved
connor/nlike.go Outdated Show resolved Hide resolved
@AndrewSisley
Copy link
Contributor

IMO we should in future still implement regex search similar to how mongodb does it

I thought about this for a little bit. I think I prefer to have a simple like/nlike as we have in this PR (without worrying about _s etc), and a regex operator at a later point for those who want. like/nlike can remain super simple for those who just need super simple, and then regex for everything else.

I still am not sure how I feel about nlike lol - it feels odd, and I dislike extra operators, but we have no NOT keyword... Did a partner ask for nlike @fredcarle? What prompted you to add it?

Copy link
Contributor

@AndrewSisley AndrewSisley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prod code looks good, would like to wait and see what people think RE the feature itself though before merge, and maybe an extra test or two on nil props

connor/like.go Outdated Show resolved Hide resolved
@fredcarle fredcarle force-pushed the fredcarle/feat/I982-subtring-filter branch from c6bcca3 to fc5f4f6 Compare February 14, 2023 04:47
@fredcarle
Copy link
Collaborator Author

@AndrewSisley

I still am not sure how I feel about nlike lol - it feels odd, and I dislike extra operators, but we have no NOT keyword... Did a partner ask for nlike @fredcarle? What prompted you to add it?

No one specifically asked for it but I've often been in situations where I needed to find items that did not include a certain string. For example at my previous job I had to look for wireless product that had 5G in the name but not LTE.

Also, pretty much all our other operators have the n prefix as an not version of the operator.

@shahzadlone
I think I'd rather keep it simple for the like filter. I Don't mind adding the s%tr case though. And for the rest I'd tend to delegate that to a regex operator.

@fredcarle fredcarle changed the title feat: Add sub-string filter feat: Add like sub-string filter Feb 14, 2023
@shahzadlone
Copy link
Member

@shahzadlone I think I'd rather keep it simple for the like filter. I Don't mind adding the s%tr case though. And for the rest I'd tend to delegate that to a regex operator.

I think that is a decent middle ground. I'd be happy with the implementation of (4):
s%tr = match any value that starts with "s" and ends with "tr".

=D

Copy link
Contributor

@AndrewSisley AndrewSisley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me - thanks for expanding the tests (and for covering so much prior to the extra request)

@fredcarle fredcarle force-pushed the fredcarle/feat/I982-subtring-filter branch from 9de3cf0 to 2db9a9a Compare February 14, 2023 21:59
@shahzadlone
Copy link
Member

shahzadlone commented Feb 15, 2023

I think once you implement (4) it's implementation might be a superset and handle (1), (2), (3) as well.

Giving approval, it's okay if you want to do that in a separate pr, as the current code / features work.

@fredcarle
Copy link
Collaborator Author

I think once you implement (4) it's implementation might be a superset and handle (1), (2), (3) as well.

Giving approval, it's okay if you want to do that in a separate pr, as the current code / features work.

It's in this PR :)

@fredcarle fredcarle force-pushed the fredcarle/feat/I982-subtring-filter branch from 2db9a9a to cc9191c Compare February 15, 2023 15:47
@fredcarle fredcarle merged commit 563cb7d into develop Feb 15, 2023
@fredcarle fredcarle deleted the fredcarle/feat/I982-subtring-filter branch February 15, 2023 16:05
shahzadlone pushed a commit that referenced this pull request Apr 13, 2023
Relevant issue(s)
Resolves #982

Description
This PR adds _like and _nlike filters to match on a sub-string basis. The % character can be used before and/or after the string to specify how the string should be matched.

"String to match" will try to match the string exactly. It is equivalent to using the _eq filter.
"%String to match" will match to a string ending with String to match.
"String to match%" will match to a string starting with String to match.
"%String to match%" will match to a string that contains String to match.
shahzadlone pushed a commit to shahzadlone/defradb that referenced this pull request Feb 23, 2024
Relevant issue(s)
Resolves sourcenetwork#982

Description
This PR adds _like and _nlike filters to match on a sub-string basis. The % character can be used before and/or after the string to specify how the string should be matched.

"String to match" will try to match the string exactly. It is equivalent to using the _eq filter.
"%String to match" will match to a string ending with String to match.
"String to match%" will match to a string starting with String to match.
"%String to match%" will match to a string that contains String to match.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
action/no-benchmark Skips the action that runs the benchmark. area/query Related to the query component feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for sub-string matching (e.g. _like)
3 participants