Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT] Add str.replace expression #2048

Merged
merged 9 commits into from
Apr 2, 2024
Merged

[FEAT] Add str.replace expression #2048

merged 9 commits into from
Apr 2, 2024

Conversation

colin-ho
Copy link
Contributor

@colin-ho colin-ho commented Mar 28, 2024

Closes #1932
Closes #1962

Replaces all occurrences of a pattern in a string column with a replacement string. The pattern can be a literal or a regex pattern.

@github-actions github-actions bot added documentation Improvements or additions to documentation enhancement New feature or request labels Mar 28, 2024
@colin-ho colin-ho marked this pull request as ready for review March 28, 2024 22:01
let pattern_arrow = pattern.as_arrow();
let replacement_arrow = replacement.as_arrow();

if self.is_empty() || pattern.is_empty() || replacement.is_empty() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be && instead?

If any are empty, then all should be empty -- otherwise it's a length and we should throw an error?

Copy link
Contributor Author

@colin-ho colin-ho Apr 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a separate issue (#2058) to tackle this cuz we'll probably want to add test cases for this change (and for all the other kernels), which is out of scope for this PR

daft/expressions/expressions.py Outdated Show resolved Hide resolved
daft/expressions/expressions.py Outdated Show resolved Hide resolved
src/daft-core/src/array/ops/utf8.rs Outdated Show resolved Hide resolved
}
}
},
(self_len,1,1) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty crazy that we have so many match cases. I wonder if there's a better way to perform this.

Perhaps something like this at least?

self_iter, pattern_iter, replacement_iter = ...;
if regex {
    regex_replace(...)
} else {
    replace_on_literal(...)
}

@colin-ho colin-ho mentioned this pull request Apr 1, 2024
Copy link

codecov bot commented Apr 1, 2024

Codecov Report

Attention: Patch coverage is 90.90909% with 1 lines in your changes are missing coverage. Please review.

Project coverage is 85.27%. Comparing base (bfc648d) to head (0368764).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2048      +/-   ##
==========================================
+ Coverage   85.23%   85.27%   +0.03%     
==========================================
  Files          68       68              
  Lines        7248     7259      +11     
==========================================
+ Hits         6178     6190      +12     
+ Misses       1070     1069       -1     
Files Coverage Δ
daft/expressions/expressions.py 91.87% <100.00%> (+0.06%) ⬆️
daft/series.py 92.88% <85.71%> (-0.11%) ⬇️

... and 1 file with indirect coverage changes

@colin-ho colin-ho merged commit 8a118a1 into main Apr 2, 2024
31 checks passed
@colin-ho colin-ho deleted the colin/replace branch April 2, 2024 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support regex expressions [EXPRESSIONS] .str.replace
2 participants