split() enhancement, keep_splitter #20625

Open
iagobaapellaniz opened this issue Feb 16, 2017 · 6 comments
Labels
collections Data structures holding multiple items, e.g. sets design Design of APIs or of the language itself strings "Strings!"

Comments

@iagobaapellaniz
Contributor

iagobaapellaniz commented Feb 16, 2017

Hello,

I would like to propose an enhancement to the split(str, splitter; limit=0, keep=true) function, which I am currently working on in a new branch.

I found myself trying to use the function this way. The behavior I would expect is the following:

julia> split("abcabcdabbcd", "b"; keep_splitter = true)
5-element Array{SubString{String},1}:
 "a"   
 "bca" 
 "bcda"
 "b"
 "bcd"

Should I keep on working? Would it be a breaking change for the rest of the ecosystem?

Some ideas:

  • If the keep_splitter flag is true, it should make no difference whether keep (which controls whether empty results are kept) is true or false, since there would be no empty results at all.
  • I'm not sure whether to add another flag that attaches the splitter to the following substring or to the preceding one. This is how readlines(file) works, isn't it? Each element of the array has '\n' at the end.
julia> split("abcabcdabbcd", "b"; keep_splitter = true, prepend = false)
5-element Array{SubString{String},1}:
 "ab"   
 "cab" 
 "cdab"
 "b"
 "cd"
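
For reference, the two outputs above can be reproduced with a small standalone helper. This is only a sketch of the proposed behavior, not the implementation from the branch mentioned in this comment: the name split_keep is hypothetical, the prepend keyword mirrors the flag suggested above, and the helper only handles a literal String delimiter.

```julia
# Hypothetical helper, not part of Base: split `str` on the literal string
# `delim` and keep every delimiter, attached to the following piece
# (prepend = true) or to the preceding piece (prepend = false).
function split_keep(str::String, delim::String; prepend::Bool = true)
    isempty(delim) && throw(ArgumentError("delim must be non-empty"))
    pieces = SubString{String}[]
    start = firstindex(str)               # first index of the current piece
    r = findfirst(delim, str)
    while r !== nothing
        # Cut just before the delimiter (prepend) or just after it.
        cut = prepend ? first(r) : nextind(str, last(r))
        if cut > start                    # never emit an empty piece
            push!(pieces, SubString(str, start, prevind(str, cut)))
            start = cut
        end
        r = findnext(delim, str, nextind(str, last(r)))
    end
    # Whatever remains after the last cut is the final piece.
    start <= lastindex(str) && push!(pieces, SubString(str, start))
    return pieces
end

println(split_keep("abcabcdabbcd", "b"))
println(split_keep("abcabcdabbcd", "b"; prepend = false))
```

Because a cut is only made when it moves past the start of the current piece, the sketch never returns empty strings, which matches the first idea in the list above.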

Thanks in advance!

PS: maybe this can already be done with another function I don't know about yet.

@ararslan ararslan added collections Data structures holding multiple items, e.g. sets design Design of APIs or of the language itself strings "Strings!" labels Feb 16, 2017
@nalimilan
Member

Related to a possible splitlines function (#20390), and to the new chomp argument to readline/eachline (#19944, #19944).

Cc: @mpastell

@StefanKarpinski
Sponsor Member

Note that there is another way of looking at the current keep keyword: when empty results are dropped (keep=false), what it really says is that your splitter can be repeated one or more times, i.e. it implicitly wraps the splitter in a (...)+ as a regex.
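
A quick way to check this reading on the example string from the issue, as a sketch under two assumptions: it uses keepempty, the name this keyword carries in Julia 1.x (the thread calls it keep), and it spells out the (...)+ wrapping by hand as a regex.

```julia
s = "abcabcdabbcd"

# Drop empty fields (what the thread calls keep = false).
dropped = split(s, "b"; keepempty = false)

# Treat a run of one or more "b" as a single delimiter instead.
merged = split(s, r"(?:b)+")

println(dropped)
println(merged)
println(dropped == merged)
```

The two only agree when the delimiter does not sit at the very start or end of the string: in that case the regex form still yields an empty field there, while dropping empty fields removes it.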

@wsphillips

This also crops up when trying to split camel case strings:

foo = "ThisShouldBeSeparate"
split(foo, isuppercase)
5-element Array{SubString{String},1}:
 ""
 "his"
 "hould"
 "e"
 "eparate"

A solution is to use a regex delimiter: split(foo, r"(?=[A-Z])") but that's far less intuitive (and I wouldn't have solved it without outside help).
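
For comparison, here is a sketch of what a predicate-based split that keeps the matching character could look like without a regex. The name split_before is hypothetical and not part of Base; it cuts in front of every character for which the predicate is true.

```julia
# Hypothetical helper, not part of Base: start a new piece in front of
# every character for which `pred` returns true, keeping that character.
function split_before(pred, str::String)
    pieces = SubString{String}[]
    start = firstindex(str)               # first index of the current piece
    for (i, c) in pairs(str)
        if pred(c) && i > start           # no empty piece for a leading match
            push!(pieces, SubString(str, start, prevind(str, i)))
            start = i
        end
    end
    start <= lastindex(str) && push!(pieces, SubString(str, start))
    return pieces
end

foo = "ThisShouldBeSeparate"
println(split_before(isuppercase, foo))
```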

@StefanKarpinski
Sponsor Member

I think this should be called keepdelim since the term we use in docs is generally "delimiter" rather than "splitter". My concern is that the right behavior is not entirely obvious to me: does it keep the delimiters as individual words? Or does it keep them as part of the preceding or following item? The example seems to keep the delimiter along with the following item. Why? Why not the preceding item? It seems more flexible to return items alternating with delimiters since then you can rejoin them either way, although that's inefficient since it creates a bunch of temporary string objects. Another option would be to have a splitindices function that returns the indices where splitting would be done and allows the caller to decide what parts they want to take.
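
A minimal sketch of these last two options, assuming a literal String delimiter. Base has no splitindices; the version here is a hypothetical one-liner built on findall (which accepts a string pattern in Julia 1.3 and later), and alternate is an equally hypothetical caller that returns items alternating with delimiters.

```julia
# Hypothetical: the index ranges at which `delim` occurs in `str`.
splitindices(str::String, delim::String) = findall(delim, str)

# Hypothetical caller: items alternating with the delimiters between them.
function alternate(str::String, delim::String)
    out = SubString{String}[]
    start = firstindex(str)
    for r in splitindices(str, delim)
        push!(out, SubString(str, start, prevind(str, first(r))))  # item, possibly empty
        push!(out, SubString(str, first(r), last(r)))              # the delimiter itself
        start = nextind(str, last(r))
    end
    push!(out, SubString(str, start))                              # trailing item
    return out
end

s = "abcabcdabbcd"
println(splitindices(s, "b"))
println(alternate(s, "b"))
println(join(alternate(s, "b")) == s)
```

Since every piece is a SubString view into the original string, joining the alternating result gives back the input, and a caller can glue each delimiter to either neighbor as needed.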

@BenjaminGalliot
Contributor

BenjaminGalliot commented Dec 18, 2020

@StefanKarpinski, I posted a topic yesterday and planned to create a new ticket, but this one can be merged too. I posted a message yesterday here then erased it (because it was not exactly the same goal), so perhaps that is why it bumped again!

In my case, it was to have it as a separate element, as in other languages shown there, but I think we can make this keepdelim argument something more useful than a simple boolean, like nothing (default), separate, previous, next, and even both (previous + next) and everywhere (both + separate), so maybe a free combination to allow all cases (next + separate, for example)...

@iagobaapellaniz
Contributor Author

iagobaapellaniz commented Dec 19, 2020

I think keepdelim is far better than keep_splitter, for the reasons Stefan gave. Nevertheless, I would expect a boolean there. On the other hand, if one wants what Benjamin suggests, I would prefer something like delim = :nothing, delim = :separate, delim = :withnext, or something similar.

6 participants