Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add splitlines #20390

Closed
wants to merge 1 commit into from
Closed

Add splitlines #20390

wants to merge 1 commit into from

Conversation

mpastell
Copy link
Contributor

@mpastell mpastell commented Feb 2, 2017

Adds splitlines as suggested in #19759

line before it is returned. When `chomp` is false, they are returned as part of the
line.
"""
splitlines(str::AbstractString; chomp = true) = readlines(IOBuffer(str), chomp = chomp)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a doctest here showing how it is used?

Copy link
Sponsor Member

@StefanKarpinski StefanKarpinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, aside from a doctest example being nice (especially since folks have been working so hard at adding doctests to so many things).

@stevengj
Copy link
Member

stevengj commented Feb 2, 2017

Or maybe just implement readlines(s::AbstractString) and eachline(s::AbstractString) methods?

@stevengj stevengj added the strings "Strings!" label Feb 2, 2017
@StefanKarpinski
Copy link
Sponsor Member

julia> readlines("/usr/share/dict/words")
235886-element Array{String,1}:
 "A"
 "a"
 "aa"
 "aal"
 "aalii"
 "aam"
 "Aani"
 "aardvark"
 "aardwolf"
 
 "zymurgy"
 "Zyrenian"
 "Zyrian"
 "Zyryan"
 "zythem"
 "Zythia"
 "zythum"
 "Zyzomys"
 "Zyzzogeton"

@stevengj
Copy link
Member

stevengj commented Feb 2, 2017

Oh, right, I forgot that you can pass the filename to readlines and eachline.

@stevengj
Copy link
Member

stevengj commented Feb 2, 2017

I'm not sure calling readlines on an IOBuffer is the right implementation here, though. The split function returns an array of substrings, so shouldn't splitlines do the same?

@stevengj
Copy link
Member

stevengj commented Feb 2, 2017

The split function takes a duck-typed splitter argument that just has to work with search. Maybe the right thing to do would be to define a new splitter type, something like:

immutable LinebreakSplitter; chomp::Bool; end
function search(s::AbstractString, splitter::LinebreakSplitter, i::Integer)
       newline = search(s, '\n', i)
       newline < i && return newline:newline
       splitter.chomp || return nextind(s, newline):newline
       prev = prevind(s, newline)
       return prev  i && s[prev] == '\r' ? (prev:newline) : (newline:newline)
end
splitlines(s::AbstractString; chomp::Bool=true) = split(s, LinebreakSplitter(chomp))

Not only does this re-use the split logic, but also this way the LinebreakSplitter could potentially be used with other functions (e.g. replace).

@mpastell
Copy link
Contributor Author

mpastell commented Feb 4, 2017

Do we actually need the chomp option?

splitlines(str; limit=0, keep=true) = split(str, r"\n|\r\n", limit=limit, keep=keep)

would be a simple implementation that is compatible with split.

@StefanKarpinski
Copy link
Sponsor Member

I agree that LinebreakSplitter seems overly complicated. @mpastell's definition of splitlines or something equivalent to it seems sufficient to me.

@stevengj
Copy link
Member

stevengj commented Feb 6, 2017

I'm not sure how important it is in practice, but Python's version has the equivalent of the chomp=false option. I guess this would matter if you wanted to write the data back out while preserving the line endings.

@stevengj
Copy link
Member

stevengj commented Feb 6, 2017

The LinebreakSplitter is only about 2x faster than the regex version in a simple test on join([randstring(50)*(rand(Bool)?"\n":"\r\n") for i = 1:10^4]), so I would only use it if we decide the chomp option is important (or if we decide a 2x speed boost is important here).

@brenhinkeller
Copy link
Sponsor Contributor

brenhinkeller commented Nov 18, 2022

So it looks like as of Julia 1.8.3 this is now provided by split:

julia> s = "this\nstring has several\nlines"
"this\nstring has several\nlines"

julia> split(s, '\n')
3-element Vector{SubString{String}}:
 "this"
 "string has several"
 "lines"

so I think we can close this, but if anyone disagrees (or wants to add an alias splitlines(s) = split(s, '\n') feel free to reopen or make a new PR!

@mlhetland
Copy link
Contributor

Does this mean split now handles errors with \r (which was the point of this issue and PR)? If not, maybe at least the issue should be kept open?

@brenhinkeller
Copy link
Sponsor Contributor

Ah I missed the point

@brenhinkeller brenhinkeller reopened this Nov 19, 2022
@brenhinkeller brenhinkeller added the feature Indicates new feature / enhancement requests label Nov 19, 2022
@vtjnash
Copy link
Sponsor Member

vtjnash commented Jan 31, 2024

Quite conflicted for such a small change. The implementation here is already mentioned now in the docstring for eachline though and seems trivial enough not to need this PR to add it

help?> eachline

  eachline(io::IO=stdin; keep::Bool=false)
  eachline(filename::AbstractString; keep::Bool=false)

...

  To iterate over each line of a String, eachline(IOBuffer(str)) can be used.

@vtjnash vtjnash closed this Jan 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Indicates new feature / enhancement requests strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants