Ensure FilePaths work #511

Merged
merged 6 commits into from
Oct 19, 2019

Contributor

@morris25 morris25 commented Oct 7, 2019

The CSV.write docstring claims that it should work with FilePaths:

Write a Tables.jl interface input (https://github.com/JuliaData/Tables.jl) to a csv file, given as an IO argument or String/FilePaths.jl type representing the file name to write to.

However, when I try to write to a Path I get

CSV.write(mktmpdir() / "test.txt", DataFrame(:a=>[1,2,3]))
ERROR: MethodError: no method matching with(::getfield(CSV, Symbol("##48#49")){Bool,Tables.Schema{(:a,),Tuple{Int64}},Tables.RowIterator{NamedTuple{(:a,),Tuple{Array{Int64,1}}}},CSV.Options{UInt8,UInt8,Nothing,Tuple{}},Tuple{Symbol},Int64,Int64,Array{UInt8,1}}, ::PosixPath, ::Bool)
Closest candidates are:
  with(::Function, ::Union{DevNull, Pipe, PipeEndpoint, TTY}, ::Any) at /Users/sam/.julia/dev/CSV/src/write.jl:136
  with(::Function, ::IO, ::Any) at /Users/sam/.julia/dev/CSV/src/write.jl:130
  with(::Function, ::String, ::Any) at /Users/sam/.julia/dev/CSV/src/write.jl:140
Stacktrace:
 [1] #write#47(::Bool, ::Bool, ::Array{String,1}, ::Function, ::Tables.Schema{(:a,),Tuple{Int64}}, ::Tables.RowIterator{NamedTuple{(:a,),Tuple{Array{Int64,1}}}}, ::PosixPath, ::CSV.Options{UInt8,UInt8,Nothing,Tuple{}}) at /Users/sam/.julia/dev/CSV/src/write.jl:73
 [2] write(::Tables.Schema{(:a,),Tuple{Int64}}, ::Tables.RowIterator{NamedTuple{(:a,),Tuple{Array{Int64,1}}}}, ::PosixPath, ::CSV.Options{UInt8,UInt8,Nothing,Tuple{}}) at /Users/sam/.julia/dev/CSV/src/write.jl:68
 [3] #write#46(::Char, ::Char, ::Nothing, ::Nothing, ::Char, ::Char, ::Char, ::Nothing, ::Bool, ::String, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::typeof(CSV.write), ::PosixPath, ::DataFrame) at /Users/sam/.julia/dev/CSV/src/write.jl:60
 [4] write(::PosixPath, ::DataFrame) at /Users/sam/.julia/dev/CSV/src/write.jl:53
 [5] top-level scope at none:0

These changes should ensure CSV read/write functions work for any AbstractPath.
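
A minimal sketch of the failure mode in the stack trace above, using only Base. The names `ToyAbstractPath`, `ToyPath`, and `with_sink` are hypothetical stand-ins, not the FilePathsBase.jl or CSV.jl definitions; the point is only that methods for `IO` and `String` do not cover a separate path type until a forwarding method is added.

```julia
# Toy stand-ins (hypothetical; not the FilePathsBase.jl or CSV.jl definitions).
abstract type ToyAbstractPath end

struct ToyPath <: ToyAbstractPath
    str::String
end

# Same shape as the candidates listed in the MethodError:
# one method for IO, one for String, nothing for path objects.
with_sink(f::Function, io::IO, append) = f(io)
with_sink(f::Function, file::String, append) = open(f, file, append ? "a" : "w")

path = ToyPath(joinpath(mktempdir(), "test.txt"))

# Without a path method, dispatch finds no match for ToyPath.
failed = try
    with_sink(io -> print(io, "a\n1\n2\n3\n"), path, false)
    false
catch err
    err isa MethodError
end

# One possible fix: a method on the abstract supertype that forwards to the String method.
with_sink(f::Function, p::ToyAbstractPath, append) = with_sink(f, p.str, append)

with_sink(io -> print(io, "a\n1\n2\n3\n"), path, false)
written = read(path.str, String)
```

Defining the method on the abstract supertype rather than on one concrete path type is what lets a single method cover every subtype.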

Collaborator

@nickrobinson251 nickrobinson251 left a comment

👍

codecov bot commented Oct 7, 2019

Codecov Report

Merging #511 into master will decrease coverage by 0.93%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #511      +/-   ##
==========================================
- Coverage   84.52%   83.58%   -0.94%     
==========================================
  Files           7        7              
  Lines        1402     1395       -7     
==========================================
- Hits         1185     1166      -19     
- Misses        217      229      +12

Impacted Files    Coverage Δ
src/CSV.jl        85.23% <ø> (ø) ⬆️
src/write.jl      85.14% <ø> (ø) ⬆️
src/utils.jl      74.43% <100%> (-1.57%) ⬇️
src/rows.jl       84.15% <0%> (-9.42%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 164888e...a024a72.

@mattBrzezinski

Looks good to me. The only nice-to-have would be a test case with your example from the CR description:

CSV.write(mktmpdir() / "test.txt", DataFrame(:a=>[1,2,3]))

src/utils.jl Outdated
@@ -174,6 +174,8 @@ function getsource(source, use_mmap)
return source
elseif source isa Cmd
return Base.read(source)
elseif source isa AbstractPath
return Base.read(Base.open(source))
Member

Hmmm, I don't love this because it doesn't match the behavior when source is a String, and I think the two should be consistent. When source is a String and use_mmap is set, we Mmap.mmap the file; otherwise, we make our own mmapped vector and copy the file contents into it. I'd rather rearrange things so that String and AbstractPath sources are treated the same.

Copy link

Note that some FilePaths can be mmapped directly, in particular SystemPaths;
others cannot, like AWSS3.S3Paths.
But I guess that can be handled fairly easily by treating SystemPaths like Strings,
and all other AbstractPaths like Cmds etc.

rofinn/FilePathsBase.jl#51

Contributor Author

SystemPaths don't directly support mmap and have to be opened first. We're also stuck using open on non-System Paths since that guarantees us the right type coming out of read (S3Path right now returns a String).

Also, why are we checking slurp(source isa IO ? source : open(String(source))) after the elseif !isa(source, IO) in this function?

oxinabox commented Oct 8, 2019

The previous PR to add FilePaths support was #394.

It probably should have added FilePaths as a test dependency, so that we would have been better at not breaking it.

@@ -197,4 +202,9 @@ using CSV, Dates, WeakRefStrings, CategoricalArrays, Tables
end

@test read(rd, String) == "col1,col2,col3\n1,4,7\n2,5,8\n3,6,9\n"

mktmpdir() do tmp
Collaborator

@nickrobinson251 nickrobinson251 Oct 16, 2019

just as a heads up: this will be deprecated to mktempdir in a future FilePathsBase release (rofinn/FilePathsBase.jl@v0.6.2...master). Just mentioning in case we want to bug @rofinn to release that change and increase the lower-bound on FPB compat here

Member

mktmpdir() wasn't deprecated, it's just an alias for mktempdir(SystemPath) now.

Collaborator

ok

src/utils.jl Outdated
 elseif !isa(source, IO)
-    m = Mmap.mmap(source)
+    m = Mmap.mmap(source isa AbstractPath ? open(source) : source)

I think this is effectively the same as

Suggested change:
- m = Mmap.mmap(source isa AbstractPath ? open(source) : source)
+ m = open(Mmap.mmap, source)

since all Mmap.mmap(source::String) does is basically open(Mmap.mmap, source) anyway
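
A small check of that equivalence for a plain String file name, using only the Mmap standard library and a temporary file. This is a sketch under that assumption only; it does not exercise AbstractPath sources, whose `open` behavior depends on the path type.

```julia
using Mmap

# Write a small, non-empty file to map (mmap of an empty file is not useful here).
path, io = mktemp()
write(io, "col1,col2\n1,2\n")
close(io)

a = Mmap.mmap(path)         # String method: opens the file internally
b = open(Mmap.mmap, path)   # explicit open, then mmap the resulting IOStream

same = (a == b)
text = String(copy(b))      # copy first, since the mapped array is read-only
```

The mapped array stays usable after `open` closes the stream, which is why the function form of `open` is safe here.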

oxinabox commented Oct 16, 2019

My alternative, dispatch-based getsource (the first two methods could be combined via a Union):

getsource(source::Cmd, ::Any) = Base.read(open(source))
getsource(source::AbstractPath, ::Any) = Base.read(open(source))
getsource(source::Vector{UInt8}, ::Any) = source
getsource(source::IO, ::Any) = slurp(source)
getsource(source::SystemPath, use_mmap) = getsource(string(source), use_mmap)
function getsource(source, use_mmap)
    m = Mmap.mmap(source)
    use_mmap && return m
        
    m2 = Mmap.mmap(Vector{UInt8}, length(m))
    copyto!(m2, 1, m, 1, length(m))
    finalize(m)
    return m2
end

I think this is the same as, and clearer than, the chain of conditionals.
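
For reference, a runnable sketch of the same dispatch pattern using only Base and the Mmap standard library. The name `load_bytes` is hypothetical, `read(io)` stands in for the package's own `slurp`, and the Cmd and AbstractPath methods are left out, so this illustrates the shape of the proposal rather than the code that was merged.

```julia
using Mmap

# Already bytes: return as-is.
load_bytes(bytes::Vector{UInt8}, ::Any) = bytes

# Any IO: read everything (stand-in for the package's slurp).
load_bytes(io::IO, ::Any) = read(io)

# File name: either hand back the file mapping, or copy it into an
# anonymous mapping so the result no longer depends on the file.
function load_bytes(file::String, use_mmap::Bool)
    m = Mmap.mmap(file)
    use_mmap && return m

    m2 = Mmap.mmap(Vector{UInt8}, length(m))  # anonymous, writable mapping
    copyto!(m2, 1, m, 1, length(m))
    finalize(m)                               # release the file mapping early
    return m2
end

path, io = mktemp()
write(io, "a,b\n1,2\n")
close(io)

mapped  = load_bytes(path, true)
copied  = load_bytes(path, false)
from_io = load_bytes(IOBuffer("a,b\n1,2\n"), false)
```

All three calls return the same bytes; only the backing storage differs.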

Member

quinnj commented Oct 16, 2019

Also, you'll probably need to rebase off current master head/tip to get windows CI fixes for travis.

Member

quinnj commented Oct 18, 2019

Bump; is there anything else @morris25 wants to do on this? Any concerns @mattBrzezinski or @oxinabox? If this can be rebased, then I think CI should be clean and I'm happy to merge and do a new release.

@morris25
Contributor Author

That should be everything now! Sorry I didn't get back to this sooner

@mattBrzezinski mattBrzezinski left a comment

Looks good to me, ship it!

@@ -175,27 +175,25 @@ function slurp(source)
return final
end

getsource(source::Vector{UInt8}, ::Any) = source

I was just typing this solution up as you pushed your commit.

Thank you for making this 100% cleaner and easier to read!! 🚀 💯

@quinnj quinnj merged commit 745bb25 into JuliaData:master Oct 19, 2019
@morris25 morris25 deleted the sm/filepaths branch October 19, 2019 15:12