hcat fails if one column is only 'missing' values #548

Closed
quatrix opened this issue Dec 15, 2019 · 1 comment

Comments

@quatrix

quatrix commented Dec 15, 2019

I'm trying to load a CSV file and hcat it with another DataFrame that has the same number of rows and no overlapping column names.

using CSV
using DataFrames

df = DataFrame(a=1:3, b=[missing, missing, missing])
CSV.write("/tmp/a.csv", df)

df_csv = CSV.read("/tmp/a.csv")

hcat(df, DataFrame(c=1:3)) # works
hcat(df_csv, DataFrame(c=1:3)) # throws an exception

The first hcat works, the second throws this exception:

MethodError: no method matching iterate(::Nothing)
Closest candidates are:
  iterate(!Matched::Core.SimpleVector) at essentials.jl:600
  iterate(!Matched::Core.SimpleVector, !Matched::Any) at essentials.jl:600
  iterate(!Matched::ExponentialBackOff) at error.jl:218
  ...

Stacktrace:
 [1] iterate at ./iterators.jl:139 [inlined]
 [2] iterate at ./iterators.jl:138 [inlined]
 [3] foreach(::CSV.var"#43#46", ::Base.Iterators.Enumerate{Nothing}) at ./abstractarray.jl:1920
 [4] copy(::CSV.Column{Missing,Missing}) at /Users/quatrix/.julia/packages/CSV/2VBaR/src/tables.jl:107
 [5] (::DataFrames.var"#DataFrame#99#102")(::Bool, ::Type{DataFrame}, ::Array{AbstractArray{T,1} where T,1}, ::DataFrames.Index) at /Users/quatrix/.julia/packages/DataFrames/uPgZV/src/dataframe/dataframe.jl:126
 [6] Type at ./none:0 [inlined]
 [7] #select#130(::Bool, ::typeof(select), ::DataFrame, ::Base.OneTo{Int64}) at /Users/quatrix/.julia/packages/DataFrames/uPgZV/src/dataframe/dataframe.jl:875
 [8] #select#132 at ./none:0 [inlined]
 [9] #select at ./none:0 [inlined]
 [10] getindex at /Users/quatrix/.julia/packages/DataFrames/uPgZV/src/dataframe/dataframe.jl:395 [inlined]
 [11] #copy#125 at /Users/quatrix/.julia/packages/DataFrames/uPgZV/src/dataframe/dataframe.jl:698 [inlined]
 [12] #copy at ./none:0 [inlined]
 [13] #hcat#143 at /Users/quatrix/.julia/packages/DataFrames/uPgZV/src/dataframe/dataframe.jl:931 [inlined]
 [14] hcat(::DataFrame, ::DataFrame) at /Users/quatrix/.julia/packages/DataFrames/uPgZV/src/dataframe/dataframe.jl:931
 [15] top-level scope at In[6]:10

It seems to happen only when one of the columns consists entirely of 'missing' values.

A workaround I found is to add a unique id to each row of both DataFrames and then join them on that id:

another_df = DataFrame(c=1:3)

df_csv._id = 1:size(df_csv, 1)
another_df._id = 1:size(another_df, 1)

result = join(df_csv, another_df, on=:_id)

# ...and then remove the _id column

@quinnj
Member

quinnj commented Dec 16, 2019

Thanks for the report; I have a fix in 1e10c9c

quinnj closed this as completed in fe0780e on Dec 16, 2019