Unexpected behavior during vector assignment: df[0] = df[0] + 10 #496

Open
gioele opened this issue Mar 3, 2019 · 2 comments

gioele commented Mar 3, 2019

DataFrame assignment behaves in a surprising way when numerical indexes are used: instead of replacing the existing vector, the assigned vector is added to the dataframe as a new column. This contrasts with what happens when explicit names are used.

Take for example this dataframe:

df = Daru::DataFrame.new({ :a => [1,2,3,4], :b => [5,6,7,8] })
=> #<Daru::DataFrame(4x2)>
       a   b
   0   1   5
   1   2   6
   2   3   7
   3   4   8

Assigning df[0] will add a new vector to the dataframe instead of replacing the 0th column:

df[0] = df[0] + 10; df
=> #<Daru::DataFrame(4x3)>
       a   b   0
   0   1   5  11
   1   2   6  12
   2   3   7  13
   3   4   8  14

This is surprising, considering that assigning to df[:a] replaces the :a column as expected:

df[:a] = df[:a] + 10; df
=> #<Daru::DataFrame(4x2)>
       a   b
   0  11   5
   1  12   6
   2  13   7
   3  14   8

and that df[:a] and df[0] both return the same vector:

df[:a]
=> #<Daru::Vector(4)>
       a
   0  1
   1  2
   2  3
   3  4
df[0]
=> #<Daru::Vector(4)>
       a
   0  1
   1  2
   2  3
   3  4

kojix2 commented Mar 3, 2019

Hello. I am a Daru beginner too, so I may be wrong, but I will reply.

I have used Daru enthusiastically for the past two weeks, and I noticed that Daru follows two principles that differ from Pandas.

  1. Vectors (columns) always take priority over rows.
  2. You should refer to a vector or row by name (index label) rather than by number.

Daru is not a matrix calculation library. DataFrames focus on manipulating series by name.
The importance of naming is part of Ruby's culture.

You should access vectors by index name

df["name_of_vector"]

or by column number

df.at(1)

Row
You can access rows by index name.

df.row["name_of_row"]

or by row number

df.row_at(1)

Somehow df[column_number] and df.row[row_number] work too,
but they are not the recommended way. The #[](*names) method is for names, not for column numbers.

This may be the reason for the strange behavior you described.
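
To make the asymmetry concrete, here is a toy model in plain Ruby. It is not Daru's implementation: ToyFrame and its methods are hypothetical, and the sketch simply assumes that a read falls back to a positional lookup when the key is not a column name, while a write always treats the key as a name.

```ruby
# Toy model only, NOT Daru's code. Assumption: reads fall back to position,
# writes always interpret the key as a column name.
class ToyFrame
  def initialize(columns)
    @columns = columns.dup # Hash of name => Array, kept in insertion order
  end

  def names
    @columns.keys
  end

  # Read: try the key as a name first, then as a position.
  def [](key)
    return @columns[key] if @columns.key?(key)
    return @columns.values[key] if key.is_a?(Integer)

    raise IndexError, "no column #{key.inspect}"
  end

  # Write: the key is always a name, so an unknown key creates a new column.
  def []=(key, values)
    @columns[key] = values
  end

  # Positional write: resolve the position to a name, then replace that column.
  def set_at(position, values)
    @columns[names.fetch(position)] = values
  end
end

df = ToyFrame.new({ a: [1, 2, 3, 4], b: [5, 6, 7, 8] })
df[0] = df[0].map { |x| x + 10 } # 0 is not an existing name, so a column named 0 is added
df.names                         # => [:a, :b, 0]

df2 = ToyFrame.new({ a: [1, 2, 3, 4], b: [5, 6, 7, 8] })
df2.set_at(0, df2[0].map { |x| x + 10 }) # replaces the column at position 0
df2.names                        # => [:a, :b]
```

Under these assumptions, the name-based setter reproduces the extra column from the report, while the positional setter replaces :a in place.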

kojix2 commented Mar 3, 2019

df = Daru::DataFrame.new({ :a => [1,2,3,4], :b => [5,6,7,8] })

df.set_at [0], (df.at(0) + 10)
df

This is probably the correct way to write it, but it feels like I am being told "Do not access a Vector by its column number".
