concat on columns fails with multi-index when index names are not overlapping #2480

Open
eavidan opened this issue Nov 26, 2020 · 3 comments
Labels
  • bug 🦗: Something isn't working
  • External: Pull requests and issues from people who do not regularly contribute to modin
  • P2: Minor bugs or low-priority feature requests
  • pandas concordance 🐼: Functionality that does not match pandas

Comments

@eavidan
Collaborator

eavidan commented Nov 26, 2020

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.15.7
  • Modin version (modin.__version__): 0.8.2
  • Python version: 3.7
  • Code we can use to reproduce: yes

Describe the problem

Concatenating two DataFrames along columns (axis=1) fails when both have a MultiIndex and their index names do not overlap. Modin raises ValueError: cannot join with no overlapping index names. The same operation works in pandas.

Source code / logs

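The following works with pandas but fails with modin:

import pandas as pd  # swap for `import modin.pandas as pd` to reproduce the failure

# df1: one row, 4-level MultiIndex with unnamed levels
df1 = pd.DataFrame({('s1', 426, 6, -3): 210}, index=[["a"], ["b"], ["c"], ["d"]])
# df2: one row, 4-level MultiIndex whose levels are named x, y, z, w
df2 = pd.DataFrame({('s2', 194, -3, -5): None}, index=[["x"], ["b"], ["c"], ["d"]])
df2.index.names = ['x', 'y', 'z', 'w']
merged = pd.concat([df1, df2], axis=1)  # ValueError under modin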
@eavidan eavidan added bug 🦗 Something isn't working pandas concordance 🐼 Functionality that does not match pandas labels Nov 26, 2020
@YarShev
Collaborator

YarShev commented Nov 27, 2020

Hi @eavidan, thanks for posting! @dchigarev, can you look at this? The traceback seems similar to the one you are trying to resolve in #2443.

@dchigarev
Collaborator

dchigarev commented Nov 27, 2020

@YarShev
This issue is different from #2443 and #2378, although it has the same root cause. The problem is in our modin_frame._concat function and the way we're using it.

Modin's frame-level 'concat' is not a simple 'concat' like the one in pandas: it combines the functionality of inserting, concatenating, joining, merging, and broadcasting for binary operations. The problem is that it tries to apply a single "joining" logic that fits all of those cases, which does not seem to be a good approach, judging by all the issues related to _concat and _copartition that were recently found.

Merge, join, concat, insert, and binary_ops all have different logic for "joining" axes in pandas, whereas modin tries to use one common approach.

#2443 and #2378 try to solve that issue for insert and binary_ops. It seems that concat will be next in line.

P.S.
How our modin_frame._concat differs from pandas.concat:
Depending on the join type, pandas does either Index.intersection (inner) or Index.union (outer). The call stack is:
concat -> _Concatenator._get_new_axes -> _Concatenator._get_comb_axis -> Indexes.api.get_objs_combined_axis -> Indexes.api._get_combined_index

Depending on the join type, modin does Index.join(how="inner") or Index.join(how="outer"), which, as we can see, works differently from pandas in the reported case.
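
To make the difference concrete, here is a minimal sketch that uses plain pandas only (no modin) and builds two row indexes shaped like the ones in the report. The variable names (left, right, outer_axis, inner_axis, join_error) are made up for illustration, and the snippet does not reproduce modin's internal _concat code path. It only contrasts the set operations with Index.join on the same inputs.

```python
import pandas as pd

# Row indexes shaped like the ones in the report: four levels each,
# unnamed on the left and named x, y, z, w on the right.
left = pd.MultiIndex.from_arrays([["a"], ["b"], ["c"], ["d"]])
right = pd.MultiIndex.from_arrays(
    [["x"], ["b"], ["c"], ["d"]], names=["x", "y", "z", "w"]
)

# Set operations compare index values and do not need shared level names.
outer_axis = left.union(right)
inner_axis = left.intersection(right)

# Index.join aligns two MultiIndexes on their shared level names,
# so it is expected to raise when there are none.
join_error = None
try:
    left.join(right, how="outer")
except ValueError as exc:
    join_error = str(exc)

print(len(outer_axis), len(inner_axis))
print(join_error)
```

If this behaves as described, the union and intersection succeed while the join raises, which would match the ValueError seen in the report.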

@pyrito
Collaborator

pyrito commented Aug 22, 2022

I am able to reproduce this bug on the latest master.

@pyrito pyrito added the P2 Minor bugs or low-priority feature requests label Aug 22, 2022
@anmyachev anmyachev added the External Pull requests and issues from people who do not regularly contribute to modin label Apr 19, 2023