Join errors when join keys have the same name #2649

Closed
kevinzwang opened this issue Aug 13, 2024 · 1 comment
Labels
bug Something isn't working good first issue Good for newcomers

Comments

Member

kevinzwang commented Aug 13, 2024

Describe the bug
When multiple join keys on the same side have the same name, the join fails with:

DaftCoreException: DaftError::External Unable to create logical plan node.
Due to: DaftError::ValueError Attempting to make a Schema with duplicate field names: duplicated_name

To Reproduce
Steps to reproduce the behavior:

import daft
from daft import col

df1 = daft.from_pydict({
    "a": [1, 2],
    "b": [2, 2],
})

df2 = daft.from_pydict({
    "a": [1, 2],
})

# Does not work: both right-side join keys are named "a"
df1.join(df2, left_on=["a", "b"], right_on=["a", "a"])

# Works: aliasing the second right-side key gives it a unique name
df1.join(df2, left_on=["a", "b"], right_on=["a", col("a").alias("not_a")])

Expected behavior
Both joins above should produce the joined dataframe without raising an error.

@kevinzwang kevinzwang added bug Something isn't working good first issue Good for newcomers labels Aug 13, 2024
Member Author

kevinzwang commented Aug 13, 2024

May be a good first issue for someone who wants to learn more about how we do joins.

anmolsingh20 pushed a commit to anmolsingh20/Daft that referenced this issue Sep 21, 2024
…#2649)

The issue fixed here had a workaround previously - aliasing the
duplicate column name. This is not needed anymore as the aliasing
is performed under the hood, taking care of uniqueness of individual
column keys to avoid the duplicate issue.
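
To illustrate the idea in the commit message above, here is a minimal sketch of how duplicate join key names could be made unique before a schema is built. It is plain Python and is not Daft's actual implementation: the function name `unique_key_aliases` and the `__join_key` prefix are hypothetical, chosen only for this example.

```python
from collections import Counter


def unique_key_aliases(key_names, prefix="__join_key"):
    """Return one alias per join key so that no two keys share a name.

    Hypothetical helper for illustration only; the naming scheme is an
    assumption and does not mirror Daft internals.
    """
    counts = Counter(key_names)   # how often each name appears overall
    seen = Counter()              # how many times each name was aliased so far
    aliases = []
    for name in key_names:
        if counts[name] == 1:
            # Unique names are left untouched.
            aliases.append(name)
        else:
            # Duplicates get an index plus the original name, so the
            # alias stays traceable to the column it came from.
            aliases.append(f"{prefix}_{seen[name]}_{name}")
            seen[name] += 1
    return aliases


# The failing call in this issue used right_on=["a", "a"].
right_aliases = unique_key_aliases(["a", "a"])
```

With aliases like these, a schema built from the right-side keys no longer contains a duplicated field name, which is the condition that triggered the `ValueError` reported here. A real implementation would also have to guard against an alias colliding with an existing column name.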
anmolsingh20 pushed a commit to anmolsingh20/Daft that referenced this issue Sep 26, 2024
…#2649)

The issue fixed here had a workaround previously - aliasing the
duplicate column name. This is not needed anymore as the aliasing
is performed under the hood, taking care of uniqueness of individual
column keys to avoid the duplicate issue.
anmolsingh20 pushed a commit to anmolsingh20/Daft that referenced this issue Sep 27, 2024
…#2649)

The issue fixed here had a workaround previously - aliasing the
duplicate column name. This is not needed anymore as the aliasing
is performed under the hood, taking care of uniqueness of individual
column keys to avoid the duplicate issue.
anmolsingh20 pushed a commit to anmolsingh20/Daft that referenced this issue Oct 2, 2024
…#2649)

Rename join keys only for column expressions; include original
expression name in the renamed expression.
sagiahrac pushed a commit to sagiahrac/Daft that referenced this issue Oct 7, 2024
…#2649) (Eventual-Inc#2877)

The issue fixed here had a workaround previously - aliasing the
duplicate column name. This is not needed anymore as the aliasing is
performed under the hood, taking care of uniqueness of individual column
keys to avoid the duplicate issue.

---------

Co-authored-by: AnmolS <[email protected]>