deduplicate bug in Spark in case of a null column. #814
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Describe the bug
Same bug as #713, but for Spark.
When a column contains nulls (a column unrelated to the partition-by or order-by columns), the default deduplicate macro does not return the expected rows on Spark.
Steps to reproduce
The deduplication SQL was copied into Python code in order to reproduce the bug.
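The failure can be reproduced without Spark. A minimal sketch, assuming the macro follows the pattern described in #713: number the rows in a CTE, then re-join it to the source with a `NATURAL JOIN`. SQLite (Python stdlib) stands in for Spark here; the null-comparison semantics that break the join are the same, since `NULL = NULL` evaluates to unknown in both engines. The table and column names are illustrative, not taken from the macro source.

```python
import sqlite3

# One row whose col3 is NULL -- the column is unrelated to the
# partition-by / order-by columns, matching the bug report.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table source (col1 int, col2 int, col3 int)")
cur.execute("insert into source values (1, 1, null)")

# Assumed macro pattern: row_number in a CTE, then NATURAL JOIN back
# to the source. The join compares every shared column, including the
# nullable col3, and NULL = NULL is unknown, so the row never matches.
rows = cur.execute("""
    with row_numbered as (
        select _inner.*,
               row_number() over (
                   partition by col1
                   order by col1
               ) as rn
        from source as _inner
    )
    select data.*
    from source as data
    natural join row_numbered
    where row_numbered.rn = 1
""").fetchall()

print(rows)  # [] -- the only row is silently dropped
```

Under this assumption the query returns zero rows, matching the empty "Actual results" table below even though exactly one deduplicated row exists.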
Expected results
+----+----+----+
|col1|col2|col3|
+----+----+----+
|   1|   1|null|
+----+----+----+
Actual results
+----+----+----+
|col1|col2|col3|
+----+----+----+
+----+----+----+
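One way to sidestep the null comparison entirely (a sketch of a possible fix, not necessarily how the macro was actually patched) is to filter on the row number inside a subquery instead of re-joining on the data columns; on Spark specifically, the null-safe equality operator `<=>` in an explicit join condition would also work. Again using SQLite for a self-contained demonstration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table source (col1 int, col2 int, col3 int)")
cur.execute("insert into source values (1, 1, null)")

# Filtering on rn directly never compares the nullable col3 to itself,
# so the row with the NULL survives deduplication.
rows = cur.execute("""
    select col1, col2, col3
    from (
        select _inner.*,
               row_number() over (
                   partition by col1
                   order by col1
               ) as rn
        from source as _inner
    )
    where rn = 1
""").fetchall()

print(rows)  # [(1, 1, None)] -- matches the expected results above
```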
Screenshots and log output
System information
The contents of your packages.yml file:
Which database are you using dbt with?
The output of dbt --version:
Additional context
Are you interested in contributing the fix?