Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CubeStore: wrong query result when filtering two left-joined tables #3777

Closed
antoninkrotky opened this issue Dec 6, 2021 · 4 comments
Closed
Assignees
Labels
bug Something isn't working cube store Issues relating to Cube Store

Comments

@antoninkrotky
Copy link

Describe the bug
CubeStore returns wrong result set when filtering on two left-joined tables.

To Reproduce

create table dev_pre_aggregations.employee (name varchar);
insert into dev_pre_aggregations.employee (name) values ('John');
insert into dev_pre_aggregations.employee (name) values ('Jim');

create table dev_pre_aggregations.employee_department_bridge (employee_name varchar, department_name varchar);
insert into dev_pre_aggregations.employee_department_bridge (employee_name,department_name) values ('John','Marketing');
insert into dev_pre_aggregations.employee_department_bridge (employee_name,department_name) values ('Jim','Marketing');

select * 
from dev_pre_aggregations.employee AS e 
LEFT JOIN dev_pre_aggregations.employee_department_bridge b on (b.employee_name=e.name)
where b.department_name='Non existing';

--Empty result set expected, but we get:
+------+---------------+-----------------+
| name | employee_name | department_name |
+------+---------------+-----------------+
| Jim  | NULL          | NULL            |
| John | NULL          | NULL            |
+------+---------------+-----------------+
2 rows in set (0.00 sec)

The example above is enough to reproduce this bug. To give a full picture - we found this bug when joining 3 tables but then were able to simplify the scenario. Add this code to see our original scenario:

create table dev_pre_aggregations.department (name varchar);
insert into dev_pre_aggregations.department (name) values ('Marketing');
insert into dev_pre_aggregations.department (name) values ('Finance');
insert into dev_pre_aggregations.department (name) values ('IT');

select * from
    dev_pre_aggregations.employee AS e
        LEFT JOIN dev_pre_aggregations.employee_department_bridge b on (b.employee_name=e.name)
        LEFT JOIN dev_pre_aggregations.department d on (b.department_name=d.name) where d.name = 'Non existing';

--Empty result set expected, but we get:
+------+---------------+-----------------+------+
| name | employee_name | department_name | name |
+------+---------------+-----------------+------+
| Jim  | NULL          | NULL            | NULL |
| John | NULL          | NULL            | NULL |
+------+---------------+-----------------+------+

Expected behavior
0 rows returned

Screenshots
None

Version:
v0.28.64

Additional context
Discussed in Slack https://cube-js.slack.com/archives/CC0403RRR/p1638799039492100
Confirmed there by @igorlukanin

@igorlukanin igorlukanin added the cube store Issues relating to Cube Store label Dec 7, 2021
@kaja78
Copy link

kaja78 commented Dec 8, 2021

See this comment in arrow-datafusion: apache/datafusion#1339 (comment)

@paveltiunov
Copy link
Member

Hey @kaja78 ! Thanks for posting this comment! This is helpful. Yeah. It can be related to this optimization.

@mjstephenson72
Copy link

It appears that the underlying Arrow issue has been resolved ... are there plans to update the packaged arrow codebase in cube to include this fix? apache/datafusion#1618

@javiramos1
Copy link

@paveltiunov has this been incorporated into the cube js fork?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cube store Issues relating to Cube Store
Projects
None yet
Development

No branches or pull requests

6 participants