Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pushing down 'not equal' to predicate #1365

Merged
merged 2 commits into from
Jul 21, 2023
Merged

Pushing down 'not equal' to predicate #1365

merged 2 commits into from
Jul 21, 2023

Conversation

akuzin1
Copy link
Contributor

@akuzin1 akuzin1 commented Jul 7, 2023

Description of changes:
Modified statement that is responsible for adding IS NOT NULL clause to generate sql for JDBC connectors.
Prior to change, when evaluating a not equal to predicate it would short circuit and not push down all predicates to sql statement.
For Example:
SELECT * FROM item WHERE i_manufact_id <> 138
Would be translated to:
SELECT * FROM item WHERE (i_manufact_id IS NOT NULL)

This is the result of calling getSpan() on SortedRangeSet, which returns a Range that is a summary of the lowest lower bound and the highest upper bound that represents this ValueSet.

Therefore, the following ValueSet from statement with i_manufact_id <> 138 predicate:

        i_manufact_id=SortedRangeSet{
            type=Int(64, true), nullAllowed=false, lowIndexedRanges={
                Marker{
                   valueBlock=Int(64, true), nullValue=true, valueBlock=true, bound=ABOVE
                   }=Range{
                        low=Marker{
                            valueBlock=Int(64, true), nullValue=true, valueBlock=true, bound=ABOVE
                           }, 
                        high=Marker{
                            valueBlock=Int(64, true), nullValue=false, valueBlock=138, bound=BELOW
                        }
                   }, 
                Marker{
                   valueBlock=Int(64, true), nullValue=false, valueBlock=138, bound=ABOVE
                   }=Range{
                        low=Marker{
                            valueBlock=Int(64, true), nullValue=false, valueBlock=138, bound=ABOVE
                        }, 
                        high=Marker{
                            valueBlock=Int(64, true), nullValue=true, valueBlock=true, bound=BELOW
                        }
                   }
               }
           }
       }

and the following ValueSet from statement with i_manufact_id IS NOT NULL predicate:

i_manufact_id=SortedRangeSet{
            type=Int(64, true), nullAllowed=false, lowIndexedRanges={
                Marker{
                    valueBlock=Int(64, true), nullValue=true, valueBlock=true, bound=ABOVE
                }
                =Range{
                    low=Marker{
                        valueBlock=Int(64, true), nullValue=true, valueBlock=true, bound=ABOVE
                    }, 
                    high=Marker{
                        valueBlock=Int(64, true), nullValue=true, valueBlock=true, bound=BELOW
                    }
                }
           }
        }

evaluate to the same Range when called getSpan on and they exit at the same condition.

According to ANSI SQL standards, when assessing a predicate comparison involving a NULL value, it does not yield true or false outcomes and is consequently excluded from the result set. This means that if there is more than one comparison operator performed on a single column, it will automatically drop NULL values, so IS NOT NULL only needs to be pushed down if it that's the only one.

So for example, in the following statement:
SELECT * FROM item WHERE i_manufact_id <> 138 AND i_manufact_id IS NOT NULL
the latter part will be inherently be achieved by pushing down first part.

The change results in 'IS NOT NULL' predicate only being pushed down if there's one Range present in ValueSet.

Generated SQL After Change:
SELECT * FROM item WHERE ((i_manufact_id < 138) OR (i_manufact_id > 138))

Testing:
Because JdbcSplitQueryBuilder is an abstract class and can't be initialized, I added unit tests to the MySqlQueryStringBuilder to verify proper behavior for generated SQL string, when predicates != and IS NOT NULL are being used.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@burhan94
Copy link
Contributor

can we add some test cases for this pls?

Comment on lines 80 to 81
Map<String, ValueSet> constraintsMap = new HashMap<>();
constraintsMap.put("testCol2", SortedRangeSet.of(false, Range.all(allocator, org.apache.arrow.vector.types.Types.MinorType.INT.getType())));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, you can init and add the map values in one line using ImmutableMap.of(key, value, key2, value2, ...).

It's generally cleaner to avoid initing a collection and later adding or modifying it (called the init-then-modify antipattern). This here isn't maybe only a violation of it from a purists perspective, but you can imagine how it can lead to cleaner code. so this comment applies to both tests in this file

@akuzin1 akuzin1 merged commit c31a644 into master Jul 21, 2023
@akuzin1 akuzin1 deleted the JdbcNotEqualPredicate branch July 21, 2023 19:58
github-actions bot pushed a commit that referenced this pull request Jul 21, 2023
  - limit clause pushdown fix for unsupported data sources (#1373)
  - Pushing down 'not equal' to predicate (#1365)
  - Added ! as valid character in Secret, due to syntax of service linked secrets (#1383)
  - build(deps): bump aws-sdk.version from 1.12.504 to 1.12.507 (#1380)
  - build(deps): bump msal4j from 1.13.8 to 1.13.9
  - build(deps): bump aws-sdk.version from 1.12.504 to 1.12.506
  - build(deps): bump protobuf-java from 3.23.3 to 3.23.4
  - build(deps): bump google-cloud-resourcemanager from 1.22.0 to 1.23.0
  - build(deps): bump google-cloud-storage from 2.23.0 to 2.24.0
  - build(deps): bump jt400 from 11.2 to 20.0.0
  - build(deps): bump avro from 1.11.1 to 1.11.2
  - build(deps-dev): bump DynamoDBLocal from 1.22.0 to 2.0.0
  - build(deps): bump google-cloud-storage from 2.22.4 to 2.23.0
  - build(deps): bump hadoop-common from 3.3.5 to 3.3.6
  - build(deps): bump grpc-google-cloud-bigquerystorage-v1
  - build(deps): bump google-cloud-resourcemanager from 1.21.0 to 1.22.0
  - build(deps): bump aws-cdk.version from 1.203.0 to 1.204.0
  - build(deps): bump snowflake-jdbc from 3.13.32 to 3.13.33
  - build(deps): bump google-cloud-bigquery from 2.27.1 to 2.29.0
  - build(deps): bump aws-msk-iam-auth from 1.1.6 to 1.1.7
  - build(deps): bump guava from 32.0.1-jre to 32.1.1-jre
  - build(deps): bump jqwik from 1.7.3 to 1.7.4
  - build(deps): bump aws-sdk.version from 1.12.490 to 1.12.504
  - build(deps-dev): bump equalsverifier from 3.14.2 to 3.15
  - build(deps): bump redshift-jdbc42 from 2.1.0.16 to 2.1.0.17
  - build(deps): bump ngdbc from 2.17.7 to 2.17.10
  - build(deps): bump license-maven-plugin from 2.1.0 to 2.2.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants