Pushing down 'not equal' to predicate #1365

akuzin1 · 2023-07-07T14:54:52Z

Description of changes:
Modified statement that is responsible for adding IS NOT NULL clause to generate sql for JDBC connectors.
Prior to change, when evaluating a not equal to predicate it would short circuit and not push down all predicates to sql statement.
For Example:
SELECT * FROM item WHERE i_manufact_id <> 138
Would be translated to:
SELECT * FROM item WHERE (i_manufact_id IS NOT NULL)

This is the result of calling getSpan() on SortedRangeSet, which returns a Range that is a summary of the lowest lower bound and the highest upper bound that represents this ValueSet.

Therefore, the following ValueSet from statement with i_manufact_id <> 138 predicate:

        i_manufact_id=SortedRangeSet{
            type=Int(64, true), nullAllowed=false, lowIndexedRanges={
                Marker{
                   valueBlock=Int(64, true), nullValue=true, valueBlock=true, bound=ABOVE
                   }=Range{
                        low=Marker{
                            valueBlock=Int(64, true), nullValue=true, valueBlock=true, bound=ABOVE
                           }, 
                        high=Marker{
                            valueBlock=Int(64, true), nullValue=false, valueBlock=138, bound=BELOW
                        }
                   }, 
                Marker{
                   valueBlock=Int(64, true), nullValue=false, valueBlock=138, bound=ABOVE
                   }=Range{
                        low=Marker{
                            valueBlock=Int(64, true), nullValue=false, valueBlock=138, bound=ABOVE
                        }, 
                        high=Marker{
                            valueBlock=Int(64, true), nullValue=true, valueBlock=true, bound=BELOW
                        }
                   }
               }
           }
       }

and the following ValueSet from statement with i_manufact_id IS NOT NULL predicate:

i_manufact_id=SortedRangeSet{
            type=Int(64, true), nullAllowed=false, lowIndexedRanges={
                Marker{
                    valueBlock=Int(64, true), nullValue=true, valueBlock=true, bound=ABOVE
                }
                =Range{
                    low=Marker{
                        valueBlock=Int(64, true), nullValue=true, valueBlock=true, bound=ABOVE
                    }, 
                    high=Marker{
                        valueBlock=Int(64, true), nullValue=true, valueBlock=true, bound=BELOW
                    }
                }
           }
        }

evaluate to the same Range when called getSpan on and they exit at the same condition.

According to ANSI SQL standards, when assessing a predicate comparison involving a NULL value, it does not yield true or false outcomes and is consequently excluded from the result set. This means that if there is more than one comparison operator performed on a single column, it will automatically drop NULL values, so IS NOT NULL only needs to be pushed down if it that's the only one.

So for example, in the following statement:
SELECT * FROM item WHERE i_manufact_id <> 138 AND i_manufact_id IS NOT NULL
the latter part will be inherently be achieved by pushing down first part.

The change results in 'IS NOT NULL' predicate only being pushed down if there's one Range present in ValueSet.

Generated SQL After Change:
SELECT * FROM item WHERE ((i_manufact_id < 138) OR (i_manufact_id > 138))

Testing:
Because JdbcSplitQueryBuilder is an abstract class and can't be initialized, I added unit tests to the MySqlQueryStringBuilder to verify proper behavior for generated SQL string, when predicates != and IS NOT NULL are being used.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

burhan94 · 2023-07-10T17:57:49Z

can we add some test cases for this pls?

hackett123 · 2023-07-14T17:40:21Z

...a-mysql/src/test/java/com/amazonaws/athena/connectors/mysql/MySqlQueryStringBuilderTest.java

+        Map<String, ValueSet> constraintsMap = new HashMap<>();
+        constraintsMap.put("testCol2", SortedRangeSet.of(false, Range.all(allocator, org.apache.arrow.vector.types.Types.MinorType.INT.getType())));


nit, you can init and add the map values in one line using ImmutableMap.of(key, value, key2, value2, ...).

It's generally cleaner to avoid initing a collection and later adding or modifying it (called the init-then-modify antipattern). This here isn't maybe only a violation of it from a purists perspective, but you can imagine how it can lead to cleaner code. so this comment applies to both tests in this file

- limit clause pushdown fix for unsupported data sources (#1373) - Pushing down 'not equal' to predicate (#1365) - Added ! as valid character in Secret, due to syntax of service linked secrets (#1383) - build(deps): bump aws-sdk.version from 1.12.504 to 1.12.507 (#1380) - build(deps): bump msal4j from 1.13.8 to 1.13.9 - build(deps): bump aws-sdk.version from 1.12.504 to 1.12.506 - build(deps): bump protobuf-java from 3.23.3 to 3.23.4 - build(deps): bump google-cloud-resourcemanager from 1.22.0 to 1.23.0 - build(deps): bump google-cloud-storage from 2.23.0 to 2.24.0 - build(deps): bump jt400 from 11.2 to 20.0.0 - build(deps): bump avro from 1.11.1 to 1.11.2 - build(deps-dev): bump DynamoDBLocal from 1.22.0 to 2.0.0 - build(deps): bump google-cloud-storage from 2.22.4 to 2.23.0 - build(deps): bump hadoop-common from 3.3.5 to 3.3.6 - build(deps): bump grpc-google-cloud-bigquerystorage-v1 - build(deps): bump google-cloud-resourcemanager from 1.21.0 to 1.22.0 - build(deps): bump aws-cdk.version from 1.203.0 to 1.204.0 - build(deps): bump snowflake-jdbc from 3.13.32 to 3.13.33 - build(deps): bump google-cloud-bigquery from 2.27.1 to 2.29.0 - build(deps): bump aws-msk-iam-auth from 1.1.6 to 1.1.7 - build(deps): bump guava from 32.0.1-jre to 32.1.1-jre - build(deps): bump jqwik from 1.7.3 to 1.7.4 - build(deps): bump aws-sdk.version from 1.12.490 to 1.12.504 - build(deps-dev): bump equalsverifier from 3.14.2 to 3.15 - build(deps): bump redshift-jdbc42 from 2.1.0.16 to 2.1.0.17 - build(deps): bump ngdbc from 2.17.7 to 2.17.10 - build(deps): bump license-maven-plugin from 2.1.0 to 2.2.0

akuzin1 force-pushed the JdbcNotEqualPredicate branch from 7534ddd to 8ab67f1 Compare July 7, 2023 15:04

chngpe approved these changes Jul 10, 2023

View reviewed changes

akuzin1 force-pushed the JdbcNotEqualPredicate branch from 8ab67f1 to 4dd1a58 Compare July 13, 2023 19:37

hackett123 reviewed Jul 14, 2023

View reviewed changes

Pushing down 'not equal' to predicate

d6a6211

akuzin1 force-pushed the JdbcNotEqualPredicate branch from 4dd1a58 to d6a6211 Compare July 17, 2023 17:51

Merge branch 'master' into JdbcNotEqualPredicate

bfb2652

akuzin1 merged commit c31a644 into master Jul 21, 2023

akuzin1 deleted the JdbcNotEqualPredicate branch July 21, 2023 19:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Pushing down 'not equal' to predicate #1365

Pushing down 'not equal' to predicate #1365

akuzin1 commented Jul 7, 2023 •

edited

Loading

burhan94 commented Jul 10, 2023

hackett123 Jul 14, 2023

		Map<String, ValueSet> constraintsMap = new HashMap<>();
		constraintsMap.put("testCol2", SortedRangeSet.of(false, Range.all(allocator, org.apache.arrow.vector.types.Types.MinorType.INT.getType())));

Pushing down 'not equal' to predicate #1365

Pushing down 'not equal' to predicate #1365

Conversation

akuzin1 commented Jul 7, 2023 • edited Loading

burhan94 commented Jul 10, 2023

hackett123 Jul 14, 2023

Choose a reason for hiding this comment

akuzin1 commented Jul 7, 2023 •

edited

Loading