forked from prestodb/presto
Upgrade to 0.161 #63
Merged: dabaitu merged 218 commits into twitter-forks:twitter-master from dabaitu:twitter-master on Jan 6, 2017
Conversation
flushCache only makes sense in CachingHiveMetastore.
instead of inheritance
The original purpose of this function was to provide an exception-free alternative to array and map subscript operators. This change makes the array version consistent with the function that operates on maps.
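For illustration (the function is not named in the message; this presumably refers to element_at): after this change the array variant, like the map variant, returns NULL instead of raising an error when the element is absent, e.g.

    presto:default> SELECT element_at(ARRAY[1, 2, 3], 10);              -- NULL rather than an out-of-bounds error
    presto:default> SELECT element_at(MAP(ARRAY['a'], ARRAY[1]), 'b');  -- NULL for a missing key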
Converting the type to uppercase breaks type equality for row types, because the field names get uppercased as well.
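A rough sketch of the problem (assuming type signatures are compared textually, field names included):

    row("someField" bigint)      -- declared row type
    ROW("SOMEFIELD" BIGINT)      -- after uppercasing; the field name no longer matches, so the types compare unequal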
This allows Presto to start regardless of host resolution at startup. Previously, Presto would fail to start when any entry in cassandra.contact-points was a hostname (instead of an IP address) that could not be resolved. This change postpones host resolution until the first query.
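For example (the hostnames here are hypothetical), with a Cassandra catalog configured as

    connector.name=cassandra
    cassandra.contact-points=cassandra-seed1.example.com,10.0.0.12

the server previously refused to start if cassandra-seed1.example.com was not resolvable at startup; with this change the name is only resolved when the first query against the catalog runs.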
Rename symbols to the actual column types instead of using the alphabet. Alphabetic names are error prone, and they make it hard to merge patches that add new columns.
Symbol unaliasing for ExchangeNode canonicalizes symbols that are aliased in source nodes.

Plan after optimization:

presto:default> explain SELECT c.custkey FROM customer c, orders o WHERE c.custkey = o.custkey AND o.orderdate >= DATE '1994-01-01';

- Output[custkey] => [custkey:bigint]
    - RemoteExchange[GATHER] => custkey:bigint
        - Project => [custkey:bigint]
            - InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, $hashvalue:bigint, custkey_0:bigint, $hashvalue_15:bigint]
                - RemoteExchange[REPARTITION] => custkey:bigint, $hashvalue:bigint
                    - Project => [custkey:bigint, $hashvalue_14:bigint]
                            $hashvalue_14 := "combine_hash"(BIGINT '0', COALESCE("$operator$hash_code"("custkey"), 0))
                        - TableScan[hive:hive:default:customer, originalConstraint = true] => [custkey:bigint]
                                LAYOUT: hive
                                custkey := HiveColumnHandle{clientId=hive, name=custkey, hiveType=bigint, hiveColumnIndex=0, columnType=REGULAR}
                - RemoteExchange[REPARTITION] => custkey_0:bigint, $hashvalue_15:bigint
                    - Project => [$hashvalue_16:bigint, custkey_0:bigint]
                            $hashvalue_16 := "combine_hash"(BIGINT '0', COALESCE("$operator$hash_code"("custkey_0"), 0))
                        - Filter[("orderdate" >= "$literal$date"(BIGINT '8766'))] => [custkey_0:bigint, orderdate:date]
                            - TableScan[hive:hive:default:orders, originalConstraint = ("orderdate" >= "$literal$date"(BIGINT '8766'))] => [custkey_0:bigint, orderdate:date]
                                    LAYOUT: hive
                                    custkey_0 := HiveColumnHandle{clientId=hive, name=custkey, hiveType=bigint, hiveColumnIndex=1, columnType=REGULAR}
                                    orderdate := HiveColumnHandle{clientId=hive, name=orderdate, hiveType=date, hiveColumnIndex=4, columnType=REGULAR}

Plan before optimization:

presto:default> explain SELECT c.custkey FROM customer c, orders o WHERE c.custkey = o.custkey AND o.orderdate >= DATE '1994-01-01';

- Output[custkey] => [custkey:bigint]
    - RemoteExchange[GATHER] => custkey:bigint
        - Project => [custkey:bigint]
            - InnerJoin[("custkey_8" = "custkey_9")] => [custkey:bigint, custkey_8:bigint, $hashvalue:bigint, custkey_0:bigint, custkey_9:bigint, $hashvalue_16:bigint]
                - Project => [custkey:bigint, custkey_8:bigint, $hashvalue:bigint]
                    - RemoteExchange[REPARTITION] => custkey:bigint, custkey_8:bigint, $hashvalue:bigint, $hashvalue_14:bigint
                        - Project => [custkey:bigint, $hashvalue_15:bigint]
                                $hashvalue_15 := "combine_hash"(BIGINT '0', COALESCE("$operator$hash_code"("custkey"), 0))
                            - TableScan[hive:hive:default:customer, originalConstraint = true] => [custkey:bigint]
                                    LAYOUT: hive
                                    custkey := HiveColumnHandle{clientId=hive, name=custkey, hiveType=bigint, hiveColumnIndex=0, columnType=REGULAR}
                - Project => [custkey_0:bigint, custkey_9:bigint, $hashvalue_16:bigint]
                    - RemoteExchange[REPARTITION] => custkey_0:bigint, custkey_9:bigint, $hashvalue_16:bigint, $hashvalue_17:bigint
                        - Project => [$hashvalue_18:bigint, custkey_0:bigint]
                                $hashvalue_18 := "combine_hash"(BIGINT '0', COALESCE("$operator$hash_code"("custkey_0"), 0))
                            - Filter[("orderdate" >= "$literal$date"(BIGINT '8766'))] => [custkey_0:bigint, orderdate:date]
                                - TableScan[hive:hive:default:orders, originalConstraint = ("orderdate" >= "$literal$date"(BIGINT '8766'))] => [custkey_0:bigint, orderdate:date]
                                        LAYOUT: hive
                                        custkey_0 := HiveColumnHandle{clientId=hive, name=custkey, hiveType=bigint, hiveColumnIndex=1, columnType=REGULAR}
                                        orderdate := HiveColumnHandle{clientId=hive, name=orderdate, hiveType=date, hiveColumnIndex=4, columnType=REGULAR}
This reduces the number of unique symbols in the query plan and allows other optimizations to be applied (e.g., running multiple joins that operate on the same partitions after canonicalization in the same stage).
The PredicatePushdown optimizer created unnecessary symbols for join clauses.
This reverts commit af03259.
This allows trivial queries to run even when the node is "out of memory"
Previously they were uploaded from the 'PRODUCT_TESTS' job.
This allows us to keep the logs for restarted jobs.
…avior

We recently made a change to the column resolution rules for ORDER BY to make them compliant with ANSI SQL. To ease the transition from the old semantics, we now add a config option and session property that control the behavior. The session property is "legacy_order_by"; the config option is "deprecated.legacy-order-by".
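For example, the old behavior can be re-enabled per session, or globally via the coordinator configuration (property names as listed above):

    presto:default> SET SESSION legacy_order_by = true;

    -- or, in config.properties:
    deprecated.legacy-order-by=true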
The arguments were in the wrong order, so the value from FeaturesConfig was not being used to control the default value.
A recent commit (ec2e897) changed the way ORDER BY expressions are handled in a way that causes certain expressions not to be "analyzed" and their types not to be recorded in the Analysis object. extractAggregates() looks for aggregations in node.getOrderBy() and records them for later use by the planner. The new ORDER BY analyzer processes the rewritten expressions, which have a different object identity. As a result, the aggregates don't have associated type and implicit coercion information for the planner to use. This change makes it so that the aggregates are extracted from the rewritten expressions.
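A sketch of the affected query shape (table and columns follow the TPC-H examples used above): an aggregate that appears only in the ORDER BY clause, such as

    presto:default> SELECT custkey FROM orders GROUP BY custkey ORDER BY sum(totalprice);

Here sum(totalprice) must be extracted as an aggregate; before this fix it was looked up by the identity of the pre-rewrite expression, so its type and coercion information was missing at planning time.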
Due to ec2e897, when analysis fails for certain expressions, the error is misreported as happening in the SELECT clause instead of in the ORDER BY clause. This is because the analyzer processes the rewritten expressions, which contain inlined SELECT expressions and their original locations. This change fixes the issue by analyzing the original unmodified expressions with a synthetic scope built from the output of the SELECT clause, which can delegate resolution to the source scope for missing names (essentially, it implements the resolution rules per the SQL spec). One side effect of this change is that queries whose ORDER BY clause references columns that appear multiple times in the SELECT clause are now considered invalid due to ambiguous references, which matches the expected behavior according to the ANSI spec.
This query shape is no longer valid due to ambiguous column references.
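For instance (an illustrative query against the same TPC-H schema), the following is now rejected because custkey in the ORDER BY could refer to either output column:

    presto:default> SELECT custkey, custkey FROM orders ORDER BY custkey;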