forked from prestodb/presto
upgrade to 0.157 #57
Merged
dabaitu
merged 1,219 commits into
twitter-forks:twitter-master
from
dabaitu:twitter-master
Dec 2, 2016
Conversation
When one side of a join has an effective predicate expression in terms of the field used in the join criteria (e.g., v = f(k1), with a join criterion of k1 = k2), and that expression can produce null on non-null input (e.g., nullif, case, if, most of the array/map functions, etc.), queries can produce incorrect results. In that scenario, predicate pushdown derives another join condition, v = f(k2). Since f() can produce null on non-null input, it's possible that for some value of k1 equal to k2, f(k1) or f(k2) is null. This causes the join criteria to evaluate to null instead of true. A correct derivation, although less useful for predicate pushdown, would be k1 = k2 AND ((f(k1) IS NULL AND f(k2) IS NULL) OR f(k1) = f(k2)). This change prevents the equality inference logic from considering expressions that may return null on non-null input.
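The failure mode above can be sketched in a few lines by simulating SQL three-valued logic. This is a hypothetical illustration, not Presto code; `f` models `NULLIF(k, 1)`, a function that returns null on the non-null input 1.

```python
def sql_eq(a, b):
    """SQL equality: NULL (None) if either side is NULL."""
    if a is None or b is None:
        return None
    return a == b

def sql_and(a, b):
    """SQL AND under three-valued logic."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

# f models NULLIF(k, 1): null for input 1, otherwise the input itself.
f = lambda k: None if k == 1 else k

k1, k2 = 1, 1
original = sql_eq(k1, k2)                          # TRUE: the row should join
derived = sql_and(original, sql_eq(f(k1), f(k2)))  # NULL: the row is wrongly dropped
```

Because a join only keeps rows where the criteria evaluate to TRUE, the NULL result of the derived condition silently filters out a row that the original criterion `k1 = k2` would have kept.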
CAST(JSON 'null' AS ...) will also return null
This version avoids allocating arrays that are beyond the JVM limit.
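The JVM caps array sizes slightly below `Integer.MAX_VALUE`, so capacity growth has to be clamped. A minimal sketch of that clamping, assuming the commonly used headroom of 8 (the constant and helper name here are illustrative, not the actual Presto code):

```python
# JVM arrays cannot exceed roughly Integer.MAX_VALUE - 8 elements
# (a few slots are reserved for object headers on some VMs).
MAX_ARRAY_SIZE = 2**31 - 1 - 8

def grown_capacity(current, needed):
    """Grow by ~1.5x, but never request an array beyond the JVM limit."""
    new = max(needed, current + (current >> 1))
    return min(new, MAX_ARRAY_SIZE)
```

Without the `min` clamp, a 1.5x growth step starting near the limit would request an array the JVM cannot allocate and fail with an `OutOfMemoryError`.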
Currently only the ordering columns are printed in the Explain plan output for Window nodes. It is also desirable to know the sort order used for each of those columns.
Other queries could time out because they were abandoned, causing the test to fail.
Rename fields in ORC dictionary reader to make it clear if the field is used for the stripe dictionary or row group dictionary.
Always create dictionary blocks in DRWF for columns using a row group dictionary. This prevents expansion of the dictionary which can create a very large block.
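The space savings of keeping the dictionary form can be sketched as follows (an illustrative model, not the actual Presto block classes): a dictionary block stores each distinct value once plus a small index per row, while expansion materializes the full value for every row.

```python
# Dictionary representation: distinct values once, plus per-row indices.
dictionary = ["alpha", "beta"]
ids = [0, 1, 0, 0, 1]

# Expansion repeats the (possibly large) values for every row, which is
# what producing non-dictionary blocks would force downstream.
expanded = [dictionary[i] for i in ids]
```

For wide varchar columns with few distinct values, the expanded form can be orders of magnitude larger than the indices, which is why expansion "can create a very large block".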
Simplify the materialization of connectors in ConnectorManager.
Acquire transaction handle in SystemConnector lazily to avoid accessing the transaction manager during begin transaction.
This fixes a regression from the previous commit.
Test and test utility methods were declared as throwing Exception even though no exception could be thrown.
It is confusing that a method receives an already-rewritten node.
Adding a test for b19d3df ("Fix base for counter in AssignUniqueIdOperator"). Without the mentioned commit, the added test fails.
This is a rewrite of the partial aggregation pushdown optimizer to make the code easier to follow and reason about. The approach is as follows:

1. Determine whether the optimization is applicable. At a minimum, there must be an aggregation on top of an exchange.
2. If the aggregation is SINGLE, split it into a FINAL on top of a PARTIAL and reprocess the resulting plan.
3. If the aggregation is a PARTIAL, push it underneath each branch of the exchange.

We use a couple of tricks to avoid having to juggle and rename field names as the nodes are rewired:

1. When pushing the partial aggregation through the exchange, the names of the outputs of the aggregation are preserved.
2. If the input->output mappings in the exchange are not simple identity projections without rename, we introduce a projection under the partial aggregation. This helps avoid having to rewrite all the aggregation functions to refer to new names.

It also fixes a planning issue under certain scenarios involving aggregation subqueries and partitioned tables. E.g.,

    SELECT *
    FROM (
        SELECT count(*)
        FROM tpch.tiny.orders
        HAVING count(DISTINCT custkey) > 1
    ) CROSS JOIN t

where "t" is a partitioned Hive table.
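The three steps above can be sketched on a toy plan tree. This is a hypothetical model with made-up node names, not the Presto planner API; it shows only the rewiring of the tree, not output renaming or projections.

```python
class Node:
    """A toy plan node: a kind, child nodes, and an aggregation step."""
    def __init__(self, kind, children=(), step=None):
        self.kind, self.children, self.step = kind, list(children), step

def push_partial_aggregation(node):
    child = node.children[0] if node.children else None
    # Step 1: applicability -- an aggregation directly over an exchange.
    if node.kind != "Aggregation" or child is None or child.kind != "Exchange":
        return node
    # Step 2: split SINGLE into FINAL over PARTIAL and reprocess.
    if node.step == "SINGLE":
        partial = Node("Aggregation", [child], step="PARTIAL")
        return Node("Aggregation", [push_partial_aggregation(partial)], step="FINAL")
    # Step 3: push a PARTIAL underneath each branch of the exchange.
    if node.step == "PARTIAL":
        branches = [Node("Aggregation", [b], step="PARTIAL") for b in child.children]
        return Node("Exchange", branches)
    return node

# A SINGLE aggregation over a two-branch exchange...
plan = Node("Aggregation",
            [Node("Exchange", [Node("Scan"), Node("Scan")])],
            step="SINGLE")
# ...becomes FINAL over an exchange whose branches each carry a PARTIAL.
rewritten = push_partial_aggregation(plan)
```

The recursion mirrors the "reprocess the resulting plan" wording: splitting a SINGLE produces a PARTIAL over the same exchange, which the next pass then pushes through.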
82620d9 caused a regression when scheduling non-remotely accessible splits, bucketed splits, or splits when network aware scheduling was used
Now that we use a low watermark to trigger scheduling, we don't want to reserve too much space for splits with network affinity; otherwise the scheduler may have to run too frequently when splits have little to no affinity.
These connectors use non-canonical types for varchar columns in TPC-H, so the output doesn't match. Disable the tests for now.
Don't wait for deletion executor if there are no rows to delete.
👍 assuming all tests pass.
all tests pass
upgrade to 0.157 part 2 of 3
part 1 - remove old twitter event scriber impl
part 2 - upgrade to oss 0.157
part 3 - add new twitter event scriber