Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document Top-N pushdown #8468

Merged
merged 1 commit into from
Dec 14, 2022
Merged

Conversation

findinpath
Copy link
Contributor

Resolves: #7786

@lhofhansl
Copy link
Member

Do we also want to document partial topN replacement by partial Limit for pre-sorted inputs?

name := name:varchar:text
id := id:integer:int4

Note within ``SOURCE`` fragment that the ``TableScan`` element
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This depends on the connector though. The string comes from how the connector implemented ConnectorTableHandle#toString.

The actual thing which lets you know pushdown happenned is the fact that there's no TopNNode or LimitNode in the plan - only a TableScan.

@mosabua can point to some other pushdown docs which can be consulted for the best way to word it. Maybe https://github.com/trinodb/trino/pull/8463/files#diff-3a35320d12c22f7d413cda275bad87d6467ea3482f35eda3f89c51af38c99e85R163-R165?

@findinpath
Copy link
Contributor Author

Do we also want to document partial topN replacement by partial Limit for pre-sorted inputs?

@lhofhansl yes, sure. Can you please point me to a concrete sample of a partial TOP N query?

@lhofhansl
Copy link
Member

@findinpath That feature was implemented in #6634. Do you need an example query?

@findinpath
Copy link
Contributor Author

@lhofhansl yes, please do share a sample query for partial top N

@hashhar
Copy link
Member

hashhar commented Dec 10, 2021

@findinpath Partial TopN example -

// We expect PARTIAL TopN on the LEFT side of join to be pushed down.
assertThat(query(
// Explicitly disable join pushdown. The purpose of the this test is to verify partial TopN gets pushed down when Join is not.
Session.builder(getSession())
.setCatalogSessionProperty(getSession().getCatalog().orElseThrow(), JOIN_PUSHDOWN_ENABLED, "false")
.build(),
"SELECT * FROM nation n LEFT JOIN region r ON n.regionkey = r.regionkey " +
"ORDER BY n.nationkey LIMIT 3"))
.ordered()
.isNotFullyPushedDown(
node(TopNNode.class, // FINAL TopN
anyTree(node(JoinNode.class,
node(ExchangeNode.class, node(ProjectNode.class, node(TableScanNode.class))), // no PARTIAL TopN
anyTree(node(TableScanNode.class))))));

See

public void testUseSortedProperties()
{
String tableName = "test_propagate_table_scan_sorting_properties";
@Language("SQL") String createTableSql = format("" +
"CREATE TABLE %s " +
"WITH (" +
" bucket_count = 8," +
" bucketed_by = ARRAY['custkey']," +
" sorted_by = ARRAY['custkey']" +
") AS " +
"SELECT * FROM tpch.tiny.customer",
tableName);
assertUpdate(createTableSql, 1500L);
@Language("SQL") String expected = "SELECT custkey FROM customer ORDER BY 1 NULLS FIRST LIMIT 100";
@Language("SQL") String actual = format("SELECT custkey FROM %s ORDER BY 1 NULLS FIRST LIMIT 100", tableName);
Session session = getSession();
assertQuery(session, actual, expected, assertPartialLimitWithPreSortedInputsCount(session, 0));
session = Session.builder(getSession())
.setCatalogSessionProperty("hive", "propagate_table_scan_sorting_properties", "true")
.build();
assertQuery(session, actual, expected, assertPartialLimitWithPreSortedInputsCount(session, 1));
assertUpdate("DROP TABLE " + tableName);
}
and
public void testUseSortedPropertiesForPartialTopNElimination()
{
String tableName = "test_propagate_table_scan_sorting_properties";
// salting ensures multiple splits
String createTableSql = format("" +
"CREATE TABLE %s WITH (salt_buckets = 5) AS " +
"SELECT * FROM tpch.tiny.customer",
tableName);
assertUpdate(createTableSql, 1500L);
String expected = "SELECT custkey FROM customer ORDER BY 1 NULLS FIRST LIMIT 100";
String actual = format("SELECT custkey FROM %s ORDER BY 1 NULLS FIRST LIMIT 100", tableName);
assertQuery(getSession(), actual, expected, assertPartialLimitWithPreSortedInputsCount(getSession(), 1));
assertUpdate("DROP TABLE " + tableName);
}
for the TopN -> Limit optimization for pre-sorted inputs.

I don't think you can use an just an example query, you'd need to include CREATE TABLE (since that's what will let Trino know that the table has pre-sorted data).

@colebow
Copy link
Member

colebow commented Oct 27, 2022

👋 @findinpath - this PR is inactive and doesn't seem to be under development. If you'd like to continue work on this at any point in the future, feel free to re-open.

cc @mosabua as well, we could potentially pick this up instead.

@mosabua
Copy link
Member

mosabua commented Nov 3, 2022

We should continue this with @findinpath .. can you review @colebow ?

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments. Technically accurate.

docs/src/main/sphinx/optimizer/pushdown.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/optimizer/pushdown.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/optimizer/pushdown.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/optimizer/pushdown.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/optimizer/pushdown.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/optimizer/pushdown.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/optimizer/pushdown.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/optimizer/pushdown.rst Outdated Show resolved Hide resolved
Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM % nits.

Thanks a lot for adding this.

docs/src/main/sphinx/optimizer/pushdown.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/optimizer/pushdown.rst Outdated Show resolved Hide resolved
docs/src/main/sphinx/optimizer/pushdown.rst Outdated Show resolved Hide resolved
@@ -271,3 +271,92 @@ FETCH FIRST N ROWS``.

Implementation and support is connector-specific since different data sources
support different SQL syntax and processing.

Find two similar Top-N queries in the following sections.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe @mosabua can suggest some better opening statement?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, you can examine at two queries that implement a Top-N behavior in the following section.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WDYT? For example, you can find two queries to learn how to identify Top-N pushdown behavior in the following section.

Copy link
Member

@colebow colebow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hashhar can we get this merged? It looks good to go.

Copy link
Member

@hashhar hashhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, this slipped through cracks. LGTM

@hashhar hashhar merged commit b9d26a7 into trinodb:master Dec 14, 2022
@github-actions github-actions bot added this to the 404 milestone Dec 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

Add information about Top N pushdown
5 participants