This repository has been archived by the owner on Oct 31, 2019. It is now read-only.

Checkpointing Queries

Siva Narayanan edited this page Jul 3, 2015 · 1 revision

A common requirement is to query only the incremental data in a Kinesis stream. For example, you may want to count the 500 errors seen in the last minute. You can schedule a query against the Kinesis stream to run every minute, but each run should process only the data that has arrived since the previous run. The checkpointing feature in Kinesis is built for this use case, and the Presto connector builds on it.

The checkpointing feature uses AWS DynamoDB, so your credentials must allow creating and accessing DynamoDB tables. To enable checkpointing, set this parameter in ${PRESTO_HOME}/etc/catalog/kinesis.properties:

kinesis.checkpoint-enabled=true

You'll also need to set the following Presto session variables to use the checkpointing feature:

set session kinesis.checkpoint-logical-name="ServerErrorCounts";
set session kinesis.iteration-number=<iteration-number>;

For the first query in the sequence, set iteration-number to 0. For the second query, set it to 1, and so on. The second query reads only the data that entered the stream after iteration 0 completed. This is best done with a script or tool that increments the iteration number and submits the queries one after another. To run a query against an older checkpoint, set iteration-number to that checkpoint's value.
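The kind of driver script described above could look roughly like the following. This is a minimal sketch, not part of the connector: the helper names `build_statements` and `run_iterations`, the `submit` callback, and the table name `kinesis.default.server_logs` are all hypothetical. The sketch only assembles the session settings shown earlier plus the query for each iteration; how the statements actually reach Presto (CLI, JDBC, a scheduler) is left to the `submit` function you supply.

```python
from typing import Callable, List


def build_statements(logical_name: str, iteration: int, query: str) -> List[str]:
    """Return the two session settings plus the query for one checkpointed run."""
    if iteration < 0:
        raise ValueError("iteration must be >= 0")
    return [
        f'set session kinesis.checkpoint-logical-name="{logical_name}";',
        f"set session kinesis.iteration-number={iteration};",
        # Normalize the query so it ends with exactly one semicolon.
        query.rstrip().rstrip(";") + ";",
    ]


def run_iterations(
    logical_name: str,
    query: str,
    start: int,
    count: int,
    submit: Callable[[List[str]], None],
) -> int:
    """Submit `count` consecutive iterations and return the next iteration number."""
    for iteration in range(start, start + count):
        submit(build_statements(logical_name, iteration, query))
    return start + count


if __name__ == "__main__":
    # Hypothetical table and column names, for illustration only.
    QUERY = "select count(*) from kinesis.default.server_logs where status = 500"

    # This stand-in `submit` just prints the statements; replace it with
    # whatever mechanism you use to send statements to Presto.
    next_iteration = run_iterations(
        "ServerErrorCounts", QUERY, start=0, count=2,
        submit=lambda stmts: print("\n".join(stmts) + "\n"),
    )
    print("next iteration-number:", next_iteration)
```

In a scheduled setup, the returned iteration number would need to be stored somewhere durable (a file or a small table, for instance) so that the next run continues from the right checkpoint.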
