Consider exposing `nth_record` or similar in stream maps #1367

aaronsteers · 2023-01-30T23:27:18Z

As a cruder (yet more portable) means of implementing "Consider exposing config like `dry_run_record_limit`" (#1366), it was suggested that exposing something like a row-number variable in stream maps could help with limiting the number of records to a specific sample size.

In theory this should be doable. We would just need to add the variable to the simplecalc evaluation context, and users could then filter on it with the `__filter__` stream map operation. What's interesting is that the `__filter__` operation could then also apply modulo operations, so it could light up these three use cases:

1. I only want to sync the first n records. (Like #1366.)
2. I only want to sample one out of every n records. (Implemented via modulo.)
3. I want to introduce an identity column within the context of the current sync operation. (More useful for `FULL_TABLE` syncs than for `INCREMENTAL` ones.)
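To make the three use cases concrete, here is a minimal sketch of what the expressions might look like. The variable name `nth_record` is the proposal from this issue, not an existing feature, and the `evaluate` helper is a hypothetical stand-in built on Python's `eval`, not the SDK's actual expression evaluator.

```python
# Ten sample records standing in for a stream.
records = [{"name": f"user{i}"} for i in range(1, 11)]


def evaluate(expression, record, nth_record):
    """Hypothetical stand-in for the stream map expression evaluator."""
    # Expose the record's fields plus the proposed 1-based counter.
    names = {**record, "record": record, "nth_record": nth_record}
    return eval(expression, {"__builtins__": {}}, names)


numbered = list(enumerate(records, start=1))

# Use case 1: keep only the first n records (n = 3 here).
first_three = [r for i, r in numbered if evaluate("nth_record <= 3", r, i)]

# Use case 2: sample one out of every n records via modulo (n = 5 here).
every_fifth = [r for i, r in numbered if evaluate("nth_record % 5 == 0", r, i)]

# Use case 3: add an identity column scoped to the current sync.
with_identity = [{**r, "row_id": evaluate("nth_record", r, i)} for i, r in numbered]
```

The first two expressions are the kind of string that would go under `__filter__`; the third would be an ordinary property mapping.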

A simple way to implement this for the standalone mapper case and for target mappers would be to create a global map of stream names to counters and increment the relevant counter for each record seen; the counter could then be exposed as `nth_record`, `record_ordinal`, or similar in the simplecalc evaluation context. Taps could implement this differently, since they know ahead of time which streams they are looping through, but it probably isn't worth the complexity of having divergent code paths here.
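The counter map described above could be sketched roughly as follows. The class and method names (`RecordCounter`, `next_ordinal`) are hypothetical and chosen only for illustration; nothing here reflects an existing SDK API.

```python
from collections import defaultdict


class RecordCounter:
    """Hypothetical per-stream record counter, shared by all stream maps."""

    def __init__(self):
        # Maps stream name -> number of records seen so far.
        self._counts = defaultdict(int)

    def next_ordinal(self, stream_name):
        """Increment and return the 1-based ordinal for this stream."""
        self._counts[stream_name] += 1
        return self._counts[stream_name]


counter = RecordCounter()

# Interleaved records, as a mapper or target might receive them.
messages = [
    ("users", {"id": "a"}),
    ("orders", {"id": "x"}),
    ("users", {"id": "b"}),
]

# Each stream keeps its own count regardless of interleaving.
ordinals = [(stream, counter.next_ordinal(stream)) for stream, _ in messages]
```

Because the counter is keyed by stream name and incremented on arrival, the same code path would work for taps, mappers, and targets, at the cost of not using what taps already know about their streams.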


stale · 2023-07-18T03:10:22Z

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

aaronsteers added the Accepting Pull Requests label Jan 31, 2023

stale bot added the stale label Jul 18, 2023

stale bot closed this as completed Aug 8, 2023
