Consider exposing nth_record or similar in stream maps #1367

Closed
aaronsteers opened this issue Jan 30, 2023 · 1 comment

aaronsteers commented Jan 30, 2023

As a more crude (yet more portable) means of implementing

it was suggested that exposing something like a row-number variable in stream maps could help limit the number of records to a specific sample size.

In theory this should be doable. We would just need to add the variable to the simpleeval evaluation context, and users could then filter on it with the __filter__ stream map operation. Interestingly, the __filter__ operation could then also apply modulo operations, so it could light up these three use cases:

  1. I only want to sync the first n records. (Similar to #1366, "Consider exposing config like dry_run_record_limit".)
  2. I only want to sample one out of every n records. (Implemented via modulo.)
  3. I want to introduce an identity column within the context of the current sync operation. (More useful for FULL_TABLE syncs than for INCREMENTAL ones.)
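
To make the three use cases concrete, here is a rough Python sketch. It assumes the proposed variable is called `nth_record` and is 1-based; neither is decided in this issue. Python's built-in `eval()` only stands in for the real expression evaluator, and `apply_filter` and `add_identity` are hypothetical helpers, not SDK functions.

```python
# Hypothetical sketch: `nth_record` is the proposed variable name, not an existing one.
# Built-in eval() stands in for the stream map expression evaluator.

def apply_filter(records, filter_expr):
    """Keep the records for which filter_expr is truthy (nth_record is 1-based)."""
    kept = []
    for nth_record, record in enumerate(records, start=1):
        names = {"record": record, "nth_record": nth_record}
        # Empty builtins so the expression only sees the names we expose.
        if eval(filter_expr, {"__builtins__": {}}, names):
            kept.append(record)
    return kept


def add_identity(records, column="row_id"):
    """Use case 3: add an identity column derived from the record ordinal."""
    return [{**record, column: n} for n, record in enumerate(records, start=1)]


records = [{"name": f"user{i}"} for i in range(1, 11)]

first_three = apply_filter(records, "nth_record <= 3")      # use case 1
every_fifth = apply_filter(records, "nth_record % 5 == 0")  # use case 2
with_ids = add_identity(records[:2])                        # use case 3
```

With ten input records, the first filter keeps records 1 through 3 and the modulo filter keeps records 5 and 10.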

A simple way to implement this for the standalone mapper case and for target mappers would be to create a global map of stream names to counters and increment the relevant counter for each record seen; the counter could then be exposed as nth_record or record_ordinal or similar in the simpleeval evaluation context. Taps could implement this differently, since they know ahead of time which streams they are looping through, but it probably isn't worth the complexity to have divergent code paths here.
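
A minimal sketch of the counter map described above, in plain Python. The names `next_ordinal` and `evaluation_names` are made up for illustration, and the 1-based numbering is an assumption; how the value would actually be wired into the mapper is not shown.

```python
from collections import defaultdict

# Global map of stream name -> number of records seen so far.
_record_counts = defaultdict(int)


def next_ordinal(stream_name):
    """Increment and return the 1-based record ordinal for a stream."""
    _record_counts[stream_name] += 1
    return _record_counts[stream_name]


def evaluation_names(stream_name, record):
    """Build the names an expression would see (hypothetical helper)."""
    return {"record": record, "nth_record": next_ordinal(stream_name)}


# Records from different streams can arrive interleaved, as a standalone
# mapper or target would see them; each stream keeps its own count.
messages = [("users", {"id": 10}), ("orders", {"id": 7}), ("users", {"id": 11})]
ordinals = [(s, evaluation_names(s, r)["nth_record"]) for s, r in messages]
```

Because the counters are keyed by stream name, interleaved streams do not affect each other's ordinals: the second `users` record gets 2 even though an `orders` record arrived in between.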


stale bot commented Jul 18, 2023

This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.

@stale stale bot added the stale label Jul 18, 2023
@stale stale bot closed this as completed Aug 8, 2023