As a cruder (yet more portable) means of implementing dry_run_record_limit (#1366), it was suggested that exposing something like a row number variable in stream maps could help with limiting the number of records to a specific sample size.
In theory, this should be doable. We would just need to add the variable to the simplecalc evaluation context, and users could then limit on it with the __filter__ stream map operation. Interestingly, the __filter__ operation could then also apply modulo operations, so it could light up these three use cases:
I only want to sample the first n records. (Like "Consider exposing config like dry_run_record_limit", #1366.)
I only want to sample one out of every n records. (Implemented via modulo.)
I want to introduce an identity column within the context of the current sync operation. (More useful for FULL_TABLE syncs than for INCREMENTAL ones.)
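To make the three use cases concrete, here is a rough sketch of what the stream map settings might look like, written as Python dicts. It assumes the proposed variable ends up being called record_ordinal (the name is not settled, and no such variable exists today) and that it starts at 1. The stream names and the row_id property are made up for illustration.

```python
# Hypothetical stream map settings; `record_ordinal` is the *proposed* variable.

# 1. Keep only the first 100 records of the stream.
FIRST_N = {
    "stream_maps": {
        "customers": {"__filter__": "record_ordinal <= 100"},
    }
}

# 2. Keep one out of every 10 records, via modulo.
ONE_IN_N = {
    "stream_maps": {
        "customers": {"__filter__": "record_ordinal % 10 == 0"},
    }
}

# 3. Add an identity column scoped to the current sync operation.
IDENTITY_COLUMN = {
    "stream_maps": {
        "customers": {"row_id": "record_ordinal"},
    }
}
```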
A simple way to implement this for the standalone mapper case and for target mappers would be to create a global map of stream names to counters and increment the relevant counter for each record seen; the counter could then be exposed as nth_record, record_ordinal, or similar in the simplecalc evaluation context. Taps could implement this differently, since they know ahead of time which streams they are looping through, but it probably isn't worth the complexity of having divergent code paths here.
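The counter-map idea can be sketched in a few lines of plain Python. This is only an illustration, not the SDK's mapper code: OrdinalFilter and apply_filter are hypothetical names, record_ordinal is one of the candidate variable names, and Python's eval with builtins disabled stands in for the real expression evaluator.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple

ORDINAL_NAME = "record_ordinal"  # assumed name; nth_record is the other candidate

Message = Tuple[str, Dict[str, Any]]  # (stream name, record)


class OrdinalFilter:
    """Counts records per stream and evaluates a filter that can see the count."""

    def __init__(self, filter_expr: str) -> None:
        self.filter_expr = filter_expr
        # Global map of stream name -> number of records seen so far.
        self.counters: Dict[str, int] = defaultdict(int)

    def keep(self, stream: str, record: Dict[str, Any]) -> bool:
        self.counters[stream] += 1  # 1-based ordinal within this sync
        names = {**record, ORDINAL_NAME: self.counters[stream]}
        # Stand-in evaluator for this sketch only; not the real evaluation context.
        return bool(eval(self.filter_expr, {"__builtins__": {}}, names))


def apply_filter(
    messages: Iterable[Message],
    flt: OrdinalFilter,
    identity_column: Optional[str] = None,
) -> Iterator[Message]:
    """Yield records that pass the filter, optionally tagging each with its ordinal."""
    for stream, record in messages:
        if flt.keep(stream, record):
            if identity_column:
                record = {**record, identity_column: flt.counters[stream]}
            yield stream, record


messages = [("users", {"id": i}) for i in range(1, 8)]
messages += [("orders", {"id": i}) for i in range(1, 4)]

first_two = list(apply_filter(messages, OrdinalFilter("record_ordinal <= 2")))
every_third = list(apply_filter(messages, OrdinalFilter("record_ordinal % 3 == 0")))
with_ids = list(apply_filter(messages, OrdinalFilter("True"), identity_column="row_id"))
```

Because the counters are keyed by stream name, each stream gets its own ordinal sequence even when records from several streams are interleaved, which is why a single code path could serve the standalone mapper and target mappers alike.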
This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen label, or request that it be added.