Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tiebreakers to Activity Occurrence & Activity Repeated At #1

Open
wylbee opened this issue Apr 5, 2023 · 0 comments
Open

Add tiebreakers to Activity Occurrence & Activity Repeated At #1

wylbee opened this issue Apr 5, 2023 · 0 comments

Comments

@wylbee
Copy link

wylbee commented Apr 5, 2023

With the current code, if the same customer performs the same activity multiple times at the same ts, the activity_occurrence and activity_repeated_at columns do not resolve deterministically and can vary from run to run depending on the DB. I see this happening in my own data on Snowflake when the source system records a date rather than a timestamp that I am then casting as a timestamp for the stream.

This behavior should have ~no impact on results, but does make equality testing between dev and prod environments painful via the audit-helper dbt package or data-diff framework since these two columns can change with each run.

I would expect that adding the activity_id to the order by would resolve this issue without introducing additional complications. Something like:

{# Creates the two activity occurrence columns: activity_occurrence and activity_repeated_at  #}

{% macro activity_occurrence() %}
    row_number() over (
        partition by coalesce (
            {{ safe_cast("customer", type_string()) }},
            {{ safe_cast("anonymous_customer_id", type_string()) }}
            ) order by ts asc, activity_id asc) as activity_occurrence,
    lead(ts) over (
        partition by coalesce (
            {{ safe_cast("customer", type_string()) }},
            {{ safe_cast("anonymous_customer_id", type_string()) }}
        ) order by ts asc, activity_id asc) as activity_repeated_at
{% endmacro %}

What would be the best way for me to support resolving this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant