Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add partition argument to sequential_values test #447

Closed
macklin-fluehr opened this issue Nov 23, 2021 · 2 comments
Closed

Add partition argument to sequential_values test #447

macklin-fluehr opened this issue Nov 23, 2021 · 2 comments
Labels
duplicate enhancement New feature or request

Comments

@macklin-fluehr
Copy link

Describe the feature

The sequential values test is useful for ensuring that an entire column over the whole table is incrementing properly. In some cases, multiple types of rows can be updating in parallel within the same table. We would like the sequential_values test to include a partition feature, similar to other dbt tests.

Describe alternatives you've considered

We are planning to copy the source code of dbt utils and implement this ourselves.

Additional context

Is this feature database-specific? Which database(s) is/are relevant? Please include any other relevant context here.

Who will this benefit?

This will help cases where multiple row types are incrementing and need to be tested for incrementally.

Example 1:
Items published on a website. Each row is a day an item is active on the website. The sequential test should test that each item has sequential data for each day it was active on the website.

Example 2:
User is using an incremental model and this test to ensure that the model is pulling data correctly daily. the incremental model pulls information on 4 different marketing channels. We'd like to ensure that the data from each marketing channel is updated incrementally.

Are you interested in contributing this feature?

I think the solution could be as simple as adding this:

with windowed as (

    select
        {{ column_name }},
        lag({{ column_name }}) over (
            {% if partition_by is not none %}
            partition by {{ partition_by }}
            {% endif %}
            order by {{ column_name }}
        ) as previous_{{ column_name }}
    from {{ model }}
),

where partitioned by is a new parameter to the test, defaulting to None

@joellabes
Copy link
Contributor

Hey @macklin-fluehr! Sounds like partitioning is the hot thing at the moment. @emilyriederer's writeup covers this in a bit more detail, so even though yours came first I'm going to close it as a dupe in favour of #450 if that's OK?

As I wrote there, I'm 100% on board with this sort of solution - my only ask is that the changed behaviour comes with tests so that we're confident that both the new and existing behaviour keep working 🛠️

@macklin-fluehr
Copy link
Author

macklin-fluehr commented Nov 29, 2021 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
duplicate enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants