
docs: Add docs on dbt Cloud integration #1763

Open · wants to merge 2 commits into base: master
Conversation

danthelion
Contributor

@danthelion danthelion commented Nov 7, 2024

Description:

(Describe the high level scope of new or changed features)

Workflow steps:

(How does one use this feature, and how has it changed)

Documentation links affected:

(list any documentation links that you created, or existing ones that you've identified as needing updates, along with a brief description)

Notes for reviewers:

(anything that might help someone review this PR)




github-actions bot commented Nov 7, 2024

PR Preview Action v1.4.8
🚀 Deployed preview to https://estuary.github.io/flow/pr-preview/pr-1763/
on branch gh-pages at 2024-11-08 15:24 UTC


- Job ID: The unique identifier for the dbt job you wish to trigger.
- Account ID: Your dbt account identifier.
- API Key: The dbt API key associated with your account. This allows Estuary Flow to authenticate with dbt Cloud and trigger jobs on your behalf.
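As a concrete illustration, here is a minimal sketch of how these three values map onto dbt Cloud's v2 "trigger job run" endpoint. The helper name `build_trigger_request` is hypothetical, and the request is only built, not sent; the endpoint shape follows dbt Cloud's documented API (`POST /api/v2/accounts/{account_id}/jobs/{job_id}/run/` with `Token` authorization).

```python
def build_trigger_request(access_url: str, account_id: int, job_id: int, api_key: str):
    """Build the URL, headers, and body for a dbt Cloud job-trigger request.

    Hypothetical helper; the endpoint path follows dbt Cloud's public v2 API.
    """
    url = f"{access_url.rstrip('/')}/api/v2/accounts/{account_id}/jobs/{job_id}/run/"
    headers = {
        "Authorization": f"Token {api_key}",  # the dbt API key from Account Settings
        "Content-Type": "application/json",
    }
    body = {"cause": "Triggered via API"}  # free-form run cause shown in dbt Cloud
    return url, headers, body

# Placeholder values; substitute your own Account ID, Job ID, and API key.
url, headers, body = build_trigger_request(
    "https://cloud.getdbt.com", 12345, 67890, "your-api-key"
)
```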
Member


They also need an Access URL; it is mandatory. I know the connector marks it as non-required, but that's because we previously had Account Prefix, and to stay backward-compatible we had to keep the new field marked as non-required. We validate that one of the two exists every time.


### Optional Parameters

- Access URL: The dbt access URL can be found in your dbt Account Settings. Use this URL if your dbt account requires a
Member


I think it is worth adding here, since a few customers have had this issue: if they can't find their Access URL in their dashboard, it is because they are older customers who have not yet migrated to the new API. In that case their Access URL is https://cloud.getdbt.com/.
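The fallback described in this comment could be captured in a small helper. The function name is hypothetical; the default URL is the one quoted in the thread for accounts that have not migrated to the new API.

```python
from typing import Optional

# Legacy default for accounts that have not migrated to the new dbt API,
# per the discussion in this thread.
DEFAULT_ACCESS_URL = "https://cloud.getdbt.com/"

def resolve_access_url(configured_url: Optional[str]) -> str:
    """Return the configured Access URL, falling back to the legacy default.

    Hypothetical helper illustrating the fallback; not the connector's code.
    """
    return configured_url or DEFAULT_ACCESS_URL
```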


### Job Management

If you want to avoid triggering multiple overlapping dbt jobs, set Job Trigger Mode to skip. This way, if a job is already running when a new trigger fires, the new run is skipped.
Member


I think it's worth mentioning this is the default behavior
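The skip semantics discussed above can be sketched as a single decision function. All names here are hypothetical, and any mode name other than "skip" is an assumption for illustration; only "skip" (the default, per the comment above) appears in the source.

```python
def decide_trigger(mode: str, job_already_running: bool) -> bool:
    """Decide whether to fire a new dbt job run.

    In "skip" mode (the default, per the discussion above), a new run is not
    started while one is still in progress. Hypothetical sketch, not the
    connector's actual logic.
    """
    if mode == "skip" and job_already_running:
        return False  # skip the overlapping run
    return True
```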


### Regular Data Transformation on New Data

Suppose you have a data pipeline that ingests data into a warehouse every hour (configured via a Sync Frequency) and you want a dbt job to run after each batch of new data arrives.
Member


The dbt Cloud trigger starts the timer as soon as the first data arrives at the connector, and any subsequent timers are also started when data arrives.

If a connector has a delay of 1 hour, this is how it would look:

1. The connector starts up and runs a first dbt job trigger (to ensure consistency when the connector restarts).
2. It materializes one small chunk and starts a timer to trigger a dbt job in N minutes.
3. It materializes the rest of the chunks.
4. If it is not backfilling, the connector's 1-hour delay begins.
5. The dbt job is triggered once N minutes have passed since the timer started (this includes during backfills).

So in that sense, it is best that their dbt job trigger interval is not very long. The default is 30 minutes, meaning the job fires 30 minutes after the first bulk of data is committed. It is not very short, to avoid firing many jobs during backfills, but during non-backfill periods we will wait 30 minutes after the first commit before triggering a job. How much latency this creates between the final data point being materialized and the dbt job triggering depends on how long it takes for their data to be materialized to the destination.

This is the current compromise that lets us set a minimum interval between dbt job triggers, support connectors that don't use Sync Interval, and support use cases where data arrival is very sparse (once a day, for example).
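The startup-and-timer sequence described above can be sketched with an injected clock. All names here are hypothetical; this mirrors the sequence in the comment, not the actual connector code.

```python
class TriggerTimer:
    """Sketch of the trigger sequence described above (hypothetical names).

    - a trigger fires once at connector startup (consistency across restarts),
    - the first data commit arms an N-minute timer,
    - when the timer elapses, one trigger fires; the next commit re-arms it.
    """

    def __init__(self, interval_minutes: float = 30.0):
        self.interval = interval_minutes * 60.0  # seconds
        self.deadline = None  # None means no timer is armed

    def on_startup(self) -> bool:
        return True  # always trigger once on startup

    def on_commit(self, now: float) -> None:
        if self.deadline is None:  # only the first commit arms the timer
            self.deadline = now + self.interval

    def should_trigger(self, now: float) -> bool:
        if self.deadline is not None and now >= self.deadline:
            self.deadline = None  # fire once, then wait for the next commit
            return True
        return False
```

With the 30-minute default, commits that land while the timer is armed do not reset it, which is what keeps backfills from producing a storm of job runs.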

Contributor Author


thanks for the detailed writeup, I tried to incorporate this as best as possible
