Describe the bug
Docs generation can fail in new and interesting ways in dbt v0.15.0. On most databases, the information_schema is accessed at the database level; on BigQuery, it is sometimes accessed at the project level and other times at the dataset level. This means that the catalog query on BigQuery can fail if a dataset does not exist. That is not the case on other plugins.
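To make the project-level versus dataset-level distinction concrete, here is a minimal sketch that builds the two kinds of BigQuery INFORMATION_SCHEMA paths. The helper name `information_schema_table` and the project name `my-project` are hypothetical; this is not dbt's actual catalog query.

```python
def information_schema_table(project, view, dataset=None):
    """Build a backtick-quoted BigQuery INFORMATION_SCHEMA path.

    Without a dataset the path is project-level; with one it is
    dataset-level, and querying it requires that dataset to exist.
    """
    parts = [project]
    if dataset is not None:
        parts.append(dataset)
    parts.extend(["INFORMATION_SCHEMA", view])
    return "`" + ".".join(parts) + "`"


# Project-level: lists datasets, so no particular dataset has to exist.
project_level = information_schema_table("my-project", "SCHEMATA")

# Dataset-level: a query against this path fails if dbt_ryan is missing.
dataset_level = information_schema_table("my-project", "TABLES", dataset="dbt_ryan")
```

Only the dataset-level form depends on a specific dataset, which is why a single missing dataset can break a catalog query that touches it.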
I've seen some reports of users whose dbt docs generate task is failing on BigQuery with the error "dataset xxx does not exist in region US". This appears to be happening because dbt is querying the information schema for a dataset which does not exist.
There are two ways this can happen:
1. dbt is checking an information schema errantly. From @ryanhempenstall on Slack:
"Each of the folders within our models folder uses its own schema mapping in the dbt_project.yml file (pasted in below). The schema for our models is then generated based on the target + the schema in our profile + the schema mapping in the yml file. For example, a model might go into dbt_ryan_facts for me but would be added to dbt_alice_facts for Alice when we run with the target dev but if we’re running in production it would just be facts."
The error that dbt encounters is:
404 Not found: Dataset dev-de1-eu-datatools:dbt_ryan was not found in location US
This seems to indicate that dbt is checking a dataset called dbt_ryan which no models are configured to build into! Here, it's possible that dbt is errantly generating an information_schema relation for a resource that should not participate in docs generation (e.g. an ephemeral model, or a disabled model).
2. There might not be an error at all here, but the project might be configured to build a resource into a schema that doesn't exist (e.g. a seed file exists, but dbt seed has never been run).
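The schema naming described in the quoted Slack message can be sketched roughly as follows. This is an illustration under assumptions, not the user's actual macro: the function name mirrors dbt's generate_schema_name hook, and the production target name `prod` is hypothetical.

```python
def generate_schema_name(target_name, profile_schema, custom_schema=None):
    """Sketch of the naming rule from the quote.

    Assumption: the production target is called "prod"; the quote only
    says "production".
    """
    if custom_schema is None:
        # No schema mapping: the resource lands in the bare profile schema.
        return profile_schema
    if target_name == "prod":
        # In production, only the mapped schema name is used.
        return custom_schema
    # In development, prefix the mapped schema with the profile schema.
    return f"{profile_schema}_{custom_schema}"


dev_schema = generate_schema_name("dev", "dbt_ryan", "facts")
prod_schema = generate_schema_name("prod", "dbt_ryan", "facts")
unmapped_schema = generate_schema_name("dev", "dbt_ryan")
```

Under this rule, the bare dbt_ryan dataset is only targeted by a resource with no schema mapping, which would be consistent with the error coming from a resource that is never actually built.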
In either case, I think that failing the entire docs generation step for a single missing dataset is suboptimal. Can we either:
- check the list of existing datasets and prune the supplied information_schemas to only search in existing datasets, or
- run a single catalog query for each information schema, then combine the results at the end. If one of these queries fails with a DatabaseError: 404 not found, we can catch the error and print a warning. Note: running one query per dataset might incur a non-trivial cost on BigQuery.
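The two options above could look something like the sketch below. Everything here is hypothetical: `DatasetNotFound` stands in for the 404 DatabaseError, `run_catalog_query` is a placeholder callable rather than a dbt API, and `fake_catalog_query` exists only to make the example runnable without a BigQuery connection.

```python
class DatasetNotFound(Exception):
    """Hypothetical stand-in for the adapter's 404 DatabaseError."""


def prune_to_existing(schemas, existing_datasets):
    """Option 1: keep only the schemas whose dataset exists."""
    existing = set(existing_datasets)
    return [schema for schema in schemas if schema in existing]


def collect_catalog(schemas, run_catalog_query, warn=print):
    """Option 2: query each schema separately and skip missing datasets."""
    rows = []
    for schema in schemas:
        try:
            rows.extend(run_catalog_query(schema))
        except DatasetNotFound as exc:
            # Warn and continue instead of failing the whole docs run.
            warn(f"Skipping {schema}: {exc}")
    return rows


def fake_catalog_query(schema):
    """Fake query: only dbt_ryan_facts exists in this example."""
    tables = {"dbt_ryan_facts": ["orders", "customers"]}
    if schema not in tables:
        raise DatasetNotFound(f"404 Not found: Dataset {schema} was not found")
    return [(schema, table) for table in tables[schema]]


wanted = ["dbt_ryan", "dbt_ryan_facts"]
pruned = prune_to_existing(wanted, existing_datasets=["dbt_ryan_facts"])
warnings = []
catalog = collect_catalog(wanted, fake_catalog_query, warn=warnings.append)
```

Option 1 needs one extra call to list datasets up front, while option 2 issues one query per schema, which is where the cost concern noted above comes from.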
Steps To Reproduce
case #1: configure a model to build into a dataset that does not exist. Run dbt docs generate on BigQuery and observe the failure.
case #2: configure a disabled model with:
Run dbt docs generate on BigQuery and observe the failure.
case #3: configure an ephemeral model with:
Run dbt docs generate on BigQuery and observe the failure.
Expected behavior
dbt docs generate should succeed even if a dataset present in the manifest does not exist in the database.
System information
Which database are you using dbt with?
The output of dbt --version: