Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-981] Raise clear warning/error if yaml property key is misnamed #5605

Open
2 tasks done
albertovpd opened this issue Aug 3, 2022 · 8 comments
Open
2 tasks done
Labels
enhancement New feature or request paper_cut A small change that impacts lots of users in their day-to-day Refinement Maintainer input needed

Comments

@albertovpd
Copy link

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I am testing a model which have 2 tests for a column.
If tests are called under a field named test and not testS, it runs satisfactorily just the 1st of the tests, without warning/logging errors about that the 2nd test did not run.

image

Expected Behavior

To have an error output like the following:
Found more than 1 test under the test field. Please reference the structure as "tests"

Steps To Reproduce

  1. create a custom test
  2. apply that custom test and a basic one to a column of a model

Relevant log output

The problem is that there is no output warning about just the first test ran

Environment

- OS: WSL for windows (so we should say ubuntu)
- Python: 3.8.2
- dbt: 1.1.0

Which database adapter are you using with dbt?

bigquery

Additional Context

image

I created this custom test and regardless the name of the column, it always passed. Then I realised that the test section in the yml file should be named tests, in plural. Then, it works.

@albertovpd albertovpd added bug Something isn't working triage labels Aug 3, 2022
@github-actions github-actions bot changed the title Not warning about referenced tests in the main .yml file that are not run [CT-981] Not warning about referenced tests in the main .yml file that are not run Aug 3, 2022
@jtcohen6
Copy link
Contributor

jtcohen6 commented Aug 9, 2022

@albertovpd Thanks for opening!

I tried this out locally, and I found that test is just completely ignored. Only tests defined under the tests key (correctly named) were actually defined and run.

I think the deal here is that our dataclasses / validation for both model and columns support additional properties being defined:

models:
  - name: some_model
    invalid_key:
      - something
    columns:
      - name: id
        invalid_key:
          - something_else

It would be better if dbt raised a clear and explicit error message that invalid_key is not supported! At least a warning, surely. Especially now that we have dedicated properties to accept arbitrary, user-provided key-value pairs — namely config and meta.

@jtcohen6 jtcohen6 removed the triage label Aug 9, 2022
@jtcohen6 jtcohen6 changed the title [CT-981] Not warning about referenced tests in the main .yml file that are not run [CT-981] Raise clear warning/error if yaml property key is misnamed Aug 9, 2022
@albertovpd
Copy link
Author

You're right @jtcohen6
Also your proposed solution is great from the user point of view

@leahwicz leahwicz added the support_rotation A good task to pick up during support rotation label Sep 3, 2022
@jtcohen6 jtcohen6 added the paper_cut A small change that impacts lots of users in their day-to-day label Jan 6, 2023
@jtcohen6
Copy link
Contributor

jtcohen6 commented Feb 1, 2023

I just closed an older issue that's really the same as this one: #4280

That one had several comments/upvotes. This is definitely a UX improvement that several people have asked for explicitly, and it would likely benefit many many more!

@jtcohen6 jtcohen6 added enhancement New feature or request and removed support_rotation A good task to pick up during support rotation bug Something isn't working labels Feb 1, 2023
@dbeatty10
Copy link
Contributor

From @alittlesliceoftom here:

I was going to raise an issue:
"Incorrect Configs Are Silently Ignored - they should raise errors"

But this seems like a close fit. Adding an example:
{{ config( materialized = 'table', dist= 'REPLICATE', this_field_is_made_up = 'and_has_no_effect' ) }}
Should fail, but in fact it just does nothing.

I think this should fail as it's easy for a developer to type "dist" or "distribution" and get different results.
Or spell materialized as materialised.

It would be more developer friendly to let them know there's an error.

@dataders
Copy link
Contributor

dataders commented Sep 7, 2023

Another example from "the wild" (ie Community Slack thread)

user reported that the datawarehouse was not changed as defined in their config

{{
    config(
        materialized='table',
        schema='finance',
        snowflake_datawarehouse='DATAWAREHOUSE_MEDIUM'
    )
}}

the issue is they were using snowflake_datawarehouse instead of snowflake_warehouse

@mwstanleyft
Copy link

Here is a further example of someone creating an invalid model config (they confused it with a source config) and it would have been helpful for this to throw a parsing error because their yaml is incorrectly formatted, rather than failing in a more obscure way. https://getdbt.slack.com/archives/CBSQTAPLG/p1702483400951449

@dbeatty10
Copy link
Contributor

Thanks for linking that thread in Slack to this feature request @mwstanleyft !

This is a nice summary of the surprise factor:

Tbh I'm always unpleasantly surprised that dbt doesn't enforce a correct yaml schema and throw an error if the yaml is incorrect. You can get a lot of obscure errors simply because you've made a typo or something in the yaml that results in an invalid file, but dbt finds a way regardless

@dbeatty10
Copy link
Contributor

Acceptance criteria

  • Unrecognized config names raise a warning
  • Unrecognized property names raise a warning
    • In dbt, properties are declared in .yml files, in the same directory as your resources, and could be named schema.yml, _properties.yml, whatever_you_want.yml, etc.
  • document how to turn on a "strict mode" for unrecognized config/property names that converts relevant warnings to error messages

For refinement

  • there is a method to add certain names to an "allow list" that bypasses a warning message for those particular names
  • there is some kind of string distance metric (like Levenshtein distance, etc.) used to emit a warning like "Did you mean materialized instead of materialization?"
  • warnings vs. errors

We may not want to raise errors at this time since that would increase the risk of introducing breaking changes to projects that are currently working (even if they have been misconfigured in an invisible way). Raising a warning is a more gentle on-ramp than jumping all the way to raising an error. The trade-off is that warnings are less visible than errors and may be overlooked. If we choose to go the "warnings only" route, we can still reserve the right to elevate this all the way to an error at some point in the future.

@dbeatty10 dbeatty10 added the Refinement Maintainer input needed label Dec 15, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request paper_cut A small change that impacts lots of users in their day-to-day Refinement Maintainer input needed
Projects
None yet
Development

No branches or pull requests

6 participants