Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-3080] [Bug] Improve Error Messages for Grants #8563

Open
2 tasks done
callum-mcdata opened this issue Sep 6, 2023 · 4 comments
Open
2 tasks done

[CT-3080] [Bug] Improve Error Messages for Grants #8563

callum-mcdata opened this issue Sep 6, 2023 · 4 comments
Assignees
Labels
enhancement New feature or request grants Issues related to dbt's grants functionality paper_cut A small change that impacts lots of users in their day-to-day

Comments

@callum-mcdata
Copy link
Contributor

callum-mcdata commented Sep 6, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Right now, grants display as a database error for the model being run. This makes it very difficult to debug if you are unaware of the full grant logic!

14:09:27  Database Error in model model_name (models/folder/folder/folder/model_name.sql)
14:09:27    user "analyst" does not exist

This may be running in a single transaction but it would be amazing if we could have the error message display that it was from a grant.

Expected Behavior

I would love if we could have something more like this:

14:09:27  Database Error in model model_name (models/folder/folder/folder/model_name.sql)
14:09:27   GRANT: user "analyst" does not exist

Just a simple designation that the error message is stemming from the grant statement.

Steps To Reproduce

In redshift create a group called analyst but not a user called analyst. Run any dbt model with the following configuration.

models:
  +grants:
    select: ["analyst"]

However this works just fine

models:
  +grants:
    select: ["group analyst"]

Relevant log output

No response

Environment

- OS:MacOS
- Python: 3.11
- dbt: 1.6.1

Which database adapter are you using with dbt?

redshift

Additional Context

Hi 👋 . Not super high priority but just a little bit of pain that took me a few hours to figure out what was going on. Figured I'd flag to y'all

@callum-mcdata callum-mcdata added bug Something isn't working triage labels Sep 6, 2023
@github-actions github-actions bot changed the title [Bug] Improve Error Messages for Grants [CT-3080] [Bug] Improve Error Messages for Grants Sep 6, 2023
@dbeatty10
Copy link
Contributor

@callum-mcdata, my friend, thanks for raising this!

Do you think this could be at least partially relieved if dbt-labs/dbt-redshift#415 is resolved so that dbt-redshift can use groups or roles in grants (in addition to users)?

@callum-mcdata
Copy link
Contributor Author

❤️ @dbeatty10 .

I guess it depends on how groups are handled because groups actually work just fine in grants! You just need to put group before the name of the object, which really isn't that difficult but of course opens the door up to user error.

Example:

models:
  +grants:
    select: ["group analyst"]

I think an ideal world would have the error message identify where the real issue is. If I had seen GRANT: error message I would know exactly what part was failing out. I do know that might be a bit of a pipe dream though, given how transactions work.

@graciegoheen graciegoheen added enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors paper_cut A small change that impacts lots of users in their day-to-day and removed bug Something isn't working triage help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors labels Sep 6, 2023
@dbeatty10
Copy link
Contributor

@callum-mcdata After discussing this with @graciegoheen and @jtcohen6, we had the following idea:

A more general idea

What if the messages for Database Errors also included the SQL that caused the error?

14:09:27  Database Error in model model_name (models/folder/folder/folder/model_name.sql)
14:09:27    user "analyst" does not exist
14:09:27  The error was raised while executing the following SQL:
grant select on database.model_name to analyst;

Trickiest design element

The trickiest part from the user interface perspective is determining how to handle long multi-line SQL statements:

  1. Print the whole thing, even if it is 500 lines long? Doesn't sound fun.
  2. Print out the first X lines? Likely to exclude the relevant lines.
  3. Try to intelligently determine the relevant sections of the SQL that caused the error? Sounds complicated.

Potential sweet spot

We could try to hit a sweet spot between utility and simplicity of implementation by truncating to only include the first 5 lines and offer instructions to check the logs like this:

14:09:27  Database Error in model model_name (models/folder/folder/folder/really_complicated_model.sql)
14:09:27    column "mispelled_column" does not exist
14:09:27  The error was raised while executing the following SQL:
create or replace table orders as
with customer_orders as (
    select
        customer_id,
        min(order_date) as first_order_date,
...
14:09:27  See logs/dbt.log for the full SQL that preceded the error.

Broader context

There's three* main cases where we expect errors to be raised when running dbt subcommands and the user may want to inspect the relevant SQL:

  1. ✅ A test fails when running dbt test → The error message includes a link to the relevant SQL file within the target directory.
  2. ✅ A model fails to build when running dbt build / run → The error message includes a link to the relevant SQL file within the target directory.
  3. ❌ Some supporting non-model SQL (like a grant) raises a Database Error → The error message doesn't include a link to the relevant SQL.

* Unit testing will be a 4th case.

One more idea

What if we introduced a new subfolder within the target directory that is reserved for storing SQL related to Database Errors so that we can provide a link to the user without them needing to search through the logs?

14:14:17   Database Error in model model_name (models/folder/folder/folder/model_name.sql)
14:14:17  
14:14:17    compiled Code at target/error/bab9854b739f5f19bc7cf77e27934d25.sql
14:14:17  
14:14:17  Done. PASS=1 WARN=0 ERROR=1 SKIP=0 TOTAL=2

To make the file names unique, we could just take a MD5 hash of the SQL contents (which is where those 32-characters in the file name above come from).

I opted for a shorter pathname in the example above, but it could also be more verbose like this:

14:14:17    compiled Code at target/error/my_project/models/folder/folder/folder/model_name/bab9854b739f5f19bc7cf77e27934d25.sql

@soksamnanglim
Copy link

soksamnanglim commented Sep 8, 2023

Hi @callum-mcdata, thanks for raising this issue!

I think raising a Database Error with the perpetrating sql statement is a great idea 🙂 @dbeatty10

Stumbled across this while investigating support for role grants (and perhaps group grants) through configurations and I am wondering if these approaches would interest you at all since it seems both a dbt-redshift bug/feature request? (Also should this be part of dbt-labs/dbt-redshift#415

My understanding of the issue...

  1. Documentation says only user grants are supported right now. <- This may not be common knowledge for users.
  2. The error raised does not provide enough context "user <group/role> does not exist". The root cause of this issue is that Redshift SQL syntax requires GRANT <privilege> TO GROUP <group-name>, while the current approach only supports GRANT <privilege> TO <user>.
  3. Current implementation actually supports a workaround (preface group-name with group). Like @callum-mcdata mentioned, there is a caveat: higher incidence of user error.

Here are my ideas for addressing the problem: they range from mitigation to drastic changes.

simplest idea - mitigation

This is the easiest fix to ship out and would require updating dbt documentation to say

  1. you may configure grants to groups by putting group before the object
  2. dbt supports groups.
    For the latter though, we may have to do testing to ensure this workaround is bullet proof(one of the cases I'm thinking of is if users mixed-type grantees "+select" : ['analytics_user', 'group data_science'].

Is this enough? If no, would a modification to DatabaseError as @dbeatty10 suggested be helpful in conjunction with this?

alternative idea - segment grants into grant_roles, grant_users, grant_groups

models:
   - name: example_model
         config:
            grants:
               grant_group:
                   privilege: group_name
               grant_role:
                   role_name: user_name
               grant_user:
                  privilege: user_name

This approach would reduce user error, however has many significant, concerning caveats...

  1. likely breaks all existing dbt projects that are granting privileges via config
  2. deviates drastically from dbt's standardized approach for all data warehouses
  3. much more complicated

I am not inclined towards this approach mainly due to the first two reasons above, but would love to discuss and know your thoughts. happy weekend 🥳 !

@dbeatty10 dbeatty10 added the grants Issues related to dbt's grants functionality label Apr 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request grants Issues related to dbt's grants functionality paper_cut A small change that impacts lots of users in their day-to-day
Projects
None yet
Development

No branches or pull requests

4 participants