[CT-777] Default to current working directory for profiles.yml
#5411
@dbeatty10 this is a great write-up!!! Would you want to add this to your PR as a doc in our ADR directory? That way we can preserve this decision in the code. Also discussion can take place in the PR if anyone has questions on parts of this. I can help you if you have questions on the format but you are pretty close to having every section already taken care of here |
Thanks for the amazing and thorough write-up @dbeatty10, and for spearheading the team-wide conversation since opening. Excited for this to be the first of many such UX improvements :) |
A safety comment, only half-baked as I have only skimmed the backstory of this. I love batteries included experiences for quickstarts. Personally, I then take whatever I was given and modify it to build what I'm actually trying to do. I'm worried that by shipping something like the duck-db-in-a-minute project – which will hopefully be the first experience many people have with dbt – with a profiles.yml inside of a source-controlled folder, someone who takes it and reuses it for their cloud DWH connection might do the same thing. Suddenly we have people checking passwords into GitHub and the hackers are mining bitcoin inside of a JavaScript UDF on a 6XL Snowflake warehouse. Should we consider an additional property in profiles.yml, along the lines of dangerouslySetInnerHTML?
Then when dbt is invoked, it raises an exception if that key isn't present when the profile comes from the current working directory? For backwards compatibility issues, we'd perhaps have to also treat use of |
@jtcohen6 @dbeatty10 ☝️ in case your GH notifications only alert you for things you're tagged in. (It is also ok if you were just ignoring me and hoping I'd go away 😉) |
TL;DR
🙏 Don't check secrets into version control
Current context
Currently, users can specify the location of a
To enable separation of configurable and/or secret content, env_var and/or "secret" env vars to the rescue.
Crucial skills
Crucial skills every git committer should possess:
e.g., recognize secrets and don't check them into your version control system. This is relevant regardless of programming language or library. Conversely, if a user doesn't yet have security wherewithal, they will have problems with systems well beyond dbt.
Example
You provided a good example why secret hygiene for git is the crucial move. Imagine that the
To get the provided example to work, the user will need to update it to
Hints within
|
Hi, I've just discovered that part of the feature wasn't actually implemented:
and that's pretty disappointing, because we have been missing that env var for such a long time... Is there any chance to retrieve that idea? |
Same here. Curious if there are any plans to support DBT_PROJECT_DIR anytime soon? |
You're right! I think that may have been missed in the shuffle here. I'm going to open a new issue to reflect that change. Update: #6078 |
Summary of the feature
Look for profiles.yml within the current working directory first. Fall back to the ~/.dbt/ directory.
I will submit an experimental/draft PR that shows how this could be implemented.
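A minimal sketch of that lookup, in Python. It is not dbt's actual code and not the draft PR: the function name `resolve_profiles_dir` and its parameters are hypothetical, chosen only for illustration. It assumes the existing `--profiles-dir` option and `DBT_PROFILES_DIR` environment variable keep priority over the new current-working-directory check.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

PROFILES_FILE = "profiles.yml"


def resolve_profiles_dir(
    cli_profiles_dir: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
    home: Optional[Path] = None,
) -> Path:
    """Hypothetical helper: pick the directory to read profiles.yml from."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)

    # 1. An explicit --profiles-dir value wins.
    if cli_profiles_dir:
        return Path(cli_profiles_dir)

    # 2. Then the DBT_PROFILES_DIR environment variable.
    env_dir = env.get("DBT_PROFILES_DIR")
    if env_dir:
        return Path(env_dir)

    # 3. Then the current working directory, but only if it holds a profiles.yml.
    if (cwd / PROFILES_FILE).is_file():
        return cwd

    # 4. Otherwise fall back to ~/.dbt/.
    return home / ".dbt"
```

Passing `env`, `cwd`, and `home` explicitly is a design choice for testability; a real implementation would more likely read them from the process directly.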
Who will this benefit?
Instigating use-case
The instigating use-case:
Embedded databases like DuckDB and SQLite can utilize the same compute resources as dbt-core + dbt-{adapter}. In the case of non-sensitive data, profiles.yml defaulting to the current working directory would enable projects to work without mucking with environment variables.
Summary of proposal for unified conventions
This proposal would align the conventions for the order of search precedence for the profiles.yml and dbt_project.yml directories:
Pros
profiles.yml in the current working directory if a centralized config in ~/.dbt/ is desired instead.
It also supports this exotic option:
profiles.yml does need to contain plain-text secrets for some reason, you can still safely check it into version control using a tool like BlackBox 🤯
Cons
profiles.yml in the project root.
profiles.yml is non-functional / undesired (which feels unanticipated and unlikely)
profiles.yml is most likely utilized anyways by the project via DBT_PROJECT_DIR or by copying into ~/.dbt/
Background context
dbt has a solid foundation of convention over configuration (CoC), and this proposal would lean into this further.
Current behavior
dbt needs a profiles.yml configuration file for database connection info. I believe the current order of precedence of ... is:
1. --profiles-dir option
2. DBT_PROFILES_DIR environment variable
3. ~/.dbt/ directory
dbt also needs a dbt_project.yml. The current order of precedence is:
1. --project-dir option
Desired behavior
Search order for profiles.yml:
1. --profiles-dir option
2. DBT_PROFILES_DIR environment variable
3. Current working directory (NEW)
4. ~/.dbt/ directory
Search order for dbt_project.yml:
1. --project-dir option
2. DBT_PROJECT_DIR environment variable (NEW)
General design requirements
There are two necessary pieces for dbt to use a profile to connect to a target database:
There are two main design requirements in terms of discoverability and accessibility:
Approach 1
A reason given for the current order of precedence (emphasis mine):
Using a ~/.dbt/profiles.yml file is a solution that:
Pros
Cons
profiles.yml file
Approach 2
sample.profiles.yml (or just profiles.yml)
test.env.example for local development
This requires doing all of the following for local development:
sample.profiles.yml file into profiles.yml (within the desired profiles directory)
DBT_PROFILES_DIR environment variable or --profiles-dir command-line interface (CLI) flag if the profiles directory is different than ~/.dbt/
test.env.example file to test.env
test.env
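The manual steps above could be scripted roughly as follows. This is a sketch under assumptions, not part of the proposal: the helper names (`copy_if_missing`, `load_env_file`, `prepare_local_dev`) are hypothetical, the steps are read as "copy the template files, point dbt at the profiles directory, load the env file", and the env file is assumed to hold simple `KEY=VALUE` lines. `DBT_PROFILES_DIR` is the existing variable for the profiles directory.

```python
import os
import shutil
from pathlib import Path


def copy_if_missing(template: Path, target: Path) -> bool:
    """Copy template to target unless target already exists. Returns True if copied."""
    if target.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(template, target)
    return True


def load_env_file(path: Path, environ=os.environ) -> dict:
    """Load simple KEY=VALUE lines (assumed format) into the given mapping."""
    loaded = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        loaded[key.strip()] = value.strip().strip('"').strip("'")
    environ.update(loaded)
    return loaded


def prepare_local_dev(project_dir: Path, profiles_dir: Path, environ=os.environ) -> None:
    """Hypothetical one-shot setup mirroring the manual steps."""
    copy_if_missing(project_dir / "sample.profiles.yml", profiles_dir / "profiles.yml")
    copy_if_missing(project_dir / "test.env.example", project_dir / "test.env")
    # Only affects this process and any child processes it starts.
    environ["DBT_PROFILES_DIR"] = str(profiles_dir)
    load_env_file(project_dir / "test.env", environ)
```

Each step still has to happen for every fresh checkout, which is the overhead this approach carries.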
Pros
test.env.example
Cons
profiles.yml somehow (~/.dbt/ or DBT_PROFILES_DIR or --profiles-dir)
Alternatives considered
DBT_PROJECT_DIR environment variable
Curated personal ~/.dbt/profiles.yml
In priority behind ~/.dbt/
Add a setting in dbt_project.yml (similar to seed-paths)
python-dotenv
direnv
Docker
DBT_PROJECT_DIR environment variable
The most straightforward solution to this currently is to just set the DBT_PROJECT_DIR environment variable to the root of the project (or some subdirectory).
Pros
Cons
DBT_PROJECT_DIR when switching to a different project that doesn't have profiles.yml in the current working directory.
Curated personal ~/.dbt/profiles.yml
Pros
Cons
~/.dbt/profiles.yml will surely be managed differently than the CI and production versions of the same file (I. Codebase and X. Dev/prod parity)
In priority behind ~/.dbt/
Pros
Cons
Add a setting in dbt_project.yml
Pros
Cons
profiles.yml until dbt_project.yml was found and parsed.
python-dotenv
Pros
profiles.yml in the DBT_PROJECT_DIR environment variable
Cons
direnv
Pros
direnv knows to unload variables when switching directories
Cons
direnv allow the first time it is executed for a directory, and re-running it every time the .envrc file is updated
Docker
Pros
DBT_PROJECT_DIR
Cons
I do think that analytics engineers should get comfortable with manually loading/unloading environment variables, using Docker images, and even loading/unloading environment variables with tools like direnv and python-dotenv. But it's preferable to minimize additional non-Python dependencies (to the extent possible).