[CT-99] [Spike] Enable loading dbt_project config outside of task structure #4630

Closed
gshank opened this issue Jan 27, 2022 · 3 comments
@gshank
Contributor

gshank commented Jan 27, 2022

Some of the information in the dbt_project.yml file would be useful to have before a particular task is created. Separating out project object creation would also help with testing, both for testing changes to project files and for constructing test fixtures.

There are two Project classes, one in core/dbt/config/project.py and one in core/dbt/contracts/project.py, which is a bit confusing. core/dbt/config/project.py imports the contracts Project class as ProjectContract. Roughly speaking, the contract Project maps to the YAML file, and the config Project is used to create the RuntimeConfig.

One non-optimal side effect is that when a field is added to the dbt_project.yml file, it has to be added to both Project classes and to each of the places where fields are copied from the contract Project to the config Project and then to RuntimeConfig.
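To make the duplication concrete, here is a minimal sketch of the pattern. The three classes below are simplified stand-ins written for illustration, not the real dbt classes, and the field names and sample values are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProjectContract:
    """Stand-in for the contracts Project: mirrors keys in dbt_project.yml."""
    name: str
    version: str
    profile: Optional[str] = None


@dataclass
class Project:
    """Stand-in for the config Project: built from the contract."""
    project_name: str
    version: str
    profile_name: Optional[str]

    @classmethod
    def from_contract(cls, contract: ProjectContract) -> "Project":
        # Copy step 1: each contract field is copied by hand.
        return cls(
            project_name=contract.name,
            version=contract.version,
            profile_name=contract.profile,
        )


@dataclass
class RuntimeConfig:
    """Stand-in for RuntimeConfig: repeats the project fields once more."""
    project_name: str
    version: str
    profile_name: Optional[str]
    target: str

    @classmethod
    def from_parts(cls, project: Project, target: str) -> "RuntimeConfig":
        # Copy step 2: the same fields are copied a second time.
        return cls(
            project_name=project.project_name,
            version=project.version,
            profile_name=project.profile_name,
            target=target,
        )


contract = ProjectContract(name="my_project", version="1.0.0", profile="my_profile")
runtime = RuntimeConfig.from_parts(Project.from_contract(contract), target="dev")
```

In this sketch, adding one new key to the YAML file means touching three class definitions and two copy steps.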

Investigate the feasibility of building the contracts Project and the config Project prior to constructing tasks.

@github-actions github-actions bot changed the title Enable loading dbt_project config outside of task structure [CT-99] Enable loading dbt_project config outside of task structure Jan 27, 2022
@leahwicz leahwicz added the spike label Jan 27, 2022
@leahwicz leahwicz changed the title [CT-99] Enable loading dbt_project config outside of task structure [CT-99] [Spike] Enable loading dbt_project config outside of task structure Feb 28, 2022
@iknox-fa
Contributor

Spike: project configs before task runs

To access project data earlier in the runtime process, dbt needs to be able to generate a ProjectConfig object before starting the task.

TL;DR

I can envision a few different ways to meet this need, with varying degrees of ease and correctness:

  • Method 1: Duplicate all the logic needed to generate project configs and isolate it without any of the Task related specificity.
    • Pros: Fast and easy-ish.
    • Cons: Meh at best, not DRY, not easy to maintain, adds more complexity.
  • Method 2: Thread together the various class methods needed to generate the project config. Only the basic flow code
  • Method 3: Create the project config (and the profile config) at the outset, then pass them into the selected Task. This will require quite a bit of fussing with the CLI and Task code.
    • Pros: Would greatly improve the readability and simplicity of our Task orchestration code, and would make testing much easier.
    • Cons: Not easy
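
As a rough illustration of Method 3, the sketch below builds the configs once and then injects them into whichever task is selected. All of the names here (ProjectConfig, ProfileConfig, the Task classes, TASKS, and the hard-coded values) are hypothetical stand-ins, not dbt's actual classes.

```python
from dataclasses import dataclass
from typing import Dict, Type


@dataclass(frozen=True)
class ProjectConfig:
    """Hypothetical stand-in for the project config."""
    name: str


@dataclass(frozen=True)
class ProfileConfig:
    """Hypothetical stand-in for the profile config."""
    target: str


class Task:
    """Tasks receive ready-made configs instead of building their own."""

    def __init__(self, project: ProjectConfig, profile: ProfileConfig) -> None:
        self.project = project
        self.profile = profile

    def run(self) -> str:
        raise NotImplementedError


class RunTask(Task):
    def run(self) -> str:
        return f"run {self.project.name} against {self.profile.target}"


class DebugTask(Task):
    def run(self) -> str:
        return f"debug {self.project.name}"


TASKS: Dict[str, Type[Task]] = {"run": RunTask, "debug": DebugTask}


def main(command: str) -> str:
    # Configs are created at the outset, before any task is chosen.
    project = ProjectConfig(name="my_project")
    profile = ProfileConfig(target="dev")
    # The selected task only consumes the configs it is handed.
    task = TASKS[command](project, profile)
    return task.run()
```

With this shape a test can construct a task directly from fixture configs, for example `RunTask(ProjectConfig("fixture"), ProfileConfig("test"))`, without going through argument parsing.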

Research:

Task dependency graph

[image: task dependency graph]

Notes on Pre-task configs, tracing through the codebase to generation

Arguments

From command line

  • main.py:
    • sys.argv ->
    • main() ->
    • handle_and_check() ->
    • parse_args() ->
      • Config: parsed_args <class 'argparse.Namespace'>
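
The call chain above can be sketched with the standard library alone. The function names mirror the trace, but the bodies, the program name, the option, and the `which` attribute are simplified assumptions rather than dbt's actual parser.

```python
import argparse
import sys
from typing import List, Optional


def parse_args(argv: List[str]) -> argparse.Namespace:
    # Illustrative parser only; the real CLI defines many more options.
    parser = argparse.ArgumentParser(prog="dbt-sketch")
    parser.add_argument("--project-dir", default=None)
    subcommands = parser.add_subparsers(dest="which")
    subcommands.add_parser("run")
    subcommands.add_parser("debug")
    return parser.parse_args(argv)


def handle_and_check(argv: List[str]) -> argparse.Namespace:
    # In this sketch the function only parses; task selection would follow.
    parsed_args = parse_args(argv)
    return parsed_args


def main(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # Fall back to the process arguments when none are passed explicitly.
    if argv is None:
        argv = sys.argv[1:]
    return handle_and_check(argv)


parsed = main(["--project-dir", "/tmp/project", "run"])
```

The resulting `parsed` object is an `argparse.Namespace`, which is the first piece of configuration available before any task exists.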

Flags

Partially depends on parsed_args @ core/dbt/flags.py:102

  • main.py:
  • flags.py: (imported)
    • Config: flags (global module w module constants)
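
A minimal sketch of the "global module with module constants" pattern follows, assuming a single hypothetical `set_from_args` helper and two made-up flag names. It shows why the flags only partially depend on parsed_args: they carry defaults until the parsed arguments overwrite them.

```python
import argparse
from typing import Dict

# Module-level values that the rest of the program reads directly.
# Both names and defaults are assumptions for this sketch.
DEBUG = False
USE_COLORS = True


def set_from_args(args: argparse.Namespace) -> None:
    """Overwrite the module-level flags from parsed arguments (hypothetical helper)."""
    global DEBUG, USE_COLORS
    # getattr keeps the existing default when the argument is absent.
    DEBUG = bool(getattr(args, "debug", DEBUG))
    USE_COLORS = bool(getattr(args, "use_colors", USE_COLORS))


def current_flags() -> Dict[str, bool]:
    """Return a snapshot of the module-level flags."""
    return {"DEBUG": DEBUG, "USE_COLORS": USE_COLORS}


before = current_flags()
set_from_args(argparse.Namespace(debug=True))
after = current_flags()
```

Because the values live at module scope, anything that imports the module sees them, which is convenient but makes the flags an implicit input to every task.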

Contents of user profile

Read multiple times; this is the first read.
Depends on flags.

  • main.py:
    • handle_and_check() ->
  • config/profile.py:
    • read_user_config() ->
      • Config: user_config <class 'dbt.contracts.project.UserConfig'>

The three configs above are passed into each task. They also determine which task is called
@ core/dbt/main.py:214

Task configs

Runtime / UnsetProfile configs

  • task/base.py:
    • ConfiguredTask.from_args() ->
    • move_to_nearest_project_dir() -> (dir structure and CWD as config?!)
    • BaseTask.from_args() ->
  • core/dbt/task/any_given_class.py:
    • ConfigType.from_args() ->
  • core/dbt/config/runtime.py:
    • RuntimeConfig.from_args() ->
    • RuntimeConfig.collect_parts() ->
      • Config: project <class 'dbt.config.project.Project'>
      • Config: profile <class 'dbt.config.profile.Profile'>
    • RuntimeConfig.from_parts() ->
      • Config: retval <class 'dbt.config.runtime.RuntimeConfig'>
        OR
    • UnsetProfileConfig.from_args() ->
    • RuntimeConfig.collect_parts() ->
      • Config: project <class 'dbt.config.project.Project'>
      • Config: profile <class 'dbt.config.profile.Profile'>
    • UnsetProfileConfig.from_parts() ->
      • Config: retval <class 'dbt.config.runtime.UnsetProfileConfig'>

Tasks require either RuntimeConfig or UnsetProfileConfig, both of which are created from a
Project, Profile, and args (excepting a few that don't require config data at all, see diagram)
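
The classmethod chain traced above can be sketched as follows. The class and method names follow the trace, but the bodies are simplified stand-ins: here the parts are built from attributes on `args`, which is an assumption of this sketch and not how dbt loads them.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Tuple


@dataclass
class Project:
    name: str


@dataclass
class Profile:
    target: str


class RuntimeConfig:
    def __init__(self, project: Project, profile: Profile, args: Any) -> None:
        self.project = project
        self.profile = profile
        self.args = args

    @classmethod
    def collect_parts(cls, args: Any) -> Tuple[Project, Profile]:
        # Stand-in: build both parts from values carried on args.
        return Project(name=args.project_name), Profile(target=args.target)

    @classmethod
    def from_parts(cls, project: Project, profile: Profile, args: Any) -> "RuntimeConfig":
        return cls(project, profile, args)

    @classmethod
    def from_args(cls, args: Any) -> "RuntimeConfig":
        # from_args is just collect_parts followed by from_parts.
        project, profile = cls.collect_parts(args)
        return cls.from_parts(project, profile, args)


class UnsetProfileConfig(RuntimeConfig):
    """Shares the chain; classmethods return this subclass via cls."""


args = SimpleNamespace(project_name="my_project", target="dev")
config = UnsetProfileConfig.from_args(args)
# The parts can also be collected on their own, before any task exists.
project, profile = RuntimeConfig.collect_parts(args)
```

In this simplified form, nothing in `collect_parts` needs a task, which is the property the spike is trying to exploit.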

@github-actions
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Aug 28, 2022
@github-actions
Contributor

github-actions bot commented Sep 5, 2022

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.

@github-actions github-actions bot closed this as completed Sep 5, 2022