RFC: single terragrunt.hcl per environment? #759

Open
brikis98 opened this issue Jun 19, 2019 · 17 comments
Labels
enhancement New feature or request

Comments

@brikis98
Member

brikis98 commented Jun 19, 2019

Current state

The current practice for using Terragrunt is to create one folder for each module and put a terragrunt.hcl file in it. You also have one terragrunt.hcl at the root of each environment with shared configurations for that environment. The folder structure looks like this:

staging
├── terragrunt.hcl
├── frontend-app
│   └── terragrunt.hcl
├── mysql
│   └── terragrunt.hcl
└── vpc
    └── terragrunt.hcl

production
├── terragrunt.hcl
├── frontend-app
│   └── terragrunt.hcl
├── mysql
│   └── terragrunt.hcl
└── vpc
    └── terragrunt.hcl

New proposal

One thing I've been contemplating is if it makes sense to instead aim for the ability to have one terragrunt.hcl per environment so that the folder structure looks like this:

staging
└── terragrunt.hcl
production
└── terragrunt.hcl

To make this work, terragrunt.hcl would allow you to define the configuration for multiple modules in a single file. The code would look almost exactly like normal Terraform syntax. Here's a rough sketch of what that might look like:

# You can include one or more modules using a syntax nearly identical to Terraform
module "frontend_app" {
  # Use the source param to specify where the code for this module lives
  source = "github.com/acme/modules.git//frontend-app?ref=v0.3.4"
  
  # Specify input parameters for that module the same as how Terraform does it!
  instance_type = "t2.micro"
  instance_count = 10

  # You can reference outputs from other modules using the normal Terraform syntax!
  # This also defines dependencies between modules for apply-all style commands.
  vpc_id = module.vpc.id
  db_url = module.mysql.endpoint
}

module "mysql" {
  # Each source can point to different module and different versions
  source = "github.com/acme/modules.git//mysql?ref=v0.4.2"

  # You can reference shared inputs defined in this file below
  name = "${local.env_name}-mysql"

  memory = 4096
  disk_space = 250

  vpc_id = module.vpc.id
}

module "vpc" {
  source = "github.com/acme/modules.git//vpc?ref=v0.4.2"

  name = local.env_name
  cidr_block = "10.0.0.0/16"
}

# Local variables you can share amongst all the modules above
locals {
  env_name = "staging"
}

# Backend configuration applied to all the modules above
remote_state {
  backend = "s3"
  config = {
    bucket = "foo"
    region = "us-east-1"
    # module_name returns the name of the current module (e.g., frontend-app, mysql, etc)
    key = "${module_name()}/terraform.tfstate"
  }
}

Note that, instead of a single (potentially massive) .hcl file per environment, we could support loading all .hcl files in a folder, just like Terraform loads all .tf files in a folder. This would allow you to break up your definition into a few smaller files:

.
├── production
│   ├── apps.hcl
│   ├── data-stores.hcl
│   └── networking.hcl
└── staging
    ├── apps.hcl
    ├── data-stores.hcl
    └── networking.hcl

Inside of apps.hcl you could have all your app modules (e.g., module.frontend_app from the code snippet above), inside data-stores.hcl you could have all your data stores (e.g., module.mysql from the code snippet above), and so on. The key is that all the references between modules and usage of local variables would work just as if everything was in one big file (as in the original code snippet above).

Example usage

Deploy the VPC module:

$ terragrunt apply --terragrunt-module vpc

This will check out the code for the VPC module into a temp dir, run terraform apply, and pass it the input variables you set.

Deploy the VPC and mysql modules:

$ terragrunt apply --terragrunt-module vpc --terragrunt-module mysql

This will check out both modules into temp dirs and run apply in each one, but since one module depends on the other, it will do so in the right order (vpc first, then mysql).

Deploy all modules:

$ terragrunt apply-all

This will check out all modules into temp dirs and run apply on all of them, again respecting the dependency order.
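As a rough illustration of the ordering described above, here is a minimal Python sketch. It is not Terragrunt's implementation: the `apply_order` helper is hypothetical, and the dependency map is written by hand from the `module.<name>` references in the config sketch, where a real tool would parse them out of the file.

```python
from graphlib import TopologicalSorter

# Each key depends on the modules in its set. Hand-written here from the
# module.<name> references in the sketch above (an assumption for this demo).
DEPENDENCIES = {
    "frontend_app": {"vpc", "mysql"},
    "mysql": {"vpc"},
    "vpc": set(),
}

def apply_order(targets, dependencies=DEPENDENCIES):
    """Return the requested modules ordered so that dependencies come first.

    Modules outside `targets` influence the ordering but are not returned.
    """
    full_order = list(TopologicalSorter(dependencies).static_order())
    wanted = set(targets)
    return [name for name in full_order if name in wanted]

# Roughly: terragrunt apply --terragrunt-module vpc --terragrunt-module mysql
print(apply_order(["mysql", "vpc"]))

# Roughly: terragrunt apply-all
print(apply_order(DEPENDENCIES))
```

Whether untargeted dependencies should also be applied is a design choice this sketch leaves open; it only orders the modules that were asked for.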

Advantages

  • Just one file (or a small number of files) per environment to manage instead of dozens or hundreds of files and folders
  • Easier to enforce dependencies
  • Easier to share variables
  • Easier to share backend configuration
  • Unlike writing the same code in one big .tf file, you get separate state files and can have separate permissions for each module, reducing the blast radius of errors

Disadvantages

  • You no longer get separation between modules by being in different folders (though environments are still in different folders), but only through the commands you run.

Other thoughts

If you really think about it, this is very close to defining all of the exact same infrastructure in a single .tf file (or several .tf files in the same folder) with pure Terraform (no Terragrunt), and running terraform apply -target=module.<MODULE>. You could then apply changes to just one module at a time, without affecting any of the others, and get full code reuse, dependency management, etc. from Terraform natively.

However, the downsides of doing this with pure Terraform are:

  1. All the state is stored in one state file, so if you mess up anywhere, you could break everything.
  2. Even with -target=module.<MODULE>, I believe Terraform applies not just <MODULE>, but all of the dependencies of <MODULE> too.
  3. Does the state and all data sources have to be updated for everything, even if you're only targeting a subset of the modules? If so, plan and apply would be very slow.
  4. Permissions management is crappy, as you'd have to have permissions to at least read everything, even if you're only updating a subset of the data.

But still... It's so close!

brikis98 added the enhancement and help wanted labels Jun 19, 2019
@jakauppila
Contributor

Before I begin, I'll throw in a caveat that I'm working in an enterprise trying to balance our Infrastructure as Code for ~15 business units and ~500 systems containing ~1500 applications in a centralized cloud team.

We are also currently still on v0.11.13, so I haven't had the chance to consider exactly how the v0.12 changes affect us, both from a Terraform perspective and in terms of the v0.19 Terragrunt changes. Terraform no longer allowing variables that are not in use to be passed will likely require us to refactor some things.

Our usage of Terragrunt is a layered approach, which I'll attempt to break down:

  1. Build our Enterprise-wide shared services that all accounts will utilize
    • Transit Gateway(s) that hook back to our Direct Connect Gateway to our edge and back on-prem
    • AWS Endpoints VPC that resolution will be routed to for any AWS services
  2. Build each business unit a VPC in their own account (split by both unit and nonprod/prod)
    • Includes building the VPC, subnets, 'default' security groups that have been vetted and approved by IT and Infosec

That's the first core usage; our current plan is that it all lives within a single repository, and churn of configuration here will be minimal.

The repository currently looks like this:

terraform
├── terraform.tfvars
├── shared-services
|  ├── nonprod
|  |  ├── environment-class.tfvars
|  |  ├── information-technology
|  |  |  ├── account.tfvars
|  |  |  ├── _global
|  |  |  |  └── iam
|  |  |  |     └── terraform.tfvars
|  |  |  ├── us-east-2
|  |  |     ├── region.tfvars
|  |  |     └── endpoints
|  |  |        └── terraform.tfvars
|  |  |     └── transit-gateway
|  |  |        └── terraform.tfvars
|  |  ├── information-security
|  ├── prod
├── business-units
|  ├── nonprod
|  |  ├── environment-class.tfvars
|  |  ├── information-technology
|  |  |  ├── account.tfvars
|  |  |  ├── _global
|  |  |  ├── us-east-2
|  |  |     └── nonprod
|  |  |        └── terraform.tfvars
|  |  ├── <unit2>
|  |  ├── <unit3>
|  |  ├── <unit4>
|  |  ├── <unit5>
|  |  ├── <unit6>
|  |  ├── <unit7>
|  |  ├── <unit8>
|  |  ├── <unit9>
|  |  ├── <unit10>
|  |  ├── <unit11>
|  |  ├── <unit12>
|  |  ├── <unit13>
|  |  ├── <unit14>
|  |  ├── <unit15>
|  └── prod
|     ├── <unit1>
|     ├── <unit2>
|     ├── <unit3>
|     ├── <unit4>
|     ├── <unit5>
|     ├── <unit6>
|     ├── <unit7>
|     ├── <unit8>
|     ├── <unit9>
|     ├── <unit10>
|     ├── <unit11>
|     ├── <unit12>
|     ├── <unit13>
|     ├── <unit14>
|     ├── <unit15>
├── isolated-systems
|  ├── nonprod
|  |  ├── <isolatedsystem1>
|  |  └── <isolatedsystem2>
|  └── prod
|     ├── <isolatedsystem1>
|     └── <isolatedsystem2>
└── sandboxes
   └── developersandbox1

With this setup, we are creating an Infrastructure Module for the following pieces:

  • Shared services
  • Business Units
  • Isolated Systems
  • Sandboxes

Within those Infrastructure Modules we are implementing our Core Modules, which describe how we implement resources. The Infrastructure Modules for each of the individual business units will be quite simple, at least to begin with, most just creating a VPC. Since that is all in Git, it allows us to simply pass in only the changes required between environments. It is in this portion that our usage likely differs from most, electing to call each module from Terragrunt rather than building the Infrastructure Modules.

After all of that is complete, we will have another set of Terraform configs (likely structured very similarly) that will just take an account and VPC ID, look up various resources (subnets/security groups) via a defined contract of tags, and build out that particular system's configuration (servers, ALB, IAM roles, S3 buckets, etc.).

There is a whole lot of connectivity (as well as resource sharing on the DB side to reduce cost) required between systems of a business unit and on-prem, which is why each system is not its own 'stack'.

With all that said, I probably didn't address what you're asking.

@yorinasub17
Contributor

I am generally against using commandline arguments for targeting resources to deploy in a routine scenario, primarily because it is so easy to use Ctrl+R or up arrow in the history to accidentally apply the wrong thing. One of the nice things about the current terragrunt folder structure implementation is that when you are iterating, it is relatively hard to accidentally apply the wrong plan because you are unlikely to be in that folder.

@terozio

terozio commented Jun 20, 2019

Sounds a bit scary to me. The individual tfvars let various teams in our organisation work on different aspects of the infrastructure in nice isolation, and there is a low barrier of entry for newcomers when you are dealing with a single thing at a time. We also have a bunch of tfvars generated by build automation and self-services (like deploying a new Fargate service or creating a new RDS cluster etc.), which sounds very problematic if things needed to be appended to a single shared file.

Also, how do you organise everything in that single file in a way that makes sense when you have hundreds or thousands of resources to manage? I think the possibility to use normal directory structures provides a nice way of grouping relevant things and makes them easy to find.

Personally, in general I dislike massive configuration files and prefer a larger number of well structured smaller files. But I guess that's a bit like the mono repo discussion, everyone has their opinion and there are multiple sides to it :)

@kelsmj

kelsmj commented Jun 25, 2019

I prefer the current structure of different folders for prod/staging etc with sub folders underneath for each module. It makes it easier to digest what is being used for a given environment by just looking at the folder structure.

@soneil123

@brikis98 This may have been mentioned elsewhere (or possible and I can't figure it out) but why don't you recursively include all terragrunt.hcl files? If you did, I wouldn't have to move my state files from environments/development to nonprod/development and declare my remote_state multiple times.

terraform
|-- environments
|   |-- development
|   |   |-- frontend
|   |   |   `-- terragrunt.hcl
|   |   |-- mysql
|   |   |   `-- terragrunt.hcl
|   |   `-- terragrunt.hcl  (inputs {} only)
|   |-- production
|   |   |-- frontend
|   |   |   `-- terragrunt.hcl
|   |   |-- mysql
|   |   |   `-- terragrunt.hcl
|   |   `-- terragrunt.hcl (inputs {} only)
|   `-- staging
|       |-- frontend
|       |   `-- terragrunt.hcl
|       |-- mysql
|       |   `-- terragrunt.hcl
|       `-- terragrunt.hcl (inputs {} only)
|-- infra
|   |-- nonprod
|   |   |-- s3
|   |   |   `-- terragrunt.hcl
|   |   `-- terragrunt.hcl (inputs {} only)
|   `-- prod
|       |-- s3
|       |   `-- terragrunt.hcl
|       `-- terragrunt.hcl (inputs {} only)
`-- terragrunt.hcl (remote_state declaration and global vars)

This allows the flexibility to define variables at any hierarchy level.
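To make the idea concrete, here is a minimal Python sketch of that kind of hierarchical resolution. It is not how Terragrunt behaves today: `config_chain` and `merge_inputs` are hypothetical helpers, and the merge rule (more specific levels override more general ones) is an assumption.

```python
from pathlib import Path

def config_chain(module_dir, root_dir, filename="terragrunt.hcl"):
    """List the config files that apply to module_dir, from the root down.

    Raises ValueError if module_dir is not inside root_dir.
    """
    module_dir = Path(module_dir).resolve()
    root_dir = Path(root_dir).resolve()
    dirs = [root_dir]
    for part in module_dir.relative_to(root_dir).parts:
        dirs.append(dirs[-1] / part)
    return [d / filename for d in dirs if (d / filename).is_file()]

def merge_inputs(layers):
    """Merge input dicts in order; later (more specific) layers win."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged

# Inputs as they might be declared at the root, environment, and module levels.
print(merge_inputs([
    {"region": "us-east-1", "env": "global"},
    {"env": "development"},
    {"name": "mysql"},
]))
```

With a layout like the one above, `config_chain("terraform/environments/development/mysql", "terraform")` would return the root file, the development file, and the mysql file, in that order.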

@jfunnell

jfunnell commented Jun 28, 2019

I think this is awesome, personally. Some people on our team were apprehensive about using terragrunt because of how nested the structure gets, compared to our hand-rolled terraform helper that simply selected a tfvars file depending on your environment/workspace. This suggestion is a very welcome middle ground, and like you said, it very much looks like vanilla terraform, which is a good thing.

That said if you're worried about the file getting too large, why not allow loading multiple .hcl files in a directory? That way people can still break up the module configs however they want, without needing to make a bunch of subdirectories.

@brikis98
Member Author

That said if you're worried about the file getting too large, why not allow loading multiple .hcl files in a directory? That way people can still break up the module configs however they want, without needing to make a bunch of subdirectories.

Yup! I mentioned just that in the original description:

The one file could become massive. Perhaps we'd instead need to load all *-terragrunt.hcl files, similar to how Terraform loads all *.tf files?

@singularit-io

This feature in itself is truly either an enabler or a blocker to my process of moving from our current standard terraform repos to terragrunt empowered repos. There is no way we will start segregating our state files by using the current terragrunt modular approach. However, we obviously want to move away from the main.tf modules approach, which implies so much unnecessary code duplication, while reaching the same result that we have right now in terms of state file. (Our safety net at the moment is whether the planned state file is unchanged or not after refactoring our code base to make it much prettier, with no duplication at all and a modular approach.)

Also, it would be a great plus to completely avoid the provider, template, and terraform versions duplication between modules, and be able to apply those dynamically from the central hcl file to all modules used in a stack as required, in the same fashion that we declare the state file backend generically. I am not sure if any or all of these items can be achieved with current terragrunt, so far I have only seen duplicate declarations of provider and TF version requirements in the modules examples.

@Bryksin

Bryksin commented Nov 21, 2019

I would prefer to have a single .hcl file that is dynamically resolved based on variables.

for example:
set environment variable TF_VAR_ENV=["dev" | "qa" | "prod"]

variables.tf

# Empty, injected from TF_VAR_ENV
variable ENV {}

variable AWS_REGION {
    default = {
        dev = "eu-west-1"
        qa = "<some_region>"
        prod = "<some_region>"
    }
}

terragrunt.hcl

remote_state {
  backend = "s3"
  config = {
    bucket         = "<your-bucket-name>-terraform-state-${var.ENV}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = var.AWS_REGION[var.ENV]
    encrypt        = true
    dynamodb_table = "terraform-lock-table"
  }
}
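The resolution being asked for can be sketched in a few lines of Python. This is only an illustration of the lookup, not an existing Terragrunt feature; `resolve_remote_state` is a hypothetical helper, and the placeholder region and bucket values are carried over unchanged from the snippets above.

```python
# Per-environment regions, mirroring the AWS_REGION map in variables.tf above.
AWS_REGION = {
    "dev": "eu-west-1",
    "qa": "<some_region>",
    "prod": "<some_region>",
}

def resolve_remote_state(env, relative_path, bucket_prefix="<your-bucket-name>"):
    """Build the remote_state config for one environment (hypothetical helper)."""
    if env not in AWS_REGION:
        raise ValueError(f"unknown environment: {env!r}")
    return {
        "bucket": f"{bucket_prefix}-terraform-state-{env}",
        "key": f"{relative_path}/terraform.tfstate",
        "region": AWS_REGION[env],
        "encrypt": True,
        "dynamodb_table": "terraform-lock-table",
    }

# In practice env would come from the TF_VAR_ENV environment variable;
# "dev" is hard-coded here to keep the demo self-contained.
print(resolve_remote_state("dev", "vpc"))
```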

@ozbillwang

ozbillwang commented Nov 27, 2019

I'd like to implement stack-based Terragrunt (no need for it to be environment-based).

For example, I need to create one application that needs to reference several modules. But in the same environment, I may have several other applications to manage as well.

So terragrunt.hcl should support something like the following:

terragrunt "module_1"  {         # < = that's the key part to support multiple modules.
  source = "<git_module_1_path>"

  include {
    path = find_in_parent_folders()
  }

  inputs = {
    name = "module_1"
  }
}

terragrunt "module_2"  {
  source = "<git_module_2_path>"

  dependency "module_1" {
    config_path = "."
  }
  include {
    path = find_in_parent_folders()
  }

  inputs = {
    name = "module_2"
  }
}

I have mentioned the same idea in #957.

So nothing else needs to be changed. We just need to support terragrunt "module_2" {} in addition to terragrunt {}. The rest should be exactly the same.

@brikis98
Member Author

One more thought on this RFC: if you take a step back and squint at it, I'm basically arguing for turning Terragrunt into a preprocessor for Terraform. This is similar to how SASS and Less are preprocessors for CSS code.

Perhaps the way to think of it is that users write normal Terraform code, perhaps even in normal .tf files:

.
├── production
│   ├── main.tf
│   ├── outputs.tf
│   └── variables.tf
└── staging
    ├── main.tf
    ├── outputs.tf
    └── variables.tf

The code in those files will look almost like 100% vanilla Terraform code:

# Use normal Terraform syntax to define modules
module "frontend_app" {
  # Use the source param to specify where the code for this module lives
  source = "github.com/acme/modules.git//frontend-app?ref=v0.3.4"
  
  # Specify input parameters the usual way
  instance_type = "t2.micro"
  instance_count = 10

  # You can reference outputs from other modules the usual way
  vpc_id = module.vpc.id
  db_url = module.mysql.endpoint
}

module "mysql" {
  # Each source can point to different module and different versions
  source = "github.com/acme/modules.git//mysql?ref=v0.4.2"

  # You can reference local variables the usual way
  name = "${local.env_name}-mysql"

  memory = 4096
  disk_space = 250

  vpc_id = module.vpc.id
}

module "vpc" {
  source = "github.com/acme/modules.git//vpc?ref=v0.4.2"

  name = local.env_name
  cidr_block = "10.0.0.0/16"
}

# The normal way Terraform handles local variables
locals {
  env_name = "staging"
}

# Backend configuration defined the usual way
terraform {
  backend "s3" {
    bucket = "foo"
    region = "us-east-1"
    # The terragrunt_module_name() helper is the only Terragrunt-specific thing here
    key = "${terragrunt_module_name()}/terraform.tfstate"
  }
}

When you run terragrunt apply --terragrunt-module vpc, Terragrunt "preprocesses" your Terraform code by rewriting it into a temporary folder where each module is defined in a separate folder with a backend config that has a unique key value, and then runs terraform init and terraform apply in one of those folders.

In other words, it's doing what we think Terraform should do natively, on top of more or less normal Terraform code.
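A minimal Python sketch of that rewriting step, under heavy assumptions: `preprocess` is a hypothetical function, the module bodies are passed in as plain strings rather than parsed from .tf files, and cross-module references such as module.vpc.id (which would have to become some form of remote state lookup) are ignored entirely.

```python
import tempfile
from pathlib import Path

# Backend block written into every generated folder; only the key differs.
BACKEND_TEMPLATE = '''terraform {{
  backend "s3" {{
    bucket = "{bucket}"
    region = "{region}"
    key    = "{module_name}/terraform.tfstate"
  }}
}}
'''

def preprocess(modules, bucket, region, out_dir=None):
    """Write one folder per module, each with its own backend key.

    `modules` maps a module name to the Terraform source text for that module.
    Returns a dict of module name -> generated folder path.
    """
    root = Path(out_dir or tempfile.mkdtemp(prefix="terragrunt-"))
    folders = {}
    for name, body in modules.items():
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "main.tf").write_text(body)
        (folder / "backend.tf").write_text(
            BACKEND_TEMPLATE.format(bucket=bucket, region=region, module_name=name)
        )
        folders[name] = folder
    return folders

folders = preprocess(
    {"vpc": 'module "vpc" {\n  name = "staging"\n}\n'},
    bucket="foo",
    region="us-east-1",
)
print((folders["vpc"] / "backend.tf").read_text())
```

After this step, a tool could run terraform init and terraform apply inside the folder for the requested module only, which is what keeps the state files separate.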

@kzap

kzap commented Sep 5, 2020

A pre-processor would be the perfect solution for Terraform and Terragrunt can handle dependency management.

Right now terragrunt seems to be solving for 3 things:

  • Generating Terraform Config with less code
  • Adding new HCL functions that solve common problems
  • Dependency management and ordering.

Those could be 3 separate tools.

@antgel

antgel commented Sep 5, 2020

About less code, that does sometimes make me think. I'm relatively new as a serious Terraformer / Terragrunter, but let's say I have similar environments across several regions, and I change something common like a module input variable. I have to go through all my terragrunt.hcls e.g.

sandbox/us-east-1/some-module/terragrunt.hcl
prod/eu-central-1/some-module/terragrunt.hcl
# (another few of those)

and make the exact same change e.g. my application module now outputs load_balancer_url instead of endpoint_url.

I also have to go into sandbox/us-east-1, prod/eu-central-1, and every other directory to run terragrunt apply-all.

This grates, but is still better than using vanilla Terraform or manual configuration.

Are these things that resonate, or have I missed something (even something obvious) which is quite possible because there's a lot to learn? If so, happy to be pointed in the right direction.

@jfunnell

jfunnell commented Sep 7, 2020

I believe the answer to your question is 'automation'. As in some kind of CI setup if you feel that you are safe to deploy everything at once. I can't imagine you actually want to deploy everything at the same time without some kind of promotion strategy though, as the larger your environments become, the more fragile they can be.

I think you want to have a CI tool that applies your stacks 'constantly' (ie: every time they change), and you should be using module references (tags or a commit SHA) to ensure you have your terragrunt files correctly updated, before then committing changes to each environment one at a time and letting the CI tool roll the changes out.

e: And I don't really think this proposal would really change your specific issue? It might make some shared code easier to share, I guess.

@keisari-ch

keisari-ch commented Nov 9, 2020

Hello,

I was rethinking the way I'm managing a whole GCP infrastructure, and was asking myself how to use additional layers of modules, and this looks promising.

I have an HCL hierarchy, which represents all the resources deployed across all the projects and folders of my organization.

Each one of the folders will host an HCL file calling a module. These modules, for now, group all the resources needed for this particular folder and subsequent project.

Obviously, continuing that way will not be DRY compliant at all.

So I was thinking of how I could manage that:
any folder with .hcl -> source module (service definition) -> module (resource blocks to answer the service needs)

And the RFC looks pretty nice, because it would bring the service composition to the first layer.

Do you have any update on this?

@thnee

thnee commented Feb 24, 2021

It's a great idea to be able to run multiple modules from one target!

I would like to request another closely related use case, the ability to use a module, and to have some "one-off" terraform code together in the same target.

For example, let's say you have a bunch of ECS services. Each service has a bunch of common resources, such as aws_iam_role, aws_ecs_task_definition, aws_ecs_service, etc. But one particular service also needs an additional resource, for example an aws_s3_bucket.

I tried doing the following code example with terragrunt v0.28.7, but as soon as I add the file targets/prod/my-service/main.tf (even an empty one), it seems to take over the whole target, and the module import code in terragrunt.hcl is silently ignored.

.
├── modules
│   └── ecs-service
│       └── main.tf
└── targets
    └── prod
        ├── my-service
        │   ├── main.tf
        │   └── terragrunt.hcl
        └── terragrunt.hcl

targets/prod/my-service/main.tf:

resource "aws_s3_bucket" "b" {...}

targets/prod/my-service/terragrunt.hcl:

include {
  path = find_in_parent_folders()
}

terraform {
  source = "../../../modules//ecs-service"
}

inputs = {
  account-name = "prod"
  service-name = "my-service"
}

@vspinu

vspinu commented Sep 12, 2022

If I understand the proposed approach correctly, it is anchored on the common but simple env/stack file organization. With more complex hierarchies it would still imply creating multiple flat files with unavoidable replication.

For instance, it's not immediately clear to me how the flat file approach would apply to the tier/stack/env file organization, which I personally find most meaningful.

.
├── foundation
│   ├── account-setup
│   │   ├── dev
│   │   ├── main
│   │   ├── prod
│   │   ├── security
│   │   └── users
│   ├── k8s
│   │   ├── dev
│   │   └── prod
│   ├── network
│   │   ├── dev
│   │   ├── main
│   │   ├── prod
│   │   ├── security
│   │   └── users
│   └── users
│       ├── main
│       └── users
├── service
│   ├── app1
│   └── app2
└── shared
    ├── logs
    └── security

(inspired by the terraform skeleton blog post)

Being application-focused (tier/stack) instead of env-focused, this organization has several advantages:

  • configuration options follow the hierarchy from the most general (tier level) down to the most specific (application) and account.
  • extraction is easy - simply move the sub-directory to a separate repo
  • the tier/stack path is not limited to 2 levels and can follow system and business requirements, as it no longer needs to be replicated for each environment.

In light of the first point it seems to me that the "import" proposal would overlay better with this organization.
