terraform_synthetic_state resource #3164
Conversation
@phinze what do you think of this concept? I like it because I'm currently maintaining environments at a lower level of abstraction than the one my downstream consumers interact with, and presently we're jumping through hoops to run a downstream Terraform config that does nothing except instantiate a terraform_remote_state resource and republish a projection of its data as outputs. I would concede that it does diverge somewhat from the original idea of remote state, but I think of it as an optional extra mechanism to give some flexibility to those maintaining more complicated structures in Terraform, which can be ignored for simple cases.
Looks interesting.
Hey @apparentlymart - just had a chance to review this. This is super interesting - it turns the remote state from being a "pull" of all outputs for a given live config into a "push" / "publish" model. Before we consider pulling this implementation in as-is, I'd like to take a step back and spec out the abstract need this fulfills, to help us decide if a terraform_synthetic_state resource is the right way to meet it.
@phinze cool, great! At work we actually started using this "for real" in our infrastructure a few weeks ago and it simplified things a lot for us. Perhaps more details on our specific use-case will help to motivate the design here.

Due to #1819, along with a desire for easier partial application, we build our deployment environments (e.g. QA vs. Production) out of multiple configurations -- one per AWS region -- and use remote state as the means to connect these components together to form layers in a graph. The first two ranks of this graph are "real" Terraform configs that actually create things, with "global" establishing some shared stuff and then the per-region-per-environment configs creating the necessary infrastructure in each environment.

However, we actually use AWS availability zone as our primary subdivision from the perspective of app deployment, with each app configured to set itself up in two or more AZs. We then map the Consul concept of "datacenter" onto AZs. In order to simplify the application-level Terraform configurations, we publish the third rank of configurations, which we've been calling the "logical" configurations; these adapt the region-oriented configurations onto AZ-oriented configurations, doing something like this:

// Implements an AZ-level remote state for us-west-1a, derived from the us-west-1 PROD config
resource "terraform_remote_state" "region" {
// (reference to the PROD us-west-1 remote state, for example)
}
output "aws_region" {
value = "us-west-1"
}
output "aws_az" {
value = "us-west-1a"
}
output "aws_vpc_id" {
// There is one VPC shared between all of the subnets in a region, so
// the us-west-1b config would have exactly the same value here.
value = "${terraform_remote_state.region.aws_vpc_id}"
}
output "aws_subnet_id" {
// The region-level config includes subnets for both the 'a' and 'b' AZs,
// so here we pick out the one that's relevant to AZ 'a'.
value = "${terraform_remote_state.region.aws_az_a_subnet_id}"
}
output "consul_server_addrs" {
// The region-level config includes consul servers for both the 'a' and 'b' AZs,
// so here we pick out the ones that are relevant to AZ 'a'.
value = "${terraform_remote_state.region.aws_az_a_consul_server_addrs}"
}

These logical configs are literally just a single terraform_remote_state resource plus a set of outputs, as shown above. Within the app configs, we then have Terraform read a set of "datacenter names" (AZ names) from the app's Consul configuration keys, and use terraform_remote_state to load the corresponding AZ-level state for each one.
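Roughly, that consuming side might look something like the following sketch; the bucket name, key paths, and the hard-coded count here are illustrative assumptions rather than our real configuration:

# Read the list of datacenter (AZ) names for this app from Consul.
resource "consul_keys" "app_config" {
  datacenter = "us-west-1"

  key {
    name = "datacenters"
    path = "apps/exampleapp/datacenters"   # e.g. "us-west-1a,us-west-1b"
  }
}

# Load one AZ-level remote state per datacenter name read above.
resource "terraform_remote_state" "az" {
  count   = 2   # matches the number of datacenters configured for this app
  backend = "s3"

  config {
    bucket = "example-terraform-states"
    region = "us-west-1"
    // Keys read by consul_keys are exposed via its "var" map.
    key    = "logical/${element(split(",", consul_keys.app_config.var.datacenters), count.index)}.tfstate"
  }
}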
This extra layer of abstraction is important to us because what is a set of separate regions in our production environment is simplified into a bunch of AZs within the same region in our development environments, yet we are able to use exactly the same Terraform config to deploy the app infrastructure in all environments. When we started using terraform_synthetic_state, we moved this projection into the region-level configs themselves, having them publish the AZ-oriented synthetic states directly.
With this we've eliminated the steps of manually "applying" the logical configs to get their remote state as a side-effect, and thus our environment management is simpler and less mistake-prone. The aspect of this design that I enjoyed is that the downstream configs are agnostic as to whether they are getting a "real" state or a synthetic one, and so I was able to make the above change in our environment-level configs without changing anything anywhere else. It also builds on the existing remote state adapter infrastructure rather than requiring a parallel set of resources.

(Some of the details in the above have been altered from our real config to reduce irrelevant distractions, but the relevant concepts are identical to those in our currently-running configuration.)
Thanks for the thorough description! It's really useful to understand the details of a real world use case.

Coming back around to this, I'm struck by the fact that we're piggybacking onto Terraform's notion of "state" what's at base simply a K/V publish/consume relationship between upstream and downstream configs. This thought first hit me as "hang on a second, he could do all of this with consul_keys!", something like this:

# Upstream config publishes
resource "consul_keys" "az_a" {
datacenter = "us-west-1"
key {
path = "datacenters/us-west-1a/terraform_state"
name = "consul_server_addrs"
value = "${module.consul_a.server_addrs}"
}
# ... etc
}

and

# Downstream config consumes
resource "consul_keys" "az_a" {
datacenter = "us-west-1"
key {
path = "datacenters/us-west-1a/terraform_state"
name = "consul_server_addrs"
}
# ... etc
}

Like I say, there is some overdue love needed for the consul provider that I believe would be a prerequisite to using this in anger, but I'm curious to get your thoughts on this concept!
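If I understand the consul_keys resource of that era correctly, the downstream config would then reference the value it read through the resource's var map, along the lines of the sketch below (the exact attribute path is an assumption, not taken from this discussion):

# Somewhere else in the downstream config: keys read by a consul_keys block
# without a value are exposed as var.<name> on that resource.
output "consul_server_addrs" {
  value = "${consul_keys.az_a.var.consul_server_addrs}"
}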
@phinze we also use Consul keys for sharing data in some parts of our solution, and the thought had previously occurred to me that consul_keys could serve this purpose too.

Like I was saying above, piggy-backing on remote state has the advantage of supporting all of the same transports that the remote state mechanism already supports, so it's trivial to switch back and forth between using "real" states and synthetic states; thus I expect synthetic states could be used to preserve compatibility as infrastructures change, along with the use-case I described above.

I've actually had another idea developing in my head for a while that I think speaks to this, continuing my ongoing efforts to tweak Terraform's design around sharing data between related configs; I'm gonna write that up now and then link to it from here since I think it will clarify what I'm talking about.

Update: #4169 is my proposal for making Terraform support reading data as a separate concept from creating and managing objects, to improve the UX around data-driven configuration. If we implemented that then I'd implement write/read pairs of resources and data sources for this sort of shared data.
I'm incredibly excited about #4169! Hypothetically - in that model - we could have data "terraform_remote_state" {} for reading and shift resource "terraform_remote_state" {} to be used for publishing. Of course there are backcompat concerns to work out there, but I'm just playing around with the new ideas. Probably worth playing out the conversation over in #4169 first and pausing this in the meantime.
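As a purely hypothetical sketch of that split (neither form existed at the time, and the output block on the publishing side is an invented attribute, not syntax from this PR or from #4169):

# Hypothetical publishing side: push a set of values as a remote state.
resource "terraform_remote_state" "az_a" {
  backend = "s3"

  config {
    bucket = "example-terraform-states"
    key    = "logical/us-west-1a.tfstate"
  }

  output {
    aws_az              = "us-west-1a"
    consul_server_addrs = "${module.consul_a.server_addrs}"
  }
}

# Hypothetical consuming side: read the same state back in a downstream config.
data "terraform_remote_state" "az_a" {
  backend = "s3"

  config {
    bucket = "example-terraform-states"
    key    = "logical/us-west-1a.tfstate"
  }
}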
Totally agreed on letting this one sit until we talk out #4169. It's pretty likely that with robust support for various data sources, along with fixing the quirks in the Consul provider, this use-case could be covered without a dedicated synthetic state resource.
@phinze correctly suggested earlier in this discussion that using Consul keys could address this use-case. At the time I wrote this, and the time of that discussion, the Consul provider had some bugs and limitations that made this not work so smoothly in practice. #5210 has fixed part of this, and #5988 adds a new resource that addresses the remaining limitations. Since nobody else ever chimed in to say they would use this, I'm going to close it in favor of #5210 and #5988, and eventually transition our existing uses of synthetic states over to that Consul-based approach.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
The normal way to share a state for consumption by downstream terraform_remote_state is to just point consumers at the same configuration used to persist the configuration's state itself. This is fine most of the time, but sometimes it's desirable to present downstream consumers with a different level of abstraction than the physical configuration represents. For example, one might create AWS infrastructure on a per-region basis but expose it to downstream configurations on a per-availability-zone basis.

This resource thus allows a Terraform configuration to produce additional "synthetic" remote states that are intended only for downstream consumption and are not used to actually manage any resources. They contain only outputs, and no resource states.
This is a real use-case I have: we maintain environments as a collection of "physical" configurations, which create resources on a per-AWS-region basis, and then we currently maintain a collection of "logical" configurations that do nothing except read the physical remote states and project their data into an availability-zone-oriented basis that our downstream app infrastructure deployments expect.
I'd like to simplify by eliminating that extra level of logical configuration, and just have the physical configurations write out a set of synthetic states for our downstream configs to consume.
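A rough sketch of what that could look like in a region-level (physical) config; the attribute names (backend, config, output) and the referenced values here are illustrative and may not exactly match the implementation in this PR:

# Publish an AZ-oriented synthetic state from the per-region config.
resource "terraform_synthetic_state" "az_a" {
  backend = "s3"

  config {
    bucket = "example-terraform-states"
    key    = "logical/prod-us-west-1a.tfstate"
    region = "us-west-1"
  }

  # A downstream terraform_remote_state consumer would see these as ordinary
  # root outputs of the referenced state.
  output {
    aws_region          = "us-west-1"
    aws_az              = "us-west-1a"
    aws_vpc_id          = "${aws_vpc.main.id}"
    aws_subnet_id       = "${aws_subnet.a.id}"
    consul_server_addrs = "${module.consul_a.server_addrs}"
  }
}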
While I'm sure we could share such configuration data a different way if forced to, I think it's convenient for downstream consumers to not have to think about two different mechanisms for consuming upstream data depending on whether it's a "real" Terraform state or just some derived outputs.
So far I just wrote the main implementation to get some feedback. If this seems like something that would be accepted into Terraform then I'll follow up with some tests and documentation.