terraform_synthetic_state resource #3164
Conversation
@phinze what do you think of this concept? I like it because I'm currently maintaining environments at a lower level of abstraction than the one my downstream consumers interact with, and presently we're jumping through hoops to run a downstream Terraform config that does nothing except instantiate a terraform_remote_state resource and republish a projection of its data as outputs. I would concede that it does diverge somewhat from the original idea of remote state, but I think of it as an optional extra mechanism to give some flexibility to those maintaining more complicated structures in Terraform, which can be ignored for simple cases.
Looks interesting.
Hey @apparentlymart - just had a chance to review this. This is super interesting - it turns the remote state from being a "pull" of all outputs for a given live config into a "push" / "publish" model. Before we consider pulling this implementation in as-is, I'd like to take a step back and spec out the abstract need this fulfills, to help us decide if a terraform_synthetic_state resource is the right way to meet it.
@phinze cool, great! At work we actually started using this "for real" in our infrastructure a few weeks ago and it simplified things a lot for us. Perhaps more details on our specific use-case will help to motivate the design here.

Due to #1819, along with a desire for easier partial application, we build our deployment environments (e.g. QA vs. Production) out of multiple configurations -- one per AWS region -- and use remote state as the means to connect these components together to form layers in a graph. The first two ranks of this graph are "real" Terraform configs that actually create things, with "global" establishing some shared stuff and then the per-region-per-environment configs creating the necessary infrastructure in each environment.

However, we actually use AWS availability zone as our primary subdivision from the perspective of app deployment, with each app configured to set itself up in two or more AZs. We then map the Consul concept of "datacenter" onto AZs. In order to simplify the application-level Terraform configurations, we publish the third rank of configurations, which we've been calling the "logical" configurations; these adapt the region-oriented configurations onto AZ-oriented configurations, doing something like this:

// Implements an AZ-level remote state for us-west-1a, derived from the us-west-1 PROD config
resource "terraform_remote_state" "region" {
// (reference to the PROD us-west-1 remote state, for example)
}
output "aws_region" {
value = "us-west-1"
}
output "aws_az" {
value = "us-west-1a"
}
output "aws_vpc_id" {
// There is one VPC shared between all of the subnets in a region, so
// the us-west-1b config would have exactly the same value here.
value = "${terraform_remote_state.region.aws_vpc_id}"
}
output "aws_subnet_id" {
// The region-level config includes subnets for both the 'a' and 'b' AZs,
// so here we pick out the one that's relevant to AZ 'a'.
value = "${terraform_remote_state.region.aws_az_a_subnet_id}"
}
output "consul_server_addrs" {
// The region-level config includes consul servers for both the 'a' and 'b' AZs,
// so here we pick out the ones that are relevant to AZ 'a'.
value = "${terraform_remote_state.region.aws_az_a_consul_server_addrs}"
}

These logical configs are literally just a single terraform_remote_state resource plus a set of outputs, as shown above. Within the app configs, we then have Terraform read a set of "datacenter names" (AZ names) from the app's Consul configuration keys, and use terraform_remote_state to load the corresponding AZ-level state for each one.
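Roughly, that consuming side might look something like the following sketch; the bucket name, key paths, and the hard-coded count here are illustrative assumptions rather than our real configuration:

# Read the list of datacenter (AZ) names for this app from Consul.
resource "consul_keys" "app_config" {
  datacenter = "us-west-1"

  key {
    name = "datacenters"
    path = "apps/exampleapp/datacenters"   # e.g. "us-west-1a,us-west-1b"
  }
}

# Load one AZ-level remote state per datacenter name read above.
resource "terraform_remote_state" "az" {
  count   = 2   # matches the number of datacenters configured for this app
  backend = "s3"

  config {
    bucket = "example-terraform-states"
    region = "us-west-1"
    // Keys read by consul_keys are exposed via its "var" map.
    key    = "logical/${element(split(",", consul_keys.app_config.var.datacenters), count.index)}.tfstate"
  }
}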
This extra layer of abstraction is important to us because what is a set of separate regions in our production environment is simplified into a bunch of AZs within the same region in our development environments, yet we are able to use exactly the same Terraform config to deploy the app infrastructure in all environments. When we started using terraform_synthetic_state, we moved this projection into the region-level configs themselves, having them publish the AZ-oriented synthetic states directly.
With this we've eliminated the steps of manually "applying" the logical configs to get their remote state as a side-effect, and thus our environment management is simpler and less mistake-prone. The aspect of this design that I enjoyed is that the downstream configs are agnostic as to whether they are getting a "real" state or a synthetic one, and so I was able to make the above change in our environment-level configs without changing anything anywhere else. It also builds on the existing remote state adapter infrastructure rather than requiring a parallel set of resources.

(Some of the details in the above have been altered from our real config to reduce irrelevant distractions, but the relevant concepts are identical to those in our currently-running configuration.)
Thanks for the thorough description! It's really useful to understand the details of a real world use case.

Coming back around to this, I'm struck by the fact that we're piggybacking onto Terraform's notion of "state" what's at base simply a K/V publish/consume relationship between upstream and downstream configs. This thought first hit me as "hang on a second, he could do all of this with consul_keys!", something like this:

# Upstream config publishes
resource "consul_keys" "az_a" {
datacenter = "us-west-1"
key {
path = "datacenters/us-west-1a/terraform_state"
name = "consul_server_addrs"
value = "${module.consul_a.server_addrs}"
}
# ... etc
}

and

# Downstream config consumes
resource "consul_keys" "az_a" {
datacenter = "us-west-1"
key {
path = "datacenters/us-west-1a/terraform_state"
name = "consul_server_addrs"
}
# ... etc
}

Like I say, there is some overdue love needed for the consul provider that I believe would be a prerequisite to using this in anger, but I'm curious to get your thoughts on this concept!
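If I understand the consul_keys resource of that era correctly, the downstream config would then reference the value it read through the resource's var map, along the lines of the sketch below (the exact attribute path is an assumption, not taken from this discussion):

# Somewhere else in the downstream config: keys read by a consul_keys block
# without a value are exposed as var.<name> on that resource.
output "consul_server_addrs" {
  value = "${consul_keys.az_a.var.consul_server_addrs}"
}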
@phinze we also use Consul keys for sharing data in some parts of our solution, and the thought had previously occurred to me that consul_keys could serve this purpose too.

Like I was saying above, piggy-backing on remote state has the advantage of supporting all of the same transports that the remote state mechanism already supports, so it's trivial to switch back and forth between using "real" states and synthetic states; thus I expect synthetic states could be used to preserve compatibility as infrastructures change, along with the use-case I described above.

I've actually had another idea developing in my head for a while that I think speaks to this, continuing my ongoing efforts to tweak Terraform's design around sharing data between related configs; I'm gonna write that up now and then link to it from here since I think it will clarify what I'm talking about.

Update: #4169 is my proposal for making Terraform support reading data as a separate concept from creating and managing objects, to improve the UX around data-driven configuration. If we implemented that then I'd implement write/read pairs of resources and data sources for this sort of shared data.
I'm incredibly excited about #4169! Hypothetically - in that model - we could have data "terraform_remote_state" {} for reading and shift resource "terraform_remote_state" {} to be used for publishing. Of course there are backcompat concerns to work out there, but I'm just playing around with the new ideas. Probably worth playing out the conversation over in #4169 first and pausing this in the meantime.
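As a purely hypothetical sketch of that split (neither form existed at the time, and the output block on the publishing side is an invented attribute, not syntax from this PR or from #4169):

# Hypothetical publishing side: push a set of values as a remote state.
resource "terraform_remote_state" "az_a" {
  backend = "s3"

  config {
    bucket = "example-terraform-states"
    key    = "logical/us-west-1a.tfstate"
  }

  output {
    aws_az              = "us-west-1a"
    consul_server_addrs = "${module.consul_a.server_addrs}"
  }
}

# Hypothetical consuming side: read the same state back in a downstream config.
data "terraform_remote_state" "az_a" {
  backend = "s3"

  config {
    bucket = "example-terraform-states"
    key    = "logical/us-west-1a.tfstate"
  }
}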
Totally agreed on letting this one sit until we talk out #4169. It's pretty likely that with robust support for various data sources, along with fixing the quirks in the Consul provider, this use-case could be covered without a dedicated synthetic state resource.
@phinze correctly suggested earlier in this discussion that using Consul keys could address this use-case. At the time I wrote this, and the time of that discussion, the Consul provider had some bugs and limitations that made this not work so smoothly in practice. #5210 has fixed part of this, and #5988 adds a new resource that addresses the remaining limitations. Since nobody else ever chimed in to say they would use this, I'm going to close it in favor of #5210 and #5988, and eventually transition our existing uses of synthetic states over to that Consul-based approach.
I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues. If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.
The normal way to share a state for consumption by downstream terraform_remote_state is to just point consumers at the same configuration used to persist the configuration's state itself. This is fine most of the time, but sometimes it's desirable to present downstream consumers with a different level of abstraction than the physical configuration represents. For example, one might create AWS infrastructure on a per-region basis but expose it to downstream configurations on a per-availability-zone basis.

This resource thus allows a Terraform configuration to produce additional "synthetic" remote states that are intended only for downstream consumption and are not used to actually manage any resources. They contain only outputs, and no resource states.
This is a real use-case I have: we maintain environments as a collection of "physical" configurations, which create resources on a per-AWS-region basis, and then we currently maintain a collection of "logical" configurations that do nothing except read the physical remote states and project their data into an availability-zone-oriented basis that our downstream app infrastructure deployments expect.
I'd like to simplify by eliminating that extra level of logical configuration, and just have the physical configurations write out a set of synthetic states for our downstream configs to consume.
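A rough sketch of what that could look like in a region-level (physical) config; the attribute names (backend, config, output) and the referenced values here are illustrative and may not exactly match the implementation in this PR:

# Publish an AZ-oriented synthetic state from the per-region config.
resource "terraform_synthetic_state" "az_a" {
  backend = "s3"

  config {
    bucket = "example-terraform-states"
    key    = "logical/prod-us-west-1a.tfstate"
    region = "us-west-1"
  }

  # A downstream terraform_remote_state consumer would see these as ordinary
  # root outputs of the referenced state.
  output {
    aws_region          = "us-west-1"
    aws_az              = "us-west-1a"
    aws_vpc_id          = "${aws_vpc.main.id}"
    aws_subnet_id       = "${aws_subnet.a.id}"
    consul_server_addrs = "${module.consul_a.server_addrs}"
  }
}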
While I'm sure we could share such configuration data a different way if forced to, I think it's convenient for downstream consumers to not have to think about two different mechanisms for consuming upstream data depending on whether it's a "real" Terraform state or just some derived outputs.
So far I just wrote the main implementation to get some feedback. If this seems like something that would be accepted into Terraform then I'll follow up with some tests and documentation.