Terraform fails to notice when a Nomad job has changed #1

hashibot · 2017-06-13T20:29:48Z

This issue was originally opened by @blalor as hashicorp/terraform#14038. It was migrated here as part of the provider split. The original body of the issue is below.

Terraform Version

Terraform v0.9.3

Affected Resource(s)

nomad_job

Terraform Configuration Files

provider "nomad" {
    address = "http://localhost:4646"
}

resource "nomad_job" "example" {
    jobspec = "${file("${path.module}/example.nomad")}"
}

Debug Output

Log for 2nd terraform apply: apply.log.txt

Console output:

$ TF_LOG=TRACE TF_LOG_PATH=apply.log.txt terraform apply
nomad_job.example: Refreshing state... (ID: example)

Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Expected Behavior

Terraform should have noticed that the job had changed in the cluster and updated it.

Actual Behavior

Nuttin', honey.

Steps to Reproduce

nomad agent -dev
terraform apply
nomad status example shows one allocation for task group grp

change task group count:

 curl -sfS localhost:4646/v1/job/example | \
     jq '.TaskGroups[0].Count = 2 | {"Job": .}' | \
     curl -is -X PUT -d@- localhost:4646/v1/job/example

nomad status example shows two allocations for example.grp
terraform plan shows no change
terraform apply makes no change
nomad status example still shows two allocations

The text was updated successfully, but these errors were encountered:

ketzacoatl · 2018-02-20T19:26:58Z

@paddycarver / @grubernaut / @radeksimko, is this a verified issue/bug? I'm seeing a slightly different issue, but not sure if it's the same as this. If a job has been stopped, or is not running (but the spec has not changed), then the TF nomad provider should update nomad. It might be best to always have the job sent to nomad, and let nomad work out issues with differences in the spec? This issue makes the provider unusable for the most basic use cases.

ketzacoatl · 2018-02-22T09:51:41Z

The underlying issue here also produces the following behavior:

run a job with the terraform provider
nomad stop that job manually
re-apply the terraform plan, the job is still stopped

paddycarver · 2018-02-22T21:49:06Z

Hi @ketzacoatl! I can't say for sure whether this is a verified bug, nor can I explain the behaviour. I'll try to look into this soon and come back with a bit more information, and a fix if necessary. Apologies for the delays on this.

ketzacoatl · 2018-03-23T16:50:28Z

hi @paddycarver, thanks for taking a look! were you able to confirm the behavior we see in practice?

shantanugadgil · 2018-03-31T04:06:21Z

My observation is the same as @ketzacoatl.

It would be just more operator friendly if users could interact via Terraform, rather than nomad status

My versions are
Terraform 0.11.5
Nomad (built from master) (0.8.0-dev)

ketzacoatl · 2018-04-26T10:49:28Z

Any updates on this @paddycarver?

Also, a question: Does the Terraform provider always submit the job to nomad, or does it decide whether or not it should?

kerscher · 2018-05-03T12:47:01Z

@paddycarver I took a quick look inside the codebase, and it seems the culprit is:

resourceJobRead is not implemented
there is no metadata to discern changes other than job ID within the schema
de-then-registering only occurs when ID itself changes

#15 by @apparentlymart adds more metadata, but modify_index wouldn’t pick up changes from the provider. I think “Solution 1” below is a nice way of dealing with it.

Solution 1: add `status` computed metadata to the schema, plus #15 other items

Upon read, check with job.Stopped() whether it should be re-registered
Notify through metadata if stopped that recreation is necessary

Pros

Smarter information for plan
Will always restart a stopped job according to configuration

Cons

plan now has to read more information before, possibly slowing it down
More changes to the codebase

Solution 2: Always de-register and re-register

Refactor resourceJobRegister so that it always de-then-re-register, regardless of configuration change or not.
Always mark resource job specs as tainted

Pros

Easy to add
Never leads to stale jobs not being picked up
Retains insanely fast plan with Nomad jobs

Cons

Users probably do not expect that to happen if they did not change their configurations
Ugly plans with unnecessary information

ketzacoatl · 2018-06-28T16:58:37Z

@paddycarver, I'd love to help resolve this problem, as IMO it prevents serious use of Terraform to manage nomad, do you have any guidance or recommendations on how you would like to address the problem?

cc @katbyte / @radeksimko

ketzacoatl · 2018-07-20T23:22:11Z

@mitchellh I'd be very grateful for your feedback/guidance here.

ketzacoatl · 2018-12-06T06:20:30Z

@cgbaker, With renewed development efforts here, what are your thoughts on this issue?

cgbaker · 2018-12-06T17:17:27Z

Hi @ketzacoatl, I wholeheartedly agree that this is something that must be addressed. This shortcoming makes the Nomad provider just about useless for Day 2 operations.

As noted in #15 , there are a few different options for addressing this. The solution partially hinges on whether we want to try to find a general solution that doesn't require modifying and re-releasing the provider every time Nomad releases a new version. On the other hand, with the Nomad product team taking ownership of this TF provider, we can potentially address such a workflow a little better than before. And it may be prudent to find a temporary solution to this issue while we work on a generic/unversioned provider.

The Nomad team is committed to addressing this; I will post an update here soon giving an idea as to the timetable, but my intention is to either resolve this issue as part of the upcoming 1.3.0 version of the provider (targeting Nomad 0.8.x) or the 1.4.0 version (targeting Nomad 0.9.x).

Thank you for your patience and persistence on this issue.

ketzacoatl · 2018-12-11T21:18:07Z

And it may be prudent to find a temporary solution to this issue while we work on a generic/unversioned provider.

That would be great, WRT being able to continue using the provider while future improvements get worked out.

cgbaker · 2018-12-20T21:33:21Z

Nomad 0.8.x API support is available in the Nomad v1.3.0 that released today. I will look at finding a longer-term solution for this in upcoming versions; having said that, even if we continue version the Nomad providers, we pledge to be much more responsive in updating the Nomad provider going forward.

ketzacoatl · 2019-04-04T05:02:07Z

The Nomad team is committed to addressing this; I will post an update here soon giving an idea as to the timetable, but my intention is to either resolve this issue as part of the upcoming 1.3.0 version of the provider (targeting Nomad 0.8.x) or the 1.4.0 version (targeting Nomad 0.9.x).

Hello, just checking back up on this - have there been any improvements related to this?

cgbaker · 2019-04-04T17:09:51Z

No update as of now, we've been focusing on core Nomad work, finishing the 0.9.0 release.

…

On Thu, Apr 4, 2019 at 1:02 AM ketzacoatl ***@***.***> wrote: The Nomad team is committed to addressing this; I will post an update here soon giving an idea as to the timetable, but my intention is to either resolve this issue as part of the upcoming 1.3.0 version of the provider (targeting Nomad 0.8.x) or the 1.4.0 version (targeting Nomad 0.9.x). Hello, just checking back up on this - have there been any improvements related to this? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#1 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABmPT9wkf1S_J33gI6wrFz0StYLsaQTuks5vdYdQgaJpZM4N5Ctd> .

ketzacoatl · 2019-04-11T03:05:33Z

Fair enough, thanks for the update!

cgbaker · 2019-06-05T22:54:25Z

Update: this was not resolved in the 1.4.0 release of the provider, but we're still tracking this.

ketzacoatl · 2019-09-13T20:15:19Z

I am stoked to see this on the 1.5 milestone, rock on!

liemle3893 · 2020-07-14T09:47:13Z

It have been a year since the last comment. Any update on this?

ketzacoatl · 2020-07-14T14:20:49Z

@cgbaker WRT the roadmap, are there refactors or internal changes that block fixing this properly?

cgbaker · 2020-07-14T19:05:40Z

we were waiting for the update 2.0 plugin SDK to see if it helped deal with this. we're actively looking at it now; we want this issue dealt with in the next few months as part of the Nomad 1.0 milestone.

Resolved hashicorp#1. Terraform fails to notice when a Nomad job has changed To have this, you need to maximize the use of Job Meta. (eg: Docker Image) This patch only notice when number of running instances of group in job has changed. Tested with Nomad 0.11.3 Tested with `Service` job. Batch is assumed to be save to run multiple times so there would be no problem.

eliburke · 2022-07-08T16:39:35Z

Is there any plan to release v1.5 soon? Is there more work to finish out this feature? #149 hasn't been touched in almost 2 years.

lgfa29 · 2022-07-15T21:57:24Z

Hi @eliburke 👋

No plans to get this fixed unfortunately. The work on #149 was a brave attempt to try and map all possible jobspec fields into a Terraform resource schema, but that was not a sustainable approach. As Nomad evolves it becomes almost impossible to manually keep up.

We think that a better approach would be to leverage the Nomad OpenAPI project that is able to auto-generate a Nomad job spec, and the Terraform Provider Framework which is a new and more flexible way to create providers.

But this will take a significant effort that is not in our roadmap yet.

shantanugadgil · 2022-09-22T17:42:45Z

suggestion: to have a quick workaround for this problem, what we have been doing for quite some time is the we render out the entire Nomad job file and use it via the terrafrm-nomad provider (in comple rendered form)

What this does is, when you suspect things, and want to check what is up with the job, you can try out nomad job plan myjob.nomad instead of the usual terraform plan.

In my opinion this provides an easy escape hatch mechanism.

tristanmorgan · 2023-07-31T03:41:21Z

Can the Submission field in the API now be used?
enable support for setting original job source

lgfa29 · 2023-07-31T16:27:28Z

Hi @tristanmorgan,

Unfortunately that's not enough. The key issue here is describing the Nomad job specification as a Terraform resource schema and detecting changes on individual fields. The resource already has the raw jobspec stored, which would be the same as the value returned by the new job submission endpoint.

jorgemarey · 2023-11-30T15:19:03Z

Just made a comment on #238 (comment) about this.

If available, use the job submission source to detect changes to `jobspec`. This can mitigate drift detection problems such as #1.

lgfa29 · 2023-12-19T02:13:25Z

Reading @jorgemarey comment again, I noticed this part:

We detect changes by having a shasum of the rendered job and seeing if it changes with the value stored for the previous resource version.

While not quite related to this issue, it made me realize that we can use the job submission data (if available) to detect changes to jobspec, which should help mitigate drift problems like this when the submission data is available.

The default behaviour of the Terraform SDK is to copy the plan result into state which could result in partial state updates, where Terraform state is updated, but the actual resource state is not, in case of an error during the apply. This is normally not an issue because resources are expected to undo these changes on state refresh. Any partial update is reconciled with the actual resource state. But due to #1, the `nomad_job` resource is not able to properly reconcile on refresh, causing the partial update to prevent further applies unless the configuration is also changed. This commit uses the `d.Partial()` method to signal to Terraform that any state changes should be rolledback in case of an error.

If available, use the job submission source to detect changes to `jobspec`. This can mitigate drift detection problems such as #1.

If available, use the job submission source to detect changes to `jobspec`. This can mitigate drift detection problems such as #1. Read HCL2 variables from job submission even if the `nomad_job` resource does not speify an `hcl2` block.

ag-TJNII · 2024-02-26T20:56:44Z

Is this improved at all with v2 of the provider? I need to take over control of previously hand-generated Nomad job templates with Terraform. Usually when I do this workflow I create a resource in the Terraform code, import the existing resource, and reconcile the differences. However, the fact that this provider doesn't detect differences completely breaks that workflow.

I briefly tried upgrading to v2 provider but still got a unexpectedly large diff, how much of that was this bug and how much was v1 to v2 incompatibilities I didn't determine. I dropped back to v1 to not burn time upgrading to a new version that might have the same major bug.

tgross · 2024-07-12T19:27:37Z

Hey folks! @gulducat and I did a quick re-assessment of this bug and here's what the current situation is:

Because of the new job submission data, terraform plan will successfully detect changes that are made via nomad job run and the Nomad UI.
This will not detect any changes made via nomad job scale or via the Nomad API directly.
This will not detect that a job has been stopped via nomad job stop.
We think we can reduce the scope of this bug by adding state and count fields to the resources object, and that isn't a huge lift.
Resolving the problem completely will unfortunately still require having a complete model of the Job object in the Terraform state. There isn't really a good way around this yet and it's going to be a big lift to fix.

After some internal discussion I'm marking this for further roadmapping. We'll update again once we know more.

ketzacoatl · 2024-07-16T17:23:46Z

@tgross awesome update, thank you for all that info. Even with a big lift in the future, smaller/incremental improvements are very welcome!

hashibot added the bug label Jun 13, 2017

paddycarver self-assigned this Feb 22, 2018

kerscher mentioned this issue May 3, 2018

[WIP] nomad_job: richer diff, and catch errors during plan #15

Closed

ketzacoatl mentioned this issue Jun 25, 2019

0.12 support #60

Closed

cgbaker added this to the 1.5.0 milestone Aug 26, 2019

lgfa29 mentioned this issue Sep 5, 2019

Terraform does not create Sentinel policy if not already created #38

Closed

paddycarver removed their assignment Oct 7, 2019

liemle3893 mentioned this issue Jul 15, 2020

Resolved #1. Notice when a Nomad job has changed. #132

Closed

lgfa29 added stage/accepted theme/resource/job type/bug labels Jan 15, 2021

lgfa29 mentioned this issue Nov 2, 2021

nomad_job resource information when planning #247

Closed

lgfa29 mentioned this issue Aug 31, 2022

file resource for nomad volume #283

Closed

lgfa29 mentioned this issue Jul 27, 2023

nomad_job has missing namespace arguments #361

Open

lgfa29 mentioned this issue Jul 31, 2023

Cannot update from system job to service #365

Closed

rodrigol-chan mentioned this issue Oct 13, 2023

job: ensure that job is running #389

Merged

This was referenced Nov 25, 2023

Add nomad_job_v2 resource using terraform-plugin-go #238

Closed

Add a nomad_job_v2 resource #149

Closed

lgfa29 added a commit that referenced this issue Dec 19, 2023

job: read job submission source

5ad8e6b

If available, use the job submission source to detect changes to `jobspec`. This can mitigate drift detection problems such as #1.

lgfa29 mentioned this issue Dec 19, 2023

job: prevent partial update on error #412

Merged

gulducat mentioned this issue Dec 19, 2023

job: add job submission on register #405

Merged

lgfa29 added a commit that referenced this issue Dec 19, 2023

job: read job submission source

a7239e8

If available, use the job submission source to detect changes to `jobspec`. This can mitigate drift detection problems such as #1.

tgross added the hcc/jira Admin - internal label Jul 12, 2024

tgross removed this from the 1.5.0 milestone Jul 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Terraform fails to notice when a Nomad job has changed #1

Terraform fails to notice when a Nomad job has changed #1

hashibot commented Jun 13, 2017

ketzacoatl commented Feb 20, 2018 •

edited

Loading

ketzacoatl commented Feb 22, 2018

paddycarver commented Feb 22, 2018

ketzacoatl commented Mar 23, 2018

shantanugadgil commented Mar 31, 2018

ketzacoatl commented Apr 26, 2018 •

edited

Loading

kerscher commented May 3, 2018

ketzacoatl commented Jun 28, 2018 •

edited

Loading

ketzacoatl commented Jul 20, 2018

ketzacoatl commented Dec 6, 2018

cgbaker commented Dec 6, 2018

ketzacoatl commented Dec 11, 2018 •

edited

Loading

cgbaker commented Dec 20, 2018

ketzacoatl commented Apr 4, 2019

cgbaker commented Apr 4, 2019 via email

ketzacoatl commented Apr 11, 2019

cgbaker commented Jun 5, 2019

ketzacoatl commented Sep 13, 2019

liemle3893 commented Jul 14, 2020

ketzacoatl commented Jul 14, 2020

cgbaker commented Jul 14, 2020

eliburke commented Jul 8, 2022

lgfa29 commented Jul 15, 2022

shantanugadgil commented Sep 22, 2022

tristanmorgan commented Jul 31, 2023

lgfa29 commented Jul 31, 2023

jorgemarey commented Nov 30, 2023

lgfa29 commented Dec 19, 2023

ag-TJNII commented Feb 26, 2024

tgross commented Jul 12, 2024

ketzacoatl commented Jul 16, 2024

Terraform fails to notice when a Nomad job has changed #1

Terraform fails to notice when a Nomad job has changed #1

Comments

hashibot commented Jun 13, 2017

Terraform Version

Affected Resource(s)

Terraform Configuration Files

Debug Output

Expected Behavior

Actual Behavior

Steps to Reproduce

ketzacoatl commented Feb 20, 2018 • edited Loading

ketzacoatl commented Feb 22, 2018

paddycarver commented Feb 22, 2018

ketzacoatl commented Mar 23, 2018

shantanugadgil commented Mar 31, 2018

ketzacoatl commented Apr 26, 2018 • edited Loading

kerscher commented May 3, 2018

Solution 1: add status computed metadata to the schema, plus #15 other items

Pros

Cons

Solution 2: Always de-register and re-register

Pros

Cons

ketzacoatl commented Jun 28, 2018 • edited Loading

ketzacoatl commented Jul 20, 2018

ketzacoatl commented Dec 6, 2018

cgbaker commented Dec 6, 2018

ketzacoatl commented Dec 11, 2018 • edited Loading

cgbaker commented Dec 20, 2018

ketzacoatl commented Apr 4, 2019

cgbaker commented Apr 4, 2019 via email

ketzacoatl commented Apr 11, 2019

cgbaker commented Jun 5, 2019

ketzacoatl commented Sep 13, 2019

liemle3893 commented Jul 14, 2020

ketzacoatl commented Jul 14, 2020

cgbaker commented Jul 14, 2020

eliburke commented Jul 8, 2022

lgfa29 commented Jul 15, 2022

shantanugadgil commented Sep 22, 2022

tristanmorgan commented Jul 31, 2023

lgfa29 commented Jul 31, 2023

jorgemarey commented Nov 30, 2023

lgfa29 commented Dec 19, 2023

ag-TJNII commented Feb 26, 2024

tgross commented Jul 12, 2024

ketzacoatl commented Jul 16, 2024

ketzacoatl commented Feb 20, 2018 •

edited

Loading

ketzacoatl commented Apr 26, 2018 •

edited

Loading

Solution 1: add `status` computed metadata to the schema, plus #15 other items

ketzacoatl commented Jun 28, 2018 •

edited

Loading

ketzacoatl commented Dec 11, 2018 •

edited

Loading