From e34b93975233feafe80da522777bb2d6f58c3c0d Mon Sep 17 00:00:00 2001
From: Masayuki Morita
Date: Thu, 2 Mar 2023 15:47:36 +0900
Subject: [PATCH] Export TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE=true in CI

Starting from Terraform v1.4, launching terraform providers in the
acceptance test has been failing more frequently with a text file busy
error.

```
--- FAIL: TestAccMultiStateMigratorApplySimple (1.07s)
    multi_state_migrator_test.go:123: failed to run terraform init: failed to run command (exited 1): terraform init -input=false -no-color
        stdout:

        Initializing the backend...

        Successfully configured the backend "s3"! Terraform will automatically
        use this backend unless the backend configuration changes.

        Initializing provider plugins...
        - Finding latest version of hashicorp/null...
        - Installing hashicorp/null v3.2.1...

        stderr:

        Error: Failed to install provider

        Error while installing hashicorp/null v3.2.1: open
        /tmp/plugin-cache/registry.terraform.io/hashicorp/null/3.2.1/linux_amd64/terraform-provider-null_v3.2.1_x5:
        text file busy
```

After some investigation, I found that Go's `os/exec.Cmd.Run()` does not
wait for the grandchild process to complete; from the point of view of
tfmigrate, the terraform command is the child process, and the provider
is the grandchild process.
https://github.com/golang/go/issues/23019

If I understand correctly, this is not a Terraform issue and should
theoretically also occur in versions older than v1.4; the changes in
v1.4 may have upset the balance of execution timing and made the test
very flaky.

I experimented with inserting some sleeps but could not get the test to
stabilize.

After trying various things, I found that the test became stable when I
enabled the `TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE` flag, which
was introduced in v1.4. This flag is an escape hatch that reverts the
global plugin cache behavior change in v1.4 to its v1.3 equivalent.
https://github.com/hashicorp/terraform/pull/32726

This behavior change has already been addressed in the previous commit
using a local file system mirror, so activating this flag does not seem
to make any sense. Even though I have no other reasonable solutions now,
please let me know if anyone finds a better solution.
---
 docker-compose.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docker-compose.yml b/docker-compose.yml
index 8186805..1657c6b 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -14,6 +14,7 @@ services:
       # Use the same filesystem to avoid a checksum mismatch error
       # or a file busy error caused by asynchronous IO.
       TF_PLUGIN_CACHE_DIR: "/tmp/plugin-cache"
+      TF_PLUGIN_CACHE_MAY_BREAK_DEPENDENCY_LOCK_FILE: "true"
       # From observation, although we don’t have complete confidence in the root cause,
       # it appears that localstack sometimes misses API requests when run in parallel.
       TF_CLI_ARGS_apply: "--parallelism=1"