aws:batch:JobDefinition Resource provider reported that the resource did not exist while updating #3845

Closed
jamie1911 opened this issue Apr 18, 2024 · 8 comments
Labels
impact/reliability Something that feels unreliable or flaky kind/bug Some behavior is incorrect or out of spec resolution/fixed This issue was fixed

@jamie1911
Contributor

jamie1911 commented Apr 18, 2024

What happened?

Something seems to have changed: Job Definitions for AWS Batch are now ending up in a strange state.

We get errors like:

Diagnostics:
  aws:batch:JobDefinition (prod_compxxSSSSJobDefinition):
    error: Resource provider reported that the resource did not exist while updating urn:pulumi:prod::CompWorkflows::aws:batch/jobDefinition:JobDefinition::prod_compxxSSSSJobDefinition.
    
    This is usually a result of the resource having been deleted outside of Pulumi, and can often be fixed by running `pulumi refresh` before updating.

However, the Job Definitions do exist. What actually happens now is that we end up with duplicate job definitions with multiple revisions. Previously, an update would deregister the old job definition and create a new revision.

Example

amber_job_definition = aws.batch.JobDefinition(
    f"{prefix}_job_definition",
    name=f"{app_prefix}_job_definition",
    retry_strategy=JobDefinitionRetryStrategy(attempts=batch_job_retry_attempts),
    type="container",
    timeout=JobDefinitionTimeout(attempt_duration_seconds=batch_job_timeout),
    container_properties=amber_container_properties,
    opts=pulumi.ResourceOptions(provider=provider),
)

Output of pulumi about

CLI          
Version      3.113.1
Go Version   go1.22.2
Go Compiler  gc

Plugins
NAME             VERSION
aws              6.31.0
aws-native       0.102.0
docker-buildkit  0.1.27
gitlab           6.10.0
pulumi_docker    3.1.0
python           unknown

Host     
OS       ubuntu
Version  22.04
Arch     x86_64

Additional context

It seems that either retrying the GitLab job succeeds, or it succeeds if we first run `pulumi refresh` and then retry the job. However, any subsequent job works once and then fails until these steps are repeated.

Contributing

Vote on this issue by adding a 👍 reaction.
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

@jamie1911 jamie1911 added kind/bug Some behavior is incorrect or out of spec needs-triage Needs attention from the triage team labels Apr 18, 2024
@corymhall corymhall added this to the 0.104 milestone Apr 19, 2024
@corymhall corymhall removed the needs-triage Needs attention from the triage team label Apr 19, 2024
@corymhall
Contributor

@jamie1911 thanks for reporting this issue. We'll need to do some research on this to figure out a root cause.

@corymhall corymhall added the impact/reliability Something that feels unreliable or flaky label Apr 19, 2024
@jamie1911
Contributor Author

I have found the issue here.

#3792 upgraded the upstream provider to https://github.com/hashicorp/terraform-provider-aws/releases/tag/v5.44.0,

which adds a new parameter, deregister_on_new_revision, to the aws:batch/jobDefinition:JobDefinition resource.

This parameter either doesn't seem to be working, or it should be documented correctly or default to the previous behavior.

Any advice?
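
For readers following along, here is a minimal sketch of setting the parameter explicitly. The parameter name comes from the comment above; the helper `job_definition_kwargs`, the resource name, and the container image are hypothetical, and the assumption that `True` corresponds to the previous "deregister the old revision" behavior should be checked against the provider documentation for your version.

```python
from typing import Any, Dict


def job_definition_kwargs(
    name: str,
    container_properties: str,
    deregister_on_new_revision: bool = True,
) -> Dict[str, Any]:
    """Build keyword arguments for aws.batch.JobDefinition (hypothetical helper)."""
    return {
        "name": name,
        "type": "container",
        # container_properties is passed as a JSON string.
        "container_properties": container_properties,
        # Parameter named in this thread. Assumption: True asks the provider
        # to deregister the previous revision when a new one is created.
        "deregister_on_new_revision": deregister_on_new_revision,
    }


kwargs = job_definition_kwargs("example_job_definition", '{"image": "busybox"}')

# In a Pulumi program (needs pulumi and pulumi_aws, which are not imported here):
#   aws.batch.JobDefinition("example_job_definition", **kwargs)
```

Setting the value explicitly does not work around the bug itself; it only makes the intended behavior visible in the program, so a later change in the default would not silently alter it.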

@corymhall corymhall self-assigned this Apr 25, 2024
@corymhall
Contributor

@jamie1911 it looks like this issue is being tracked upstream by hashicorp/terraform-provider-aws#36824.

The issue is marked as prioritized and was assigned 2 days ago, so there is a good chance it will be fixed upstream soon. I would prefer to wait a little to see if we get an upstream fix, but if this is needed urgently then we can try to patch it on our side.

@corymhall corymhall added the awaiting-upstream The issue cannot be resolved without action in another repository (may be owned by Pulumi). label Apr 25, 2024
@jamie1911
Contributor Author

> @jamie1911 it looks like this issue is being tracked upstream by hashicorp/terraform-provider-aws#36824.
>
> The issue is marked prioritized and has been assigned 2 days ago so it looks like there is a good chance it will be fixed upstream soon. I would prefer to wait a little to see if we get an upstream fix, but if this is needed urgently then we can try and patch it on our side.

Thanks, I am personally okay waiting for an upstream fix. I have reverted our process to pin pulumi-aws==6.29.1.
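
As a rough illustration of guarding such a pin in CI, the sketch below checks the installed package version against the pinned one using only the standard library. The function names are hypothetical, and the exact-match policy is an assumption; the pinned version is the one mentioned above.

```python
from importlib import metadata
from typing import Optional

PINNED_VERSION = "6.29.1"  # the pin mentioned in this thread


def installed_version(package: str = "pulumi-aws") -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(installed: Optional[str], pinned: str = PINNED_VERSION) -> bool:
    """True only when the installed version is exactly the pinned one."""
    return installed == pinned


if __name__ == "__main__":
    found = installed_version()
    if not matches_pin(found):
        raise SystemExit(f"expected pulumi-aws=={PINNED_VERSION}, found {found}")
```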

@corymhall
Contributor

The fix was just released in the latest version of the upstream Terraform AWS provider, so this should be fixed in our next release.

@t0yv0
Member

t0yv0 commented May 2, 2024

Unfortunately, inheriting the fix has opened up another regression over BatchDefinition. I believe we have a fix in #3888, which will be released shortly. We appreciate your patience.

@t0yv0
Member

t0yv0 commented May 3, 2024

This should be fixed in the latest release. Please let us know if there are any remaining problems in this area.

@t0yv0 t0yv0 closed this as completed May 3, 2024
@pulumi-bot
Contributor

Cannot close issue:

  • does not have required labels: resolution/

Please fix these problems and try again.

@pulumi-bot pulumi-bot reopened this May 3, 2024
@t0yv0 t0yv0 added resolution/fixed This issue was fixed and removed awaiting-upstream The issue cannot be resolved without action in another repository (may be owned by Pulumi). labels May 3, 2024
@t0yv0 t0yv0 closed this as completed May 3, 2024
4 participants