timeout while waiting for state to become 'success' (timeout: 2m0s) #760 #765

Closed
eitah opened this issue Nov 3, 2023 · 9 comments · Fixed by #777

Comments

@eitah

eitah commented Nov 3, 2023

A patch was expected to address the timeouts, but it does not appear to have worked.

PR #763 tries to address the issue by changing how the Events API is handled, but we suspect the timeout is in the REST API (reference: https://developer.pagerduty.com/docs/ZG9jOjExMDI5NTUz-rate-limiting).

For full issue history, see #760

@vperaltac

Any updates on this? I am hitting the same error even though the API key is far from reaching the rate limit.

@erose96

erose96 commented Nov 21, 2023

I am hitting this same issue.

@YaffleZ

YaffleZ commented Nov 24, 2023

Is the team aware of this issue and actively working on it? Our pipeline has been broken by this bug for weeks now. I would appreciate it if somebody could look into it.

@erose96

erose96 commented Nov 27, 2023

Seems like it might be related to how the Terraform SDK handles retries [related issue].

I'm receiving the same WaitForState errors the users in the issue above (and related issues) are seeing:

2023-11-27T22:16:04.626Z [WARN]  provider.terraform-provider-pagerduty_v3.1.2: WaitForState timeout after 2m0s: timestamp=2023-11-27T22:16:04.626Z
2023-11-27T22:16:04.626Z [WARN]  provider.terraform-provider-pagerduty_v3.1.2: WaitForState starting 30s refresh grace period: timestamp=2023-11-27T22:16:04.626Z
2023-11-27T22:16:34.628Z [ERROR] provider.terraform-provider-pagerduty_v3.1.2: WaitForState exceeded refresh grace period: timestamp=2023-11-27T22:16:34.626Z
2023-11-27T22:16:34.628Z [ERROR] vertex "module.<pagerduty_service_module>" error: timeout while waiting for state to become 'success' (timeout: 2m0s)
2023-11-27T22:16:34.628Z [ERROR] vertex "module.<pagerduty_service_integration> (expand)" error: timeout while waiting for state to become 'success' (timeout: 2m0s)

A fix has been submitted upstream to the SDK but has not been reviewed for nearly two years.
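
For anyone comparing their own logs against the excerpt above, here is a minimal sketch that pulls the WaitForState lines out of a Terraform debug log and measures the gap between the first timeout warning and the final error. The regular expression and the helper names (`parse_line`, `wait_for_state_events`) are my own assumptions based only on the line layout shown in this comment, not anything provided by Terraform or the provider.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt above: "<timestamp>Z [LEVEL]  <message>"
LINE_RE = re.compile(r"^(\S+Z) \[(\w+)\]\s+(.*)$")


def parse_line(line):
    """Return (timestamp, level, message), or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    timestamp = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%fZ")
    return timestamp, match.group(2), match.group(3)


def wait_for_state_events(lines):
    """Keep only the parsed lines whose message mentions WaitForState."""
    events = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None and "WaitForState" in parsed[2]:
            events.append(parsed)
    return events


# Sample lines copied from the log excerpt in this comment.
SAMPLE = [
    "2023-11-27T22:16:04.626Z [WARN]  provider.terraform-provider-pagerduty_v3.1.2: WaitForState timeout after 2m0s: timestamp=2023-11-27T22:16:04.626Z",
    "2023-11-27T22:16:04.626Z [WARN]  provider.terraform-provider-pagerduty_v3.1.2: WaitForState starting 30s refresh grace period: timestamp=2023-11-27T22:16:04.626Z",
    "2023-11-27T22:16:34.628Z [ERROR] provider.terraform-provider-pagerduty_v3.1.2: WaitForState exceeded refresh grace period: timestamp=2023-11-27T22:16:34.626Z",
]

events = wait_for_state_events(SAMPLE)
levels = [level for _, level, _ in events]
# Seconds between the first WaitForState line and the last one.
grace_seconds = (events[-1][0] - events[0][0]).total_seconds()
print(levels, grace_seconds)
```

On the sample lines, the gap matches the 30s refresh grace period named in the log, so the error surfaces roughly 2m30s after the wait began.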

@danekantner

This is causing major problems, including data loss.

@jtsaito
Contributor

jtsaito commented Nov 30, 2023

This may be related to the new change in API limits. The best practice documented by PagerDuty should be implemented.

@erose96

erose96 commented Nov 30, 2023

You can check that by turning on debug logging before running Terraform. None of the API calls that timed out for me were even close to hitting the rate limit. Agreed that the best practices outlined by PagerDuty should be followed.

@danekantner

I am fairly sure we have already confirmed it isn't 429s, and the code already looks to be aware of 429s and to retry on them. If this were happening because of throttling, the 429 shouldn't be hidden in the debug log. The provider is already returning an error, so it should be the right error. Presenting the actual failure message without requiring debug logging is pretty standard.

I'm going to just delete the alerting entirely today b/c I have deadlines.
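
To illustrate the point above about returning the right error, here is a minimal sketch of a retry loop that remembers the last underlying failure and includes it in the timeout message. It is not the provider's or the Terraform SDK's implementation; `retry_until`, `RetryTimeout`, and `RateLimited` are hypothetical names made up for this example.

```python
import time


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 response from an API."""


class RetryTimeout(Exception):
    """Raised when retries run out of time; carries the last underlying error."""

    def __init__(self, timeout, last_error):
        super().__init__(
            f"timeout after {timeout}s while retrying; last error: {last_error!r}"
        )
        self.last_error = last_error


def retry_until(fn, timeout, delay=0.01):
    """Call fn until it succeeds or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return fn()
        except Exception as err:  # remember why the attempt failed
            last_error = err
        if time.monotonic() >= deadline:
            # Surface the real cause instead of a bare "timeout" message.
            raise RetryTimeout(timeout, last_error)
        time.sleep(delay)


def always_throttled():
    raise RateLimited("429 Too Many Requests")


try:
    retry_until(always_throttled, timeout=0.05)
    message = ""
except RetryTimeout as exc:
    message = str(exc)

print(message)
```

With this shape, a user who is being throttled would see the 429 in the final error without turning on debug logging, and a user who is not being throttled would see whatever the real cause was.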

@erose96

erose96 commented Dec 4, 2023

@imjaroiswebdev I believe this issue is still present in both 3.2.1 and 3.2.2

I have not had any failures due to rate limiting, so I'm not surprised that a fix focused only on that did not change the behavior I and others are seeing.

2023-12-04T16:12:02.714Z [WARN]  unexpected data:
  registry.terraform.io/pagerduty/pagerduty:stderr=
  | {"@caller":"github.com/hashicorp/terraform-plugin-sdk/[email protected]/internal/logging/helper_schema.go:21","@level":"trace","@message":"Calling downstream","@module":"sdk.helper_schema","@timestamp":"2023-12-04T16:12:02.713773Z"}
  | {"@caller":"github.com/hashicorp/terraform-plugin-sdk/[email protected]/internal/logging/helper_schema.go:21","@level":"trace","@message":"Called downstream","@module":"sdk.helper_schema","@timestamp":"2023-12-04T16:12:02.713833Z"}
  
2023-12-04T16:13:52.730Z [WARN]  provider.terraform-provider-pagerduty_v3.2.2: WaitForState timeout after 2m0s: timestamp=2023-12-04T16:13:52.730Z
2023-12-04T16:13:52.730Z [WARN]  provider.terraform-provider-pagerduty_v3.2.2: WaitForState starting 30s refresh grace period: timestamp=2023-12-04T16:13:52.730Z
2023-12-04T16:13:57.308Z [WARN]  provider.terraform-provider-pagerduty_v3.2.2: WaitForState timeout after 2m0s: timestamp=2023-12-04T16:13:57.308Z
2023-12-04T16:13:57.308Z [WARN]  provider.terraform-provider-pagerduty_v3.2.2: WaitForState starting 30s refresh grace period: timestamp=2023-12-04T16:13:57.308Z
2023-12-04T16:13:57.524Z [WARN]  provider.terraform-provider-pagerduty_v3.2.2: WaitForState timeout after 2m0s: timestamp=2023-12-04T16:13:57.523Z
2023-12-04T16:13:57.524Z [WARN]  provider.terraform-provider-pagerduty_v3.2.2: WaitForState starting 30s refresh grace period: timestamp=2023-12-04T16:13:57.524Z
2023-12-04T16:14:22.732Z [ERROR] provider.terraform-provider-pagerduty_v3.2.2: WaitForState exceeded refresh grace period: timestamp=2023-12-04T16:14:22.731Z
2023-12-04T16:14:22.732Z [ERROR] vertex "module.{pagerduty_service_name}" error: timeout while waiting for state to become 'success' (timeout: 2m0s)
2023-12-04T16:14:22.733Z [ERROR] vertex "module.{pagerduty_service_name} (expand)" error: timeout while waiting for state to become 'success' (timeout: 2m0s)
2023-12-04T16:14:22.733Z [WARN]  unexpected data: registry.terraform.io/pagerduty/pagerduty:stderr="{\"@caller\":\"github.com/hashicorp/terraform-plugin-sdk/[email protected]/internal/logging/helper_schema.go:21\",\"@level\":\"trace\",\"@message\":\"Called downstream\",\"@module\":\"sdk.helper_schema\",\"@timestamp\":\"2023-12-04T16:14:22.731444Z\"}"

The last 200 response I received before this showed the following rate-limit headers:

Ratelimit-Limit: 960
Ratelimit-Remaining: 919
Ratelimit-Reset: 58
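
As a quick check on those numbers, here is a small sketch that parses the three headers and works out how much of the budget had been used. The helper name `parse_ratelimit` is made up for this example, and treating `Ratelimit-Reset` as the number of seconds until the window resets is my assumption about the header's meaning.

```python
def parse_ratelimit(raw_headers):
    """Parse 'Name: value' lines into a summary of rate-limit usage."""
    values = {}
    for line in raw_headers.strip().splitlines():
        name, _, value = line.partition(":")
        values[name.strip().lower()] = int(value.strip())
    limit = values["ratelimit-limit"]
    remaining = values["ratelimit-remaining"]
    return {
        "limit": limit,
        "remaining": remaining,
        "used": limit - remaining,
        "used_fraction": (limit - remaining) / limit,
        # Assumption: seconds until the current window resets.
        "reset_seconds": values["ratelimit-reset"],
    }


# Header values copied from the response shown above.
RAW = """
Ratelimit-Limit: 960
Ratelimit-Remaining: 919
Ratelimit-Reset: 58
"""

summary = parse_ratelimit(RAW)
print(summary["used"], round(summary["used_fraction"] * 100, 1))
```

Only 41 of the 960 requests in the window had been used, which is roughly 4% of the limit, so throttling looks like an unlikely cause for this particular timeout.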
