
[Metricbeat][Azure] Azure monitor/storage: Fix 429 Too Many Requests error #38294

Merged: 10 commits merged into elastic:main on Mar 20, 2024

Conversation

@constanca-m (Contributor) commented Mar 13, 2024

Proposed commit message

Please see the issue elastic/integrations#8575.

Metricsets affected by this change: storage and monitor.

This PR fixes the issue in the following way:

  1. We check whether the error returned carries the "Retry-After" header.
  2. If the header is present, we call time.Sleep for the number of seconds it specifies and retry once the time is up.
  3. If the header is not present, it is most likely a different kind of error and we return it.

This is also the retry approach suggested by the Retry Usage Guidance from Azure.
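
As a rough sketch of this approach (not the exact code in this PR; the helper name and error handling are illustrative, and it assumes the azcore.ResponseError type from the Azure SDK for Go):

import (
	"errors"
	"net/http"
	"strconv"
	"time"

	"github.com/Azure/azure-sdk-for-go/sdk/azcore"
)

// sleepIfRetryable is an illustrative helper: if err is a 429 response that
// carries a Retry-After header, it sleeps for that many seconds and returns
// nil so the caller can retry; otherwise it returns the original error.
func sleepIfRetryable(err error) error {
	var respErr *azcore.ResponseError
	if !errors.As(err, &respErr) || respErr.RawResponse == nil {
		return err // not an HTTP error from the Azure SDK
	}
	retryAfter := respErr.RawResponse.Header.Get("Retry-After")
	if respErr.StatusCode != http.StatusTooManyRequests || retryAfter == "" {
		return err // not a throttling error we can wait out
	}
	seconds, convErr := strconv.Atoi(retryAfter)
	if convErr != nil {
		return err // Retry-After may also be an HTTP date; not handled in this sketch
	}
	time.Sleep(time.Duration(seconds) * time.Second)
	return nil // the Retry-After period is over, safe to retry
}

A caller would then loop: issue the request, and if this helper returns nil, try the same request again; any other error is propagated as before.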

For the monitor metricset, the previous implementation also had a different problem affecting the number of calls: we were fetching the metric definitions for every metric group without checking whether we had already done so for the same namespace + resource ID combination. So, for example, this configuration:

    - resource_query: "resourceType eq 'Microsoft.ContainerRegistry/registries'"
      metrics:
        - name: [ "TotalPullCount", "SuccessfulPullCount", "TotalPushCount", "SuccessfulPushCount","RunDuration", "AgentPoolCPUTime" ]
          namespace: "Microsoft.ContainerRegistry/registries"
          timegrain: "PT1M"
        - name: [ "StorageUsed" ]
          namespace: "Microsoft.ContainerRegistry/registries"
          timegrain: "PT1H"
          dimensions:
            - name: "Geolocation"
              value: "*"

would lead to 2 calls for each resource, even though the namespace is the same for every metric group. This is also fixed in this PR.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

Enable the azure monitor metricset, like in this example configuration.

To reproduce the error, you can change the code a bit to force it to make many requests, for example by putting the getMetricsDefinitionsWithRetry call inside a for loop.
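
For example (illustrative only: the loop bound, surrounding wiring, and exact signature are made up; getMetricsDefinitionsWithRetry is the helper this PR introduces), the forced-requests loop could look something like this:

	// Illustrative: repeatedly fetch the metric definitions so the Azure
	// Monitor API starts returning 429 Too Many Requests.
	for i := 0; i < 1000; i++ {
		fmt.Println(">> Trying request ", i)
		if _, err := service.getMetricsDefinitionsWithRetry(resourceID, namespace); err != nil {
			fmt.Println(">> giving up:", err)
			break
		}
	}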

I added some print statements to the code. You should see output like this:

[...]
>> Trying request  400
>> Trying request  401
>> Trying request  402
>> Metricbeat will sleep for 309 seconds and retry again.
>> Trying request  402  <---------- Same request as before
>> Trying request  403
[...]

Note that it takes around 7 minutes of running before the error is first encountered. After that, you have to wait for the sleep time, and only then do requests start being made again.

Related issues

Closes elastic/integrations#8575.

@constanca-m constanca-m added the Team:Cloud-Monitoring and backport-v8.13.0 (automated backport with mergify) labels on Mar 13, 2024
@constanca-m constanca-m self-assigned this Mar 13, 2024
@constanca-m constanca-m requested a review from a team as a code owner March 13, 2024 08:50
@elasticmachine (Collaborator) commented Mar 13, 2024

💚 Build Succeeded

  • Duration: 54 min 12 sec
  • Flaky test report: no test was executed to be analysed
@constanca-m constanca-m requested a review from zmoog March 13, 2024 09:01
@zmoog (Contributor) left a comment

Honoring the HTTP Retry-After header is an excellent approach to this problem.

The effect is that the metricset sleeps for 5 minutes, and this may have several consequences:

  • This may interfere if Metricbeat needs to reload the metricset while the goroutine is blocked sleeping.
  • If the metricset collects metrics with PT1M timegrain, we may not have any metrics for 5 minutes or more.

We need to find an answer to these questions:

  1. What's the rate limit for the "get metric definitions" endpoint?
  2. The metricset also calls other endpoints (for example, to get the metric values): do we also need to manage the retry for them, and what is the rate limit for that endpoint?

If we know how many resources the metricset can realistically collect metrics for in 1 to 5 minutes, we can at least log a warning message saying that there are too many resources to honor the time grain.

Have you tried to run Metricbeat on a real Azure account with enough resources to trigger the throttling?

@constanca-m (Contributor, Author) commented Mar 13, 2024

What's the rate limit for the "get metric definitions" endpoint?

I cannot find this limit documented anywhere; I believe it probably changes according to the subscription tier.

This may interfere if Metricbeat needs to reload the metricset while the goroutine is blocked sleeping.

We could also add a new configuration option to control whether the customer wants to retry, so the metricset is not blocked if they don't want to.
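
For illustration, such an option could be a simple flag in the metricset configuration; the field name below is hypothetical and does not exist in this PR:

// Hypothetical config field: lets users opt out of the Retry-After sleep so a
// throttled request fails fast instead of blocking the goroutine.
type Config struct {
	// ...existing azure module options...

	// When false, a 429 response is returned as an error immediately instead
	// of sleeping for the Retry-After duration and retrying.
	RetryOnThrottling bool `config:"retry_on_throttling"`
}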

@constanca-m (Contributor, Author) commented Mar 13, 2024

Update on testing:

For this test, I added 4 resources. Each of these resources produces 101 calls.

This means we need to run 101 calls * 4 resources = 404 calls at minimum (if no retry happens).

But as we know, we cannot make 400 calls in 5 minutes, so by call 402 we get the error and Metricbeat sleeps for 309 seconds (5 min 9 s).

Only after that does it run the remaining calls, so the function only finishes after 538 seconds (in the case I tested). That is 8.96 minutes to finish the 404 calls (plus 1 retry call). Then we get the documents.

So in this scenario each cycle takes around 9 minutes...

@zmoog (Contributor) commented Mar 13, 2024

Update from offline conversations while testing this PR, where we learned that the metric definitions phase could be more efficient:

For example, if Metricbeat uses a configuration with three metric groups sharing the same namespace, like the following:

      resources:
        - metrics:
            - ignore_unsupported: true
              name:
                - ServiceApiLatency
                - Availability
                - ServiceApiResult
              namespace: Microsoft.MyNamespace/MyResourceType
              timegrain: PT1M
            - ignore_unsupported: true
              name:
                - ServiceApiHit
              namespace: Microsoft.MyNamespace/MyResourceType
              timegrain: PT1M
            - ignore_unsupported: true
              name:
                - SaturationShoebox
              namespace: Microsoft.MyNamespace/MyResourceType
              timegrain: PT1M
          resource_query: resourceType eq 'Microsoft.MyNamespace/MyResourceType'

The mapMetrics() helper would call client.AzureMonitorService.GetMetricDefinitions() for every resource and for each metric group:

func mapMetrics(client *azure.Client, resources []*armresources.GenericResourceExpanded, resourceConfig azure.ResourceConfig) ([]azure.Metric, error) {
	var metrics []azure.Metric
	for _, resource := range resources {
		for _, metric := range resourceConfig.Metrics {
			// get all metrics supported by the namespace provided
			metricDefinitions, err := client.AzureMonitorService.GetMetricDefinitions(*resource.ID, metric.Namespace)
			if err != nil {
				return nil, fmt.Errorf("no metric definitions were found for resource %s and namespace %s %w", *resource.ID, metric.Namespace, err)
			}

With the current implementation, 10 resources and 3 groups (same namespace), would require 30 API calls.

Instead, it should call client.AzureMonitorService.GetMetricDefinitions() only once for each unique combination of resource and namespace.

As a proof of concept, I tried to see if we can only get the metrics definition once (untested):

// mapMetrics should validate and map the metric related configuration to relevant azure monitor api parameters
func mapMetrics(client *azure.Client, resources []*armresources.GenericResourceExpanded, resourceConfig azure.ResourceConfig) ([]azure.Metric, error) {
	var metrics []azure.Metric

	// index to find the unique combination of resource and metric namespace
	index := map[string]struct {
		Resource                   *armresources.GenericResourceExpanded
		MetricConfig               *azure.MetricConfig
		MetricDefinitionCollection *armmonitor.MetricDefinitionCollection
	}{}

	// group resources and resourceConfig by resource.ID and metric.Namespace
	for _, resource := range resources {
		for _, metric := range resourceConfig.Metrics {
			key := *resource.ID + metric.Namespace
			if _, exists := index[key]; exists {
				// we already have the metric definitions for this resource and namespace
				continue
			}

			metricDefinitions, err := client.AzureMonitorService.GetMetricDefinitions(*resource.ID, metric.Namespace)
			if err != nil {
				return nil, fmt.Errorf("no metric definitions were found for resource %s and namespace %s %w", *resource.ID, metric.Namespace, err)
			}

			index[key] = struct {
				Resource                   *armresources.GenericResourceExpanded
				MetricConfig               *azure.MetricConfig
				MetricDefinitionCollection *armmonitor.MetricDefinitionCollection
			}{
				Resource:                   resource,
				MetricConfig:               &metric,
				MetricDefinitionCollection: &metricDefinitions,
			}
		}
	}

	for _, resource := range resources {
		for _, metric := range resourceConfig.Metrics {
			//get all metrics supported by the namespace provided
			//metricDefinitions, err := client.AzureMonitorService.GetMetricDefinitions(*resource.ID, metric.Namespace)
			//if err != nil {
			//	return nil, fmt.Errorf("no metric definitions were found for resource %s and namespace %s %w", *resource.ID, metric.Namespace, err)
			//}

			// lookup the metric definitions for the resource and namespace
			key := *resource.ID + metric.Namespace
			if _, ok := index[key]; !ok {
				// this should never happen
				return nil, fmt.Errorf("no metric definitions were found for resource %s and namespace %s", *resource.ID, metric.Namespace)
			}

			metricDefinitions := index[key].MetricDefinitionCollection

With an implementation that identifies unique combinations of resource and namespace, 10 resources and 3 groups (same namespace), would require 10 API calls.

@zmoog (Contributor) commented Mar 13, 2024

Only after that it runs the calls that are missing. So the function only finishes after 538seconds (in this case that I tested). This is 8.96 minutes to finish the 404 calls (plus 1 retry call). Then we get the documents.

So in this scenario, for each cycle, we are taking 9 minutes...

While the goroutine sleeps for 309 seconds, the metricset is probably not collecting metric values, right?

Sleeping for so long is a problem for integrations using metrics with PT1M timegrain, where users expect one document for each resource every 60s.

But I'm not 100% sure, and we need to check. @constanca-m, could you see gaps in the metrics when the time grain is PT1M?

@constanca-m (Contributor, Author) commented Mar 14, 2024

Only after that it runs the calls that are missing. So the function only finishes after 538seconds (in this case that I tested). This is 8.96 minutes to finish the 404 calls (plus 1 retry call). Then we get the documents.
So in this scenario, for each cycle, we are taking 9 minutes...

While the goroutine sleeps for 309 seconds, the metricset is probably not collecting metric values, right?

Sleeping for so long is a problem for integrations using metrics with PT1M timegrain, where users expect one document for each resource every 60s.

But I'm not 100% sure, and we need to check. @constanca-m, could you see gaps in the metrics when the time grain is PT1M?

Yes, I was already seeing that before. I am only making 100 calls, so no timeout and no sleep. And Discover looks like this:

[screenshot]

The configuration I am using is:

- module: azure
  metricsets:
  - monitor
  enabled: true
  period: 60s

  refresh_list_interval: 60s
  resources:
    - resource_query: "resourceType eq 'Microsoft.ContainerRegistry/registries'"
      metrics:
        - name: [ "TotalPullCount", "SuccessfulPullCount", "TotalPushCount", "SuccessfulPushCount","RunDuration", "AgentPoolCPUTime" ]
          namespace: "Microsoft.ContainerRegistry/registries"
          timegrain: "PT1M"

@constanca-m (Contributor, Author)

I want to add that even with just 100 calls we can reproduce the 429 error. After running it 6 times (I think it was 6), the 7th run blocked because of the 429 error, so it is still very much present here.

Results:

[screenshot]

I believe the big time gap is when the Metricbeat sleep occurred. The number of documents being received remains the same. I don't think we are losing data; it makes sense to have this huge gap, since the API was no longer accepting requests.

@zmoog (Contributor) commented Mar 14, 2024

Good start.

How does Discover look with this configuration (it matches the config a user reported)?

- module: azure
  metricsets:
  - monitor
  enabled: true
  period: 60s

  refresh_list_interval: 60s
  resources:
    - resource_query: "resourceType eq 'Microsoft.ContainerRegistry/registries'"
      metrics:
        - name: [ "TotalPullCount", "SuccessfulPullCount", "TotalPushCount", "SuccessfulPushCount","RunDuration", "AgentPoolCPUTime" ]
          namespace: "Microsoft.ContainerRegistry/registries"
          timegrain: "PT1M"
        - name: [ "TotalPullCount", "SuccessfulPullCount", "TotalPushCount", "SuccessfulPushCount","RunDuration", "AgentPoolCPUTime" ]
          namespace: "Microsoft.ContainerRegistry/registries"
          timegrain: "PT1M"
        - name: [ "TotalPullCount", "SuccessfulPullCount", "TotalPushCount", "SuccessfulPushCount","RunDuration", "AgentPoolCPUTime" ]
          namespace: "Microsoft.ContainerRegistry/registries"
          timegrain: "PT1M"

(in a real config, the multiple metric groups would have different dimensions, but this should have the same effect)

Also, we need to measure:

  • how many calls it makes
  • does it trigger the 429 error?
  • how many seconds it sleeps (if any)

@zmoog (Contributor) commented Mar 14, 2024

I don't think we are losing data, it makes sense to have this huge gap since the API was not available for requests anymore.

With PT1M metrics, the users' expectation is to have one measurement every minute. This is not happening.

@zmoog (Contributor) commented Mar 14, 2024

My take on this issue.

The current implementation limits the number of resources for which we can successfully collect PT5M (one document every 5 minutes) and PT1M (one document every minute) metrics.

So, I guess we need to:

  • Identify the maximum number of Azure resources that allow a complete collection of PT5M and PT1M metrics.
  • Eliminate all inefficiencies that do not require rewriting a large portion of the metricset.
  • Document the maximum number of Azure resources supported for complete collection. Above this limit, the metric collection will be degraded (it will miss some measures).
  • Plan more significant changes for the next Metricbeat release.

@axw, I'd love to hear what you think! 🙇

@constanca-m (Contributor, Author)

Update on comment #38294 (comment).

The results of 3 sleeps are:

[screenshot]

I will now test the scenario from comment #38294 (comment).

@zmoog (Contributor) commented Mar 14, 2024

Plan more significant changes for the next Metricbeat release.

For example, we should fetch asynchronously and cache the metric definitions, like the OTel azuremonitorreceiver does.

@constanca-m (Contributor, Author) commented Mar 14, 2024

Following #38294 (comment)

I expected to see the same results, but we don't.

how many calls it makes

As expected, it makes 100 calls as well. This is the same.

does it trigger the 429 error?

It does not. This seems to be unexpected. Check the results in Discover:

[screenshot]

We can see that the results from the 100 calls are now coming every 3 minutes. So we stay under the 400-calls-per-5-minutes limit, and this is why we don't see the 429 error. The main question then is: what is defining this period between calls? Why do we have 3 minutes per cycle, when before it was 2 minutes? Also, why is there a bigger gap between the first and second call, which also happened before?

Another detail: running with this configuration takes around 50 s to complete the 100 calls. Running with the previous one (#38294 (comment)) takes around 50 s as well, so the cause is not the time it takes to complete all the calls.

The configuration used is:

- module: azure
  metricsets:
  - monitor
  enabled: true
  period: 60s

  refresh_list_interval: 60s
  resources:
    - resource_query: "resourceType eq 'Microsoft.ContainerRegistry/registries'"
      metrics:
        - name: [ "TotalPullCount", "SuccessfulPullCount", "TotalPushCount", "SuccessfulPushCount","RunDuration", "AgentPoolCPUTime" ]
          namespace: "Microsoft.ContainerRegistry/registries"
          timegrain: "PT1M"
        - name: [ "TotalPullCount", "SuccessfulPullCount", "TotalPushCount", "SuccessfulPushCount","RunDuration", "AgentPoolCPUTime" ]
          namespace: "Microsoft.ContainerRegistry/registries"
          timegrain: "PT1M"
        - name: [ "TotalPullCount", "SuccessfulPullCount", "TotalPushCount", "SuccessfulPushCount","RunDuration", "AgentPoolCPUTime" ]
          namespace: "Microsoft.ContainerRegistry/registries"
          timegrain: "PT1M"

@zmoog (Contributor) commented Mar 14, 2024

how many calls it makes

As expected, it makes 100 calls as well. This is the same.

Great, I see the code is now caching the metrics definitions for each namespace.

We can see that the results from the 100 calls are now coming every 3 minutes. So we stay under the 400-calls-per-5-minutes limit, and this is why we don't see the 429 error. The main question then is: what is defining this period between calls? Why do we have 3 minutes per cycle, when before it was 2 minutes? Also, why is there a bigger gap between the first and second call, which also happened before?

Uhm, I don't have a convincing explanation for this.

Between the "get metrics definition" calls, we have the other calls to get the metric values. I can only think of different network conditions (latency, bandwidth) causing these differences.

@zmoog (Contributor) commented Mar 14, 2024

We should also write a KB article describing the current limits.

In particular, we need to collect this information:

  • Maximum number of resources without throttling using PT1M metrics.
  • Maximum number of resources without throttling using PT5M metrics.

I guess this number may be different because PT1M requires collecting metrics every 60 s, probably causing throttling with fewer resources.

Do we already have this information from previous tests, or do we need to perform new ad hoc ones?

Comment on lines 252 to 258
for {
	nextPage, err := pager.NextPage(service.context)
	if err != nil {
		retryError := service.sleepIfPossible(err, resourceId, namespace)
		if retryError != nil {
			return armmonitor.MetricDefinitionCollection{}, err
		}
	} else {
		metricDefinitionCollection.Value = append(metricDefinitionCollection.Value, nextPage.Value...)
		break
	}
}

Review comment (Contributor):

Do we need this for loop and the else clause?

Suppose the result for a resource contains two pages and:

  • the call to pager.NextPage() succeeds on the first page.
  • the call to pager.NextPage() fails on the second page with the 429 error.
  • the goroutine sleeps for 5 minutes.

At this point, we haven't successfully gotten page 2; would the pager.More() return true or false?

If it returns true, we don't need the inner for loop.

If so, we could use something like:

	for pager.More() {
		nextPage, err := pager.NextPage(service.context)
		if err != nil {
			retryError := service.sleepIfPossible(err, resourceId, namespace)
			if retryError != nil {
				return armmonitor.MetricDefinitionCollection{}, err
			}
			continue
		}
		metricDefinitionCollection.Value = append(metricDefinitionCollection.Value, nextPage.Value...)
	}

WDYT?

@constanca-m (Contributor, Author) commented Mar 18, 2024

Yes, that is a much better solution! I tried it, and the number of documents after sleep remains the same:

[screenshot]

I also had some print statements, and I could see that the request was repeated.

@zmoog (Contributor) commented Mar 19, 2024

@constanca-m, I tested the metricset with 400 resources and PT5M, and it works reasonably well: it takes longer than 5 minutes when it reloads the metric definitions.

Here's the configuration I used:

- module: azure
  metricsets:
  - monitor
  enabled: true
  period: 300s
  refresh_list_interval: 600s
  resources:
  - resource_query: "resourceType eq 'Microsoft.ContainerRegistry/registries'"
    metrics:
      - name: [ "TotalPullCount", "SuccessfulPullCount", "TotalPushCount", "SuccessfulPushCount","RunDuration", "AgentPoolCPUTime" ]
        namespace: "Microsoft.ContainerRegistry/registries"
        timegrain: "PT5M"

On average, it takes ~190 s to collect values for 400 metrics (about 0.5 s per metric).

So with the current implementation, we can collect no more than:

  • 120 metric values with a PT1M time grain.
  • 600 metric values with a PT5M time grain (not tested yet).
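
A quick sanity check of these ceilings, assuming the ~0.5 s per metric value observed above holds: a PT1M collection period gives 60 s / 0.5 s ≈ 120 values, and a PT5M period gives 300 s / 0.5 s ≈ 600 values, matching the two numbers listed.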

@zmoog (Contributor) left a comment

We still need to improve the metricset's efficiency and performance, but this is a step in the right direction.

@constanca-m, thank you for your effort and patience.

@elasticmachine (Collaborator)

💚 Build Succeeded

cc @constanca-m

@constanca-m constanca-m merged commit 10ff992 into elastic:main Mar 20, 2024
39 of 44 checks passed
@constanca-m constanca-m deleted the azure-too-many-requests branch March 20, 2024 08:06
mergify bot pushed a commit that referenced this pull request Mar 20, 2024
…error (#38294)

* Fiz azure too many requests error

Signed-off-by: constanca <[email protected]>

* Add CHANGELOG.next.asciidoc entry

Signed-off-by: constanca <[email protected]>

* Run mage check

Signed-off-by: constanca <[email protected]>

* Get metrics 1 time for each namespace/id

Signed-off-by: constanca <[email protected]>

* Add error comparison

Signed-off-by: constanca <[email protected]>

* Move retry to monitor service

Signed-off-by: constanca <[email protected]>

* Remove GetMetricDefinitions without retry

Signed-off-by: constanca <[email protected]>

* fix storage unit tests

Signed-off-by: constanca <[email protected]>

* fix monitor unit tests

Signed-off-by: constanca <[email protected]>

* Remove unnecessary for cycle

Signed-off-by: constanca <[email protected]>

---------

Signed-off-by: constanca <[email protected]>
(cherry picked from commit 10ff992)
constanca-m added a commit that referenced this pull request Mar 20, 2024
…error (#38294) (#38457)

(same commit list as above, cherry picked from commit 10ff992)

Co-authored-by: Constança Manteigas <[email protected]>