Metricbeat not collecting tags for AWS - Apigateway resource_type:restapis #33913

Open
laraMorenoIgle opened this issue Dec 1, 2022 · 13 comments · Fixed by #40755
Assignees
Labels
Team:Cloud-Monitoring Label for the Cloud Monitoring team

Comments

@laraMorenoIgle

laraMorenoIgle commented Dec 1, 2022

Bug description:
The configuration below is used to collect metrics from AWS/ApiGateway and include the tags for the restapis resource type.

  metrics:
    - namespace: "AWS/ApiGateway"
      resource_type: apigateway:restapis

However, tags are not being collected for this resource type.

Version: 8.4.3
Steps to Reproduce:

  • we collect all the tags for a given resource type
  • we obtain a list of resource ARNs of the given type
  • from the above list we extract from the ARNs the resource information and collect them with their related tags
  • we then collect the metrics for the namespace
  • we obtain a list of metrics with their dimensions, that usually include the resource information part of the ARN
  • we match the resource information part of the ARN in the metrics list with the one from the collection of tags
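
The matching flow described above can be sketched as follows. This is a minimal illustration, not the Metricbeat implementation: `identifierFromARN` and `matchTags` are hypothetical helpers, the account ID and EC2 instance ID are made up, and tags are simplified to a plain map.

```go
package main

import (
	"fmt"
	"strings"
)

// identifierFromARN returns the last path segment of the resource part of an
// ARN (everything after the fifth colon). Hypothetical helper for illustration.
func identifierFromARN(arn string) string {
	parts := strings.SplitN(arn, ":", 6)
	if len(parts) < 6 {
		return ""
	}
	resource := parts[5]
	// Drop a leading "type/" prefix such as "instance/" or "/restapis/".
	if i := strings.LastIndex(resource, "/"); i >= 0 {
		return resource[i+1:]
	}
	return resource
}

// matchTags looks up the tags for a value taken from the metric dimensions.
func matchTags(tagsByID map[string]map[string]string, dimensionValue string) (map[string]string, bool) {
	tags, ok := tagsByID[dimensionValue]
	return tags, ok
}

func main() {
	// Tags as returned per resource ARN (sample data).
	tagsByARN := map[string]map[string]string{
		"arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0": {"env": "prod"},
		"arn:aws:apigateway:us-east-1::/restapis/i747znhp4g":              {"env": "prod"},
	}

	// Key the tags by the identifier extracted from each ARN.
	tagsByID := make(map[string]map[string]string)
	for arn, tags := range tagsByARN {
		tagsByID[identifierFromARN(arn)] = tags
	}

	// EC2 metrics carry the instance ID as a dimension, so the lookup succeeds.
	_, ec2OK := matchTags(tagsByID, "i-0123456789abcdef0")
	// API Gateway metrics carry the API name, which is not a key in the tag map.
	_, apiOK := matchTags(tagsByID, "PetStore")

	fmt.Println(ec2OK, apiOK) // prints "true false"
}
```

The EC2 lookup succeeds because the dimension value and the ARN-derived identifier are the same string; the API Gateway lookup fails because the dimension holds the API name while the ARN holds the API ID.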

This proved to work for EC2, RDS, etc., and in general for all the predefined modules we have in Metricbeat, but it does not seem to work for restapis: the resource information in the metric dimensions differs from the resource information in the tag list.

Here's an example from EC2, see the matching of the two values highlighted:
image

Here's the same from restapi: see how the subIdentifier (the information coming from the metric dimensions) is the name of the REST API, but the resource information in the tag collection, which comes from the ARN, does not match:
image

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Dec 1, 2022
@aspacca aspacca added the Team:Cloud-Monitoring Label for the Cloud Monitoring team label Dec 1, 2022
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Dec 1, 2022
@endorama endorama self-assigned this Feb 16, 2023
@endorama
Member

I've investigated this a bit more.

Before delving into the logic, I think there is a workaround: set tags to the desired values in the Metricbeat configuration file. The trade-off is that these tags are applied to all events, so dedicated Metricbeat instances must be launched to gather this data, and the tags will be hardcoded instead of being collected from the real infrastructure.

Now to the logic of how this works:

  1. resource_type configuration value is passed to GetResourcesTags by this line
    resourceTagMap, err := aws.GetResourcesTags(svcResourceAPI, []string{resourceType})
  2. The GetResourcesTags function is located in
    func GetResourcesTags(svc resourcegroupstaggingapi.GetResourcesAPIClient, resourceTypeFilters []string) (map[string][]resourcegroupstaggingapitypes.Tag, error) {
  3. Within this function, a list of resources matching the specified resource type filters is requested from the AWS APIs using GetResourcesInput - AWS SDK docs.
    From the Go doc:
    // Specifies the resource types that you want included in the response. The format
    // of each resource type is service[:resourceType]. [...]
    // The string for each service name and resource type is the same as
    // that embedded in a resource's Amazon Resource Name (ARN).
    
    Note: make sure that the resource type is correct, as a typo there would cause the wrong tags to be collected (this is not the case in this issue, as the configuration is correct).
    Once tags are collected, they are mapped to the identifier they belong to, which is extracted from the resource ARN contained in the tag response from the AWS APIs.
  4. The collected tag map is then iterated to apply specified tags filters in
    for identifier, tags := range resourceTagMap {
        if exists := aws.CheckTagFiltersExist(tagsFilter, tags); !exists {
            m.logger.Debugf("In region %s, service %s tags does not match tags_filter", regionName, identifier)
            delete(resourceTagMap, identifier)
            continue
        }

    There are no filters in this configuration, so this does not apply here; this is to highlight that the filtering logic takes precedence
  5. Tags are then applied: sub-identifiers are extracted, tag filter is checked (should not apply here as there are no filters), event is initialised and action is delegated to insertTags
    subIdentifiers := strings.Split(identifierValue, dimensionSeparator)
    for _, subIdentifier := range subIdentifiers {
        if _, ok := events[uniqueIdentifierValue]; !ok {
            // when tagsFilter is not empty but no entry in
            // resourceTagMap for this identifier, do not initialize
            // an event for this identifier.
            if len(tagsFilter) != 0 && resourceTagMap[subIdentifier] == nil {
                continue
            }
            events[uniqueIdentifierValue] = aws.InitEvent(regionName, m.AccountName, m.AccountID, output.Timestamps[valI])
        }
        events[uniqueIdentifierValue] = insertRootFields(events[uniqueIdentifierValue], metricDataResultValue, labels)
        insertTags(events, uniqueIdentifierValue, subIdentifier, resourceTagMap)
    }
  6. insertTags:
    1. extract tags related to sub-identifier
    2. if empty and the sub-identifier starts with arn, try extracting the short identifier from it, then lookup tags again
    3. add tags to event if there are some (with some tag key transformation)
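
The three insertTags steps can be approximated with the sketch below. It is not the Metricbeat code: `Tag`, `shortIdentifier` and `insertTagsSketch` are hypothetical stand-ins, the event is reduced to a flat map, and both the `aws.tags.` prefix and the dot-to-underscore key transformation are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Tag is a local stand-in for the tagging API tag type.
type Tag struct{ Key, Value string }

// shortIdentifier returns the text after the last "/" or ":" of an ARN.
// Hypothetical helper; the real extraction logic may differ.
func shortIdentifier(arn string) string {
	i := strings.LastIndexAny(arn, "/:")
	return arn[i+1:]
}

// insertTagsSketch follows the three steps: look up tags by sub-identifier,
// retry with the short identifier when the sub-identifier is an ARN, then
// copy the tags into the event.
func insertTagsSketch(event map[string]string, subIdentifier string, resourceTagMap map[string][]Tag) {
	tags := resourceTagMap[subIdentifier]
	if len(tags) == 0 && strings.HasPrefix(subIdentifier, "arn") {
		tags = resourceTagMap[shortIdentifier(subIdentifier)]
	}
	for _, t := range tags {
		// Assumed key transformation: dots replaced so keys stay flat.
		key := strings.ReplaceAll(t.Key, ".", "_")
		event["aws.tags."+key] = t.Value
	}
}

func main() {
	resourceTagMap := map[string][]Tag{
		"i747znhp4g": {{Key: "team.name", Value: "payments"}},
	}

	// An ARN sub-identifier resolves through the short-identifier fallback.
	byARN := map[string]string{}
	insertTagsSketch(byARN, "arn:aws:apigateway:us-east-1::/restapis/i747znhp4g", resourceTagMap)

	// An API name has no entry in the map and is not an ARN: no tags are added.
	byName := map[string]string{}
	insertTagsSketch(byName, "PetStore", resourceTagMap)

	fmt.Println(len(byARN), len(byName)) // prints "1 0"
}
```

With an API name such as PetStore as the sub-identifier, neither the direct lookup nor the ARN fallback finds anything, which is the behaviour reported in this issue.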

From the pictures above, it is clear that there is no correlation between the event identifier (and the extracted sub-identifiers) and the tag identifier as extracted by GetResourcesTags, so tags are not matched to the correct resources.

I still have to explore how the metric identifier generation works and if we can solve this by changing the metric identifier instead of the tag identifier.

@endorama
Member

endorama commented Feb 22, 2023

Updates on my investigation. TLDR: this is not possible to fix in its entirety, but there are a couple of workarounds worth exploring that would help mitigate the problem.

Why can't it be fixed?

As mentioned in the previous comments, there is no data that links the metric data to the tags.

The identifier used by the metrics comes from metric labels:

identifierValue := labels[identifierValueIdx]

labels := strings.Split(*output.Label, labelSeparator)

for _, output := range metricDataResults {

metricDataResults, err := aws.GetMetricDataResults(metricDataQueries, svcCloudwatch, startTime, endTime)

GetMetricDataResults returns ([]types.MetricDataResult, error) (source): Id does not contain identifiable information; Label does, in this format: "5XXError|AWS/ApiGateway|Average|ApiName|PetStore"

Only the API Gateway API name is used; no other identifying information is available.

On the other hand, tags are collected separately. When they are matched with the incoming metrics, this is an example situation:

  • resourceTagMap is a map[string][]github.com/aws/aws-sdk-go-v2/service/resourcegroupstaggingapi/types.Tag (source) containing:
        "i747znhp4g": [(0xc00108cd20), (0xc00108cd38),],
        "/restapis/i747znhp4g": [(0xc00108cd20), (0xc00108cd38),],
        "v7o6rbtm6i": [(0xc00061f560),(0xc00061f578),(0xc00061f590),(0xc00061f5a8),],
        "/restapis/v7o6rbtm6i": [(0xc00061f560),(0xc00061f578),(0xc00061f590),(0xc00061f5a8),],

i747znhp4g and v7o6rbtm6i are API Gateway IDs. The ID and the /restapis/ID keys contain the same tags.

  • labels is a []string containing:
["4XXError","AWS/ApiGateway","Average","ApiName","PetStore",]

The element at index 4 (the fifth element) is used as the metric identifier, resulting in PetStore.
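
A small sketch of this label parsing is shown below. The names `labelSeparator` and `identifierValueIdx` appear in the snippets above, but their values here ("|" and 4) are inferred from the example label and are assumptions; `identifierFromLabel` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	labelSeparator     = "|" // assumed from the example label format
	identifierValueIdx = 4   // assumed: position of the dimension value
)

// identifierFromLabel splits a metric label and returns the element used as
// the metric identifier, or false when the label has too few parts.
func identifierFromLabel(label string) (string, bool) {
	labels := strings.Split(label, labelSeparator)
	if len(labels) <= identifierValueIdx {
		return "", false
	}
	return labels[identifierValueIdx], true
}

func main() {
	id, ok := identifierFromLabel("4XXError|AWS/ApiGateway|Average|ApiName|PetStore")
	fmt.Println(id, ok) // prints "PetStore true"
}
```

The only identifying value available at this point is the API name, never the API Gateway ID used as the key in resourceTagMap.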

Nowhere in this logic does the code know the ID of the API Gateway, so we can conclude that there is no way to match the API Gateway metric with the appropriate tags.

To work around this, it should be possible to match API Gateway names with IDs through apigateway#Client.GetRestApis: it returns an apigateway#GetRestApisOutput containing a slice of apigateway/types#RestApi, each containing the API Gateway unique ID and name.

This would work, but API Gateways can have duplicate names (uniqueness is guaranteed only by the ID).

Thus, unambiguously matching an API Gateway metric to a tag is not possible.

Solution

The AWS API response contains the REST API ID.

Workaround 1

When the namespace is AWS/ApiGateway and resource_type is restapis, we use GetRestApis to find the API Gateway ID from the name and use that to match metrics.

Pitfall: this does not work when there are multiple API Gateways with the same name.
Trade-off: if this happens, tag collection will be skipped and a message will be logged by Metricbeat; the documentation would describe this use case and mention the limitation.
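
Workaround 1 could look roughly like the sketch below. It avoids the AWS SDK so that it stays self-contained: `RestAPI` is a local stand-in for the ID and name returned by GetRestApis, `nameToID` is a hypothetical helper, and the "Orders" APIs and the ID "abc123defg" are made-up sample data.

```go
package main

import "fmt"

// RestAPI is a local stand-in for the ID and name fields of a REST API
// as returned by GetRestApis.
type RestAPI struct{ ID, Name string }

// nameToID builds a name-to-ID lookup and reports names shared by more than
// one API. Ambiguous names are left out of the lookup so that tag collection
// can be skipped for them.
func nameToID(apis []RestAPI) (map[string]string, map[string]bool) {
	ids := make(map[string]string)
	ambiguous := make(map[string]bool)
	for _, api := range apis {
		if ambiguous[api.Name] {
			continue
		}
		if _, seen := ids[api.Name]; seen {
			// Second API with the same name: the mapping is no longer unique.
			delete(ids, api.Name)
			ambiguous[api.Name] = true
			continue
		}
		ids[api.Name] = api.ID
	}
	return ids, ambiguous
}

func main() {
	apis := []RestAPI{
		{ID: "i747znhp4g", Name: "PetStore"},
		{ID: "v7o6rbtm6i", Name: "Orders"},
		{ID: "abc123defg", Name: "Orders"},
	}
	ids, ambiguous := nameToID(apis)

	fmt.Println(ids["PetStore"], ambiguous["Orders"]) // prints "i747znhp4g true"
}
```

The resolved ID can then be used as the key into resourceTagMap, while names reported as ambiguous would be logged and skipped, as described in the trade-off above.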

Workaround 2

We ask customers to apply a tag to the API Gateway resource. The tag must have a unique and known key, which we use to find the appropriate tags in the resourceTagMap.

Pitfall: the result is unpredictable if there are multiple API Gateways with the same name.
Trade-off: this can't be solved by the code itself, as using tags would prevent it from knowing that it is in the "multiple resources with the same tag" case; documentation could help. This may potentially resolve the pitfall of Workaround 1, although in a more complex way, but it should be investigated in more detail.
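
One possible reading of Workaround 2 is sketched below, assuming the agreed tag holds the API name as its value. The tag key `metricbeat-api-name`, the `Tag` type and `tagsForAPIName` are all hypothetical and are not part of Metricbeat.

```go
package main

import "fmt"

// Tag is a local stand-in for the tagging API tag type.
type Tag struct{ Key, Value string }

// apiNameTagKey is a hypothetical, agreed-upon tag key whose value is
// assumed to hold the API name reported in the metric dimensions.
const apiNameTagKey = "metricbeat-api-name"

// tagsForAPIName scans the tag map for a resource carrying the agreed tag
// with the given API name. If several resources carry the same value, the
// one returned depends on map iteration order, so the result is unpredictable.
func tagsForAPIName(resourceTagMap map[string][]Tag, apiName string) ([]Tag, bool) {
	for _, tags := range resourceTagMap {
		for _, t := range tags {
			if t.Key == apiNameTagKey && t.Value == apiName {
				return tags, true
			}
		}
	}
	return nil, false
}

func main() {
	resourceTagMap := map[string][]Tag{
		"i747znhp4g": {{Key: apiNameTagKey, Value: "PetStore"}, {Key: "env", Value: "prod"}},
		"v7o6rbtm6i": {{Key: "env", Value: "dev"}},
	}
	tags, ok := tagsForAPIName(resourceTagMap, "PetStore")
	fmt.Println(len(tags), ok) // prints "2 true"
}
```

This shifts the responsibility for uniqueness to whoever applies the tag, which is why the pitfall above cannot be handled in code alone.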

@endorama endorama changed the title Metricbeat not collecting tags for AWS - Apigateway resource_type Metricbeat not collecting tags for AWS - Apigateway resource_type:restapis Feb 28, 2023
@tommyers-elastic
Contributor

hey thanks edo and andrea for the investigations into this so far.

from my point of view, neither of the workarounds presented here really offer a great solution for the user. i think for the time being we should document this as a known limitation of the integration and we should follow up with AWS to see if we can find a solution to getting the restapi ID, instead of the name.

let's park any further work on this for now.

@RobDTech

Agreed, the workarounds don't work for us either. Would be great to hear how you get on with AWS.

@gizas
Contributor

gizas commented Sep 10, 2024

After the analysis done in https://github.com/elastic/sdh-beats/issues/5103#issuecomment-2333853534, we decided to enhance the current cloudwatch module with the aws apigateway get-rest-apis call, which retrieves and correlates the names and IDs of the provided API Gateways.

The idea is to check the namespace and, if it is namespace: "AWS/ApiGateway", trigger the aws apigateway get-rest-apis call to retrieve the data needed.

The working branch: https://github.com/elastic/beats/tree/awscloudwatchtags

@axw
Member

axw commented Oct 4, 2024

Reopening as #40755 has been reverted. CI started failing after that was merged; it doesn't look likely that it's specific to the change, but all the same Beats CI needed to be stabilised.

@axw axw reopened this Oct 4, 2024
@damianpfister
Copy link
Contributor

@axw - any suggestions on where the problem might be here, that required the revert? Just keen to get an idea whether the merge will be applied again in the near-term or rather requires deeper investigation.

@axw
Member

axw commented Oct 16, 2024

@damianpfister as far as I know, investigation is ongoing (#41087 (comment)). @gizas is that right?

@gizas
Contributor

gizas commented Oct 16, 2024

Indeed, the investigation is ongoing and I don't have any new updates. I will try to see if I can find anything.

@gizas
Contributor

gizas commented Oct 17, 2024

@damianpfister the investigation is still ongoing: #41270

@rdner
Member

rdner commented Oct 22, 2024

@damianpfister @gizas the investigation is over and the linker problem was solved, see #41270 (comment)

#40755 can be re-applied.

@gizas
Contributor

gizas commented Oct 23, 2024

Thank you @rdner ! Opened #41388

10 participants