Create DiscoverEC2 User Tasks when Auto Discover fails on EC2 instances #47064

Open: wants to merge 1 commit into master from marco/discovery_emit_discoverec2_tasks

@marcoandredinis (Contributor) commented Oct 1, 2024

This PR changes the DiscoveryService to start creating and updating
Discover EC2 User Tasks.

So, what are Discover EC2 User Tasks?
When users set up Auto Discover for EC2 instances, they don't have a
good way to check whether their configured matchers are running into issues.

We created User Tasks as a way to warn users that something's wrong.
Each User Task should describe an issue that happened and a way to fix
it.
User Tasks could also be used to report unexpected events throughout the
whole system that are not errors per se, but that the user should act on
to improve the situation.
In this case, we are creating a subtype of those tasks: DiscoverEC2.

From now on, when the DiscoveryService fails to auto-enroll an instance,
it will create a DiscoverEC2 User Task grouping all the failed instances
by the following properties:

  • integration
  • issue type
  • account ID
  • region
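The grouping rule above can be sketched in Go as follows. This is a minimal illustration, not the DiscoveryService code: the types `failedInstance` and `taskKey` and the function `groupFailures` are hypothetical names, and the sample values are taken from the demo output below. Each distinct key becomes one User Task, and inside a task the instances are indexed by instance ID, mirroring the `instances` map in the demo.

```go
package main

import (
	"fmt"
	"sort"
)

// taskKey holds the properties that decide which DiscoverEC2 User Task
// a failed instance belongs to (hypothetical type for illustration).
type taskKey struct {
	Integration string
	IssueType   string
	AccountID   string
	Region      string
}

// failedInstance is a hypothetical record of one failed auto-enroll attempt.
type failedInstance struct {
	InstanceID  string
	Integration string
	IssueType   string
	AccountID   string
	Region      string
}

// groupFailures buckets failed instances by taskKey. Within a bucket,
// instances are keyed by instance ID, so repeated reports of the same
// instance are stored once (the latest report wins).
func groupFailures(failures []failedInstance) map[taskKey]map[string]failedInstance {
	tasks := make(map[taskKey]map[string]failedInstance)
	for _, f := range failures {
		k := taskKey{f.Integration, f.IssueType, f.AccountID, f.Region}
		if tasks[k] == nil {
			tasks[k] = make(map[string]failedInstance)
		}
		tasks[k][f.InstanceID] = f
	}
	return tasks
}

func main() {
	failures := []failedInstance{
		{"i-123456", "teleportdev", "ec2-ssm-agent-not-registered", "123456789012", "eu-west-2"},
		{"i-1234567", "teleportdev", "ec2-ssm-agent-not-registered", "123456789012", "eu-west-2"},
		{"i-1234", "teleportdev", "ec2-ssm-unsupported-os", "123456789012", "eu-west-2"},
	}
	tasks := groupFailures(failures)

	// Sort keys by issue type so the output order is stable.
	keys := make([]taskKey, 0, len(tasks))
	for k := range tasks {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i].IssueType < keys[j].IssueType })

	fmt.Println("tasks:", len(tasks))
	for _, k := range keys {
		fmt.Printf("%s: %d instance(s)\n", k.IssueType, len(tasks[k]))
	}
}
```

With these inputs the two `ec2-ssm-agent-not-registered` instances share a single task, as in the last task of the demo, while the unsupported-OS instance gets its own.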

A follow-up PR will also create notifications so that users are actually
notified about these User Tasks and can take action.

Demo:

$ tctl get user_tasks
kind: user_task
metadata:
  expires:
    nanos: 542005000
    seconds: 1727976384
  name: 52c33845-a413-5b2e-bfa3-1afc5d79236a
  revision: 663ed34b-6c65-4976-9052-af5898b295e1
spec:
  discover_ec2:
    account_id: "123456789012"
    instances:
      i-123:
        discovery_config: dc001
        discovery_group: aws-prod
        instance_id: i-123
        invocation_url: https://eu-west-2.console.aws.amazon.com/systems-manager/run-command/5041ad35-deab-4be7-8bfb-cd64f1cc2a24/i-123
        sync_time:
          nanos: 525000000
          seconds: 1727976024
    region: eu-west-2
  integration: teleportdev
  issue_type: ec2-ssm-script-failure
  state: OPEN
  task_type: discover-ec2
version: v1
---
kind: user_task
metadata:
  expires:
    nanos: 540832000
    seconds: 1727976378
  name: 7da201b0-57d1-503b-856e-de8ca5acd1e1
  revision: 9f803bb5-7ef6-4bd6-a2e5-fbbf8119248a
spec:
  discover_ec2:
    account_id: "123456789012"
    instances:
      i-1234:
        discovery_config: dc001
        discovery_group: aws-prod
        instance_id: i-1234
        sync_time:
          nanos: 379000000
          seconds: 1727976018
    region: eu-west-2
  integration: teleportdev
  issue_type: ec2-ssm-unsupported-os
  state: OPEN
  task_type: discover-ec2
version: v1
---
kind: user_task
metadata:
  expires:
    nanos: 782998000
    seconds: 1727976378
  name: b64397ac-764d-5d9d-af09-30d4775f7a78
  revision: 30ac5a71-a314-43de-a190-ee7db6cfa36a
spec:
  discover_ec2:
    account_id: "123456789012"
    instances:
      i-12345:
        discovery_config: dc001
        discovery_group: aws-prod
        instance_id: i-12345
        sync_time:
          nanos: 631000000
          seconds: 1727976018
    region: eu-west-2
  integration: teleportdev
  issue_type: ec2-ssm-agent-connection-lost
  state: OPEN
  task_type: discover-ec2
version: v1
---
kind: user_task
metadata:
  expires:
    nanos: 291505000
    seconds: 1727976378
  name: dcdca1c5-d489-58bd-8f01-6477ad9756d1
  revision: 176e6527-5cf7-43a2-b1b5-39cdff720f96
spec:
  discover_ec2:
    account_id: "123456789012"
    instances:
      i-123456:
        discovery_config: dc001
        discovery_group: aws-prod
        instance_id: i-123456
        sync_time:
          nanos: 887000000
          seconds: 1727976017
      i-1234567:
        discovery_config: dc001
        discovery_group: aws-prod
        instance_id: i-1234567
        sync_time:
          nanos: 134000000
          seconds: 1727976018
    region: eu-west-2
  integration: teleportdev
  issue_type: ec2-ssm-agent-not-registered
  state: OPEN
  task_type: discover-ec2
version: v1

@marcoandredinis added the no-changelog and backport/branch/v16 labels Oct 1, 2024
@marcoandredinis force-pushed the marco/discovery_emit_discoverec2_tasks branch from ccff419 to 7567361 October 2, 2024 08:08
@marcoandredinis force-pushed the marco/discovery_emit_discoverec2_tasks branch from 7567361 to 0c450ad October 2, 2024 13:42
@marcoandredinis force-pushed the marco/discovery_emit_discoverec2_tasks branch 2 times, most recently from 17cf648 to c10e937 October 3, 2024 17:22
@marcoandredinis marked this pull request as ready for review October 3, 2024 17:24
@marcoandredinis (Contributor, Author) commented:

(I'm working on the flaky test, but this should be reviewable already 👍)

@marcoandredinis force-pushed the marco/discovery_emit_discoverec2_tasks branch 2 times, most recently from bdb1723 to 3eb2db8 October 4, 2024 10:30
@marcoandredinis force-pushed the marco/discovery_emit_discoverec2_tasks branch from 3eb2db8 to a5b14d9 October 4, 2024 10:45
Base automatically changed from marco/discover_ec2_tasks to master October 4, 2024 11:22
@marcoandredinis force-pushed the marco/discovery_emit_discoverec2_tasks branch from a5b14d9 to c1e1f45 October 4, 2024 11:32
@marcoandredinis force-pushed the marco/discovery_emit_discoverec2_tasks branch from c1e1f45 to 6c641a2 October 4, 2024 14:48
Labels: backport/branch/v16, discovery, no-changelog, size/md