
Empty t3.xlarge spot instance not getting deleted #1690

Open
indra0007 opened this issue Sep 19, 2024 · 2 comments
Labels
  • kind/bug: Categorizes issue or PR as related to a bug.
  • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

indra0007 commented Sep 19, 2024

Description

Observed Behavior:
An empty t3.xlarge spot instance is not getting deleted.

Expected Behavior:
Empty nodes should be deleted.

Reproduction Steps (Please include YAML):

  1. Install the NodePool below:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: noncriticalinfra
spec:
  disruption:
    budgets:
      - duration: 164h
        nodes: '0'
        reasons:
          - Underutilized
          - Drifted
        schedule: 0 4 * * 0
    consolidateAfter: 1m
    consolidationPolicy: WhenEmptyOrUnderutilized
  template:
    metadata:
      labels:
        service-layer/role: noncriticalinfra
    spec:
      expireAfter: Never
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
            - spot
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values:
            - c
            - m
            - r
            - t
        - key: karpenter.k8s.aws/instance-generation
          operator: Gt
          values:
            - '2'
```
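For reference, the NodePool can be applied and checked with something like the following (the file name is an assumption, not part of the original report):

```sh
# Hypothetical file name for the NodePool manifest above
kubectl apply -f nodepool.yaml

# Confirm the NodePool registered successfully
kubectl get nodepool noncriticalinfra
```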
  2. Install the first Deployment below:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1 
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      nodeSelector:
        service-layer/role: noncriticalinfra
      containers:
      - name: nginx
        image: nginx:latest  
        ports:
        - containerPort: 80 
        resources:
          requests:
            cpu: "64m"
            memory: 64Mi
```
  3. Install the second Deployment below:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx2
  labels:
    app: nginx2
spec:
  replicas: 3  
  selector:
    matchLabels:
      app: nginx2
  template:
    metadata:
      labels:
        app: nginx2
    spec:
      nodeSelector:
        service-layer/role: noncriticalinfra
      containers:
      - name: nginx
        image: nginx:latest  
        ports:
        - containerPort: 80  
        resources:
          requests:
            cpu: "512m"
            memory: 256Mi
```
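Both Deployment manifests above can be applied the same way (again, the file names are assumptions):

```sh
# Hypothetical file names for the two Deployment manifests above
kubectl apply -f nginx.yaml -f nginx2.yaml

# Confirm all four pods landed on noncriticalinfra nodes
kubectl get pods -l 'app in (nginx, nginx2)' -o wide
```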

Once you have performed all three steps above, Karpenter creates two nodes (a t3.medium and a t3.xlarge).
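A sketch of how to confirm what was provisioned, using standard Karpenter v1 node labels (the commands are illustrative, not from the original report):

```sh
# List Karpenter-managed nodes from this NodePool with their instance and capacity types
kubectl get nodes -l karpenter.sh/nodepool=noncriticalinfra \
  -L node.kubernetes.io/instance-type -L karpenter.sh/capacity-type

# The corresponding NodeClaims
kubectl get nodeclaims
```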

  4. Delete both deployments:

```sh
kubectl delete deploy nginx;
kubectl delete deploy nginx2;
```

Once both deployments are deleted, both nodes should also be removed one by one, since both are now empty. Surprisingly, the t3.medium instance is deleted immediately as expected, but the t3.xlarge is not. It is very surprising that they behave differently, since they belong to the same NodePool and should behave in exactly the same way.
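One way to observe this (a sketch; with consolidateAfter: 1m, empty-node consolidation should fire about a minute after the pods are gone):

```sh
# Watch NodeClaims; only the t3.medium NodeClaim disappears, the t3.xlarge one stays
kubectl get nodeclaims -w
```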

Karpenter emits no logs whatsoever, as if it knows nothing about that t3.xlarge node. There are no relevant events from its NodeClaim either. Below is the set of events from that NodeClaim:

```
Type    Reason             Age   From       Message
----    ------             ----  ----       -------
Normal  Launched           31m   karpenter  Status condition transitioned, Type: Launched, Status: Unknown -> True, Reason: Launched
Normal  DisruptionBlocked  31m   karpenter  Cannot disrupt NodeClaim: state node doesn't contain both a node and a nodeclaim
Normal  Registered         30m   karpenter  Status condition transitioned, Type: Registered, Status: Unknown -> True, Reason: Registered
Normal  Initialized        30m   karpenter  Status condition transitioned, Type: Initialized, Status: Unknown -> True, Reason: Initialized
Normal  Ready              30m   karpenter  Status condition transitioned, Type: Ready, Status: Unknown -> True, Reason: Ready
```
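For reference, events like these can be pulled with something like the following (the NodeClaim name is a placeholder):

```sh
# Substitute the stuck t3.xlarge NodeClaim name from `kubectl get nodeclaims`
kubectl describe nodeclaim <nodeclaim-name>
```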

Please let me know what's going on here.

Versions:
  • Image Version: 1.0.0@sha256:dd095cdcf857c3812f2084a7b20294932f461b0bff912acf58d592faa032fbef
  • Chart Version: 1.0.2
  • Kubernetes Version (kubectl version): 1.29
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@indra0007 indra0007 added the kind/bug Categorizes issue or PR as related to a bug. label Sep 19, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Sep 19, 2024
k8s-ci-robot (Contributor) commented:

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

indra0007 (Author) commented Sep 19, 2024

It might be related to aws/karpenter-provider-aws#6593. Also, FYI, the node in question is running nothing but a few DaemonSet pods, so I consider it empty.
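A quick way to verify that only DaemonSet pods remain on the node (the node name is a placeholder):

```sh
# Anything not owned by a DaemonSet would prevent the node from being considered empty
kubectl get pods --all-namespaces \
  --field-selector spec.nodeName=<t3.xlarge-node-name> \
  -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,OWNER:.metadata.ownerReferences[0].kind'
```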
