
RunWhen Public CLI Codecollection Repository - Open Source CLI troubleshooting library for Kubernetes and cloud infrastructure components.



taylorjstacey/rw-cli-codecollection

 
 


Troubleshooting Tasks in Codecollection: 128
Codebundles in Codecollection: 45

Join Discord
Join Slack

Open in GitHub Codespaces

RunWhen Public Codecollection

This repository is a codecollection for use within the RunWhen platform. It contains codebundles that can be used in SLIs and TaskSets.

Please see the contributing guide and code of conduct for details on contributing to this project.

Documentation for each codebundle is maintained in the README.md alongside the robot code and is published at https://docs.runwhen.com/public/v/codebundles/. Please see the README how-to for details on writing a codebundle README that can be indexed.

Getting Started

Head over to our centralized documentation for detailed information on getting started.

File structure overview of the devcontainer:

-/app/
    |- auth/ # store secrets here; this directory should already be gitignored for you
    |- codecollection/
    |   |- codebundles/ # stores codebundles that can be run during development
    |   |- libraries/ # stores python keyword libraries used by codebundles
    |- dev_facade/ # provides interfaces equivalent to those used on the platform, but only dry-runs the keywords to assist with development
    ...
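Keyword libraries in codecollection/libraries/ are written in Python. As a rough illustration only, assuming the standard Robot Framework convention that a library class's public methods become keywords, a small library could look like the sketch below. The class PodInspection and its methods are hypothetical and not part of this repository; they take the parsed JSON output of kubectl get pods -o json, which keeps them testable without a cluster.

```python
"""Hypothetical keyword library sketch; class and method names are illustrative only."""
from typing import Any, Dict, List


class PodInspection:
    """Robot Framework exposes each public method as a keyword, e.g.
    count_running_pods_with_labels -> "Count Running Pods With Labels"."""

    # Share one library instance across the whole run.
    ROBOT_LIBRARY_SCOPE = "GLOBAL"

    def count_running_pods_with_labels(
        self, pod_list: Dict[str, Any], labels: Dict[str, str]
    ) -> int:
        """Count pods in phase Running whose labels contain every key/value in `labels`.

        `pod_list` is assumed to be parsed `kubectl get pods -o json` output.
        """
        count = 0
        for pod in pod_list.get("items", []):
            pod_labels = pod.get("metadata", {}).get("labels", {}) or {}
            phase = pod.get("status", {}).get("phase")
            if phase == "Running" and all(
                pod_labels.get(key) == value for key, value in labels.items()
            ):
                count += 1
        return count

    def list_containers_without_resources(self, pod_list: Dict[str, Any]) -> List[str]:
        """Return "pod/container" names missing resource requests or resource limits."""
        missing = []
        for pod in pod_list.get("items", []):
            pod_name = pod.get("metadata", {}).get("name", "<unknown>")
            for container in pod.get("spec", {}).get("containers", []):
                resources = container.get("resources", {}) or {}
                if not resources.get("requests") or not resources.get("limits"):
                    missing.append(f"{pod_name}/{container.get('name')}")
        return missing
```

In a .robot file, such a library would typically be imported with a Library setting, after which count_running_pods_with_labels is callable as the keyword Count Running Pods With Labels.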

The included script ro wraps the RobotFramework robot binary and adds functionality to write output files to a consistent location, where they can be viewed through an HTTP server at http://localhost:3000/ that always runs as part of the devcontainer.
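The ro script itself is not reproduced here. As a minimal sketch of the idea, assuming a fixed output root (the path /robot_logs below is hypothetical, not the devcontainer's real location), a wrapper could build a robot command line that always passes --outputdir, so that log.html and report.html land in a predictable per-codebundle directory the HTTP server can serve.

```python
"""Hypothetical sketch of a robot wrapper in the spirit of ro; paths are assumptions."""
import subprocess
import sys
from pathlib import Path
from typing import List, Sequence

# Assumed output root served by the devcontainer's HTTP server (hypothetical path).
OUTPUT_ROOT = Path("/robot_logs")


def build_robot_command(robot_file: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Build a robot invocation that writes outputs under OUTPUT_ROOT/<codebundle dir>."""
    bundle_name = Path(robot_file).resolve().parent.name
    output_dir = OUTPUT_ROOT / bundle_name
    # Options go before the data source, as robot expects.
    return ["robot", "--outputdir", str(output_dir), *extra_args, robot_file]


def main(argv: Sequence[str]) -> int:
    if not argv:
        print("usage: ro-sketch <file.robot> [robot options]", file=sys.stderr)
        return 2
    cmd = build_robot_command(argv[0], argv[1:])
    # robot exits non-zero when tasks fail; pass that status through.
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Under these assumptions, running the sketch on runbook.robot from codecollection/codebundles/curl-http-ok/ would write its outputs under /robot_logs/curl-http-ok/.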

Quickstart

Navigate to the codebundle directory: cd codecollection/codebundles/curl-http-ok/

Run the codebundle: ro runbook.robot

Codebundle Index

| Name | Supported Integrations | Tasks | Documentation |
| --- | --- | --- | --- |
| AWS CloudWatch Overutilized EC2 Inspection | AWS, CloudWatch | Check For Overutilized Ec2 Instances | Queries AWS CloudWatch for a list of EC2 instances with high resource utilization, raising issues when overutilized instances are found. Docs |
| AWS EKS Nodegroup Status Check | AWS, EKS | Check EKS Nodegroup Status | Queries a node group within an EKS cluster to check whether the node group has degraded service, indicating ongoing reboots or other issues. Docs |
| Azure Internal LoadBalancer Triage | Kubernetes, AKS, Azure | Health Check Internal Azure Load Balancer | Triages issues related to Azure load balancers and their activity logs. Docs |
| Azure Monitor Activity Log SLI | Kubernetes, AKS, Azure | Run Azure Monitor Activity Log Triage | Measures the count of error activity log entries as an SLI metric for the Azure tenancy. Docs |
| Azure Monitor Event Triage | Kubernetes, AKS, Azure | Run Azure Monitor Activity Log Triage | Triages issues related to Azure load balancers, Kubernetes ingress objects, and services. Docs |
| GCP Gcloud Log Inspection | GCP, Gcloud, Google Monitoring | Inspect GCP Logs For Common Errors | Fetches logs from GCP using a configurable query and raises an issue with details on the most common issues. Docs |
| GCP Node Preempt List | GCP, GKE | Count the number of nodes in active prempt operation | Checks whether any GCP nodes have an active preempt operation. Docs |
| GKE Kong Ingress Host Triage | GCP, GMP, Ingress, Kong, Metrics | Check If Kong Ingress HTTP Error Rate Violates HTTP Error Threshold, Check If Kong Ingress HTTP Request Latency Violates Threshold, Check If Kong Ingress Controller Reports Upstream Errors | Collects Kong ingress host metrics from GMP on GCP, inspects the results for ingress with an HTTP error code rate greater than zero over a configurable duration, and raises issues based on the number of ingress with error codes. Docs |
| GKE Nginx Ingress Host Triage | GCP, GMP, Ingress, Nginx, Metrics | Fetch Nginx HTTP Errors From GMP for Ingress ${INGRESS_OBJECT_NAME}, Find Owner and Service Health for Ingress ${INGRESS_OBJECT_NAME} | Collects Nginx ingress host controller metrics from GMP on GCP, inspects the results for ingress with an HTTP error code rate greater than zero over a configurable duration, and raises issues based on the number of ingress with error codes. Docs |
| Kubeprometheus Operator Troubleshoot | Kubernetes, AKS, EKS, GKE, OpenShift, Prometheus | Check Prometheus Service Monitors, Check For Successful Rule Setup, Verify Prometheus RBAC Can Access ServiceMonitors, Identify Endpoint Scraping Errors, Check Prometheus API Healthy | This taskset investigates the logs, state, and health of the Kubernetes Prometheus operator. Docs |
| Kubernetes Application Monitor | Kubernetes, AKS, EKS, GKE, OpenShift | Measure Application Exceptions | Measures the number of exception stacktraces present in an application's logs over a time period. Docs |
| Kubernetes Application Troubleshoot | Kubernetes, AKS, EKS, GKE, OpenShift | Get Workload Logs, Scan For Misconfigured Environment, Troubleshoot Application Logs | Triages issues related to a deployment and its replicas. Docs |
| Kubernetes ArgoCD Application Health & Troubleshoot | Kubernetes, AKS, EKS, GKE, OpenShift, ArgoCD | Fetch ArgoCD Application Sync Status & Health, Fetch ArgoCD Application Last Sync Operation Details, Fetch Unhealthy ArgoCD Application Resources, Scan For Errors in Pod Logs Related to ArgoCD Application Deployments, Fully Describe ArgoCD Application | This taskset collects information and runs general troubleshooting checks against ArgoCD application objects within a namespace. Docs |
| Kubernetes ArgoCD HelmRelease TaskSet | Kubernetes, AKS, EKS, GKE, OpenShift, ArgoCD | Fetch all available ArgoCD Helm releases, Fetch Installed ArgoCD Helm release versions | This codebundle runs a series of tasks to identify potential Helm release issues related to ArgoCD-managed Helm objects. Docs |
| Kubernetes Artifactory Triage | Kubernetes, AKS, EKS, GKE, OpenShift, Artifactory | Check Artifactory Liveness and Readiness Endpoints | Performs a triage on the open source version of Artifactory in a Kubernetes cluster. Docs |
| Kubernetes CertManager Healthcheck | Kubernetes, AKS, EKS, GKE, OpenShift | Get Health Score of CertManager Workloads | Checks the health of pods deployed by cert-manager. Docs |
| Kubernetes Daemonset Triage | Kubernetes, AKS, EKS, GKE, OpenShift | Get DaemonSet Log Details For Report, Get Related Daemonset Events, Check Daemonset Replicas | Triages issues related to a DaemonSet and its available replicas. Docs |
| Kubernetes Deployment Triage | Kubernetes, AKS, EKS, GKE, OpenShift | Check Deployment Log For Issues with ${DEPLOYMENT_NAME}, Check Liveness Probe Configuration for Deployment ${DEPLOYMENT_NAME}, Check Readiness Probe Configuration for Deployment ${DEPLOYMENT_NAME}, Troubleshoot Deployment Warning Events for ${DEPLOYMENT_NAME}, Get Deployment Workload Details For ${DEPLOYMENT_NAME} and Add to Report, Troubleshoot Deployment Replicas for ${DEPLOYMENT_NAME}, Check Deployment Event Anomalies for ${DEPLOYMENT_NAME} | Triages issues related to a deployment and its replicas. Docs |
| Kubernetes Flux Chaos Testing | Kubernetes, AKS, EKS, GKE, OpenShift | Suspend the Flux Resource Reconciliation, Find Random FluxCD Workload as Chaos Target, Execute Chaos Command, Execute Additional Chaos Command, Resume Flux Resource Reconciliation | This taskset suspends a Flux resource so that chaos tasks can be executed. Docs |
| Kubernetes FluxCD HelmRelease TaskSet | Kubernetes, AKS, EKS, GKE, OpenShift, FluxCD | List all available FluxCD Helmreleases, Fetch Installed FluxCD Helmrelease Versions, Fetch Mismatched FluxCD HelmRelease Version, Fetch FluxCD HelmRelease Error Messages, Check for Available Helm Chart Updates | This codebundle runs a series of tasks to identify potential Helm release issues related to Flux-managed Helm objects. Docs |
| Kubernetes FluxCD Kustomization TaskSet | Kubernetes, AKS, EKS, GKE, OpenShift, FluxCD | List all available Kustomization objects, Get details for unready Kustomizations | This codebundle runs a series of tasks to identify potential Kustomization issues related to Flux-managed Kustomization objects. Docs |
| Kubernetes Grafana Loki Health Check | k8s | Check Loki Ring API, Check Loki API Ready | This taskset checks the health of Grafana Loki and its hash ring. Docs |
| Kubernetes Image Check | Kubernetes, AKS, EKS, GKE, OpenShift | Check Image Rollover Times for Namespace ${NAMESPACE}, List Images and Tags for Every Container in Running Pods for Namespace ${NAMESPACE}, List Images and Tags for Every Container in Failed Pods for Namespace ${NAMESPACE}, List ImagePullBackOff Events and Test Path and Tags for Namespace ${NAMESPACE} | This taskset provides detailed information about the images used in a Kubernetes namespace. Docs |
| Kubernetes Ingress GCE & GCP HTTP Load Balancer Healthcheck | Kubernetes, GKE, GCE, GCP | Search For GCE Ingress Warnings in GKE, Identify Unhealthy GCE HTTP Ingress Backends, Validate GCP HTTP Load Balancer Configurations, Fetch Network Error Logs from GCP Operations Manager for Ingress Backends, Review GCP Operations Logging Dashboard | Troubleshoots GCE Ingress resources related to the GCP HTTP Load Balancer in GKE. Docs |
| Kubernetes Ingress Healthcheck | Kubernetes, AKS, EKS, GKE, OpenShift | Fetch Ingress Object Health in Namespace ${NAMESPACE}, Check for Ingress and Service Conflicts in Namespace ${NAMESPACE} | Triages issues related to ingress objects and services. Docs |
| Kubernetes Jenkins Healthcheck | Kubernetes, AKS, EKS, GKE, OpenShift, Jenkins | Query The Jenkins Kubernetes Workload HTTP Endpoint, Query For Stuck Jenkins Jobs | This taskset collects information about persistent volumes and persistent volume claims to validate health or help troubleshoot potential issues. Docs |
| Kubernetes Labeled Pod Count | Kubernetes, AKS, EKS, GKE, OpenShift | Measure Number of Running Pods with Label | This codebundle measures the number of running pods that carry the provided set of labels. Docs |
| Kubernetes Namespace Healthcheck | Kubernetes, AKS, EKS, GKE, OpenShift | Get Event Count and Score, Get Container Restarts and Score, Get NotReady Pods, Generate Namspace Score | This SLI uses kubectl to score namespace health. It produces a value between 0 (completely failing the test) and 1 (fully passing the test), looking at container restarts, events, and pods that are not ready. Docs |
| Kubernetes Namespace Troubleshoot | Kubernetes, AKS, EKS, GKE, OpenShift | Troubleshoot Warning Events in Namespace ${NAMESPACE}, Troubleshoot Container Restarts In Namespace ${NAMESPACE}, Troubleshoot Pending Pods In Namespace ${NAMESPACE}, Troubleshoot Failed Pods In Namespace ${NAMESPACE}, Troubleshoot Workload Status Conditions In Namespace ${NAMESPACE}, Get Listing Of Resources In Namespace ${NAMESPACE}, Check Event Anomalies in Namespace ${NAMESPACE}, Troubleshoot Services And Application Workloads in Namespace ${NAMESPACE}, Check Missing or Risky PodDisruptionBudget Policies in Namepace ${NAMESPACE} | This taskset runs general troubleshooting checks against all applicable objects in a namespace. It looks for warning events, odd or frequent normal events, restarting containers, and failed or pending pods. Docs |
| Kubernetes Persistent Volume Healthcheck | Kubernetes, AKS, EKS, GKE, OpenShift | Fetch Events for Unhealthy Kubernetes PersistentVolumeClaims in Namespace ${NAMESPACE}, List PersistentVolumeClaims in Terminating State in Namespace ${NAMESPACE}, List PersistentVolumes in Terminating State in Namespace ${NAMESPACE}, List Pods with Attached Volumes and Related PersistentVolume Details in Namespace ${NAMESPACE}, Fetch the Storage Utilization for PVC Mounts in Namespace ${NAMESPACE}, Check for RWO Persistent Volume Node Attachment Issues in Namespace ${NAMESPACE} | This taskset collects information about storage such as PersistentVolumes and PersistentVolumeClaims to validate health or help troubleshoot potential storage issues. Docs |
| Kubernetes Pod Resources Scan | Kubernetes, AKS, EKS, GKE, OpenShift | Show Pods Without Resource Limit or Resource Requests Set in Namespace ${NAMESPACE}, Get Pod Resource Utilization with Top in Namespace ${NAMESPACE} | Inspects the resources provisioned for a given set of pods, selected by their labels, and raises issues if no resources were specified. Docs |
| Kubernetes Postgres Triage | AKS, EKS, GKE, Kubernetes, Patroni, Postgres | Get Standard Postgres Resource Information, Describe Postgres Custom Resources, Get Postgres Pod Logs & Events, Get Postgres Pod Resource Utilization, Get Running Postgres Configuration, Get Patroni Output, Run DB Queries | Runs multiple Kubernetes and psql commands to report on the health of a Postgres cluster. Docs |
| Kubernetes Redis Healthcheck | Kubernetes, AKS, EKS, GKE, OpenShift, Redis | Ping ${DEPLOYMENT_NAME} Redis Workload, Verify ${DEPLOYMENT_NAME} Redis Read Write Operation | This taskset collects information on your Redis workload in your Kubernetes cluster and raises issues if any health checks fail. Docs |
| Kubernetes Restart resource | Kubernetes, AKS, EKS, GKE, OpenShift | Get Current Resource State, Get Resource Logs, Restart Resource | This taskset restarts a resource with a given set of labels and is typically used with other tasksets. Docs |
| Kubernetes Service Account Check | Kubernetes, AKS, EKS, GKE, OpenShift, Redis | Test Service Account Access to Kubernetes API Server | This taskset provides tasks to troubleshoot service accounts in a Kubernetes namespace. Docs |
| Kubernetes StatefulSet Triage | Kubernetes, AKS, EKS, GKE, OpenShift | Fetch StatefulSet ${STATEFULSET_NAME} Logs, Get Related StatefulSet ${STATEFULSET_NAME} Events, Fetch StatefulSet ${STATEFULSET_NAME} Manifest Details, List StatefulSets with Unhealthy Replica Counts In Namespace ${NAMESPACE} | Triages issues related to a StatefulSet and its replicas. Docs |
| Kubernetes Vault Triage | AKS, EKS, GKE, Kubernetes, Vault | Fetch Vault CSI Driver Logs, Get Vault CSI Driver Warning Events, Check Vault CSI Driver Replicas, Fetch Vault Logs, Get Related Vault Events, Fetch Vault StatefulSet Manifest Details, Fetch Vault DaemonSet Manifest Details, Verify Vault Availability, Check Vault StatefulSet Replicas | A suite of tasks that can be used to triage potential issues in your Vault namespace. Docs |
| Terraform Cloud Workspace Lock Check | Terraform Cloud | Checking whether the Terraform Cloud Workspace is in a locked state | Checks whether the Terraform Cloud workspace is in a locked state. Docs |
| Test Issues | Test | Raise Full Issue | A codebundle for testing the issues feature; used purely to test the flow. Docs |
| cURL HTTP OK | Linux, macOS, Windows, HTTP | Checking HTTP URL Is Available And Timely | This taskset uses curl to validate the response code of the endpoint. It returns a score of 1 if healthy and 0 if unhealthy. Docs |
| cli-test-taskset | cli | Run CLI and Parse Output For Issues, Exec Test, Local Process Test | This taskset smoke-tests the CLI codebundle setup and run process. Docs |
| cmd-test-taskset | cmd | Run CLI Command, Run Bash File, Log Suggestion | This taskset smoke-tests the CLI codebundle setup and run process by running a bare command. Docs |
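The cURL HTTP OK codebundle is described as checking that a URL is available and timely, returning a score of 1 when healthy and 0 otherwise. A minimal Python sketch of that scoring rule is shown below, assuming "healthy" means the status code equals an expected value and the request completes within a latency threshold. The function names, parameters, and defaults are hypothetical rather than the codebundle's actual configuration, and the sketch uses the standard library's urllib instead of shelling out to curl.

```python
"""Sketch of a 1/0 HTTP health score; names and thresholds are hypothetical."""
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple


def score_http_check(
    status_code: Optional[int],
    latency_seconds: float,
    desired_status: int = 200,
    max_latency_seconds: float = 1.2,
) -> int:
    """Return 1 if the response had the desired status and arrived in time, else 0.

    A status_code of None means the request failed outright (DNS, connection, timeout).
    """
    if status_code is None:
        return 0
    return int(status_code == desired_status and latency_seconds <= max_latency_seconds)


def probe(url: str, timeout: float = 5.0) -> Tuple[Optional[int], float]:
    """Fetch `url` once and return (status_code, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status: Optional[int] = resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses still carry a status code worth scoring.
        status = err.code
    except (urllib.error.URLError, OSError):
        status = None
    return status, time.monotonic() - start
```

A run against a live endpoint would look like status, elapsed = probe("https://example.com") followed by score_http_check(status, elapsed). Keeping the scoring rule separate from the network call makes the 1/0 decision easy to test on its own.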


Languages

  • RobotFramework 71.6%
  • Python 17.8%
  • Shell 10.5%
  • Dockerfile 0.1%