Reformat tables. #3048

Merged: 1 commit on Sep 13, 2024

site/content/en/docs/adopters/index.md (22 changes: 11 additions, 11 deletions)
@@ -16,16 +16,16 @@ If you are using Kueue, feel free to open a pull request to add your organizatio

## Adopters

-| Organization | Type | Description | Integrations | Contact |
-|:----------------------------------------------------:|:--------:|:----------------------:|:----------------------------------:|:----------------------------------------:|
-| [CyberAgent, Inc.](https://www.cyberagent.co.jp/en/) | End User | On-premise ML Platform | batch/job </br> kubeflow.org/mpijob | [@tenzen-y](https://github.com/tenzen-y) |
-| [DaoCloud, Inc.](https://www.daocloud.io/en/) | End User | Part of the AI Platform for managing all kinds of Jobs. | batch/job </br> RayJob </br> ... | [@kerthcet](https://github.com/kerthcet) |
-| [WattIQ, Inc.](https://wattiq.io) | End User | SaaS/IoT product | batch/job </br> RayJob </br> | [@madsenwattiq](https://github.com/madsenwattiq) |
-| [Horizon, Inc.](https://horizon.cc/) | End User | AI training platform | batch/job </br> ... | [@GhangZh](https://github.com/GhangZh) |
-| [FAR AI](https://far.ai/) | End User | AI alignment research nonprofit | batch/job | [@rhaps0dy](https://github.com/rhaps0dy) |
-| [Shopee, Inc.](https://shopee.com/) | End User | Training/batch inference/data processes in AI platform test env | Customized job </br> RayJob </br> ... | [@denkensk](https://github.com/denkensk) |
-| [Mondoo, Inc.](https://mondoo.com) | End User | Helps power Mondoo's hosted security scanner | batch/job | [@jaym](https://github.com/jaym) |
-| [Google Cloud](https://cloud.google.com/) | Provider | Part of [kit for training ML workloads on TPUs][gcmldemo] | JobSet | [@mrozacki](https://github.com/mrozacki) |
-| [Onna Technologies, Inc](https://onna.com) | End User | Unstructured Data Management Platform | batch/job </br> | [@gitcarbs](https://github.com/gitcarbs) |
+| Organization | Type | Description | Integrations | Contact |
+|:-------------------------------------------------------:|:--------:|:---------------------------------------------------------------:|:-------------------------------------:|:------------------------------------------------:|
+| [CyberAgent, Inc.](https://www.cyberagent.co.jp/en/) | End User | On-premise ML Platform | batch/job </br> kubeflow.org/mpijob | [@tenzen-y](https://github.com/tenzen-y) |
+| [DaoCloud, Inc.](https://www.daocloud.io/en/) | End User | Part of the AI Platform for managing all kinds of Jobs. | batch/job </br> RayJob </br> ... | [@kerthcet](https://github.com/kerthcet) |
+| [WattIQ, Inc.](https://wattiq.io) | End User | SaaS/IoT product | batch/job </br> RayJob </br> | [@madsenwattiq](https://github.com/madsenwattiq) |
+| [Horizon, Inc.](https://horizon.cc/) | End User | AI training platform | batch/job </br> ... | [@GhangZh](https://github.com/GhangZh) |
+| [FAR AI](https://far.ai/) | End User | AI alignment research nonprofit | batch/job | [@rhaps0dy](https://github.com/rhaps0dy) |
+| [Shopee, Inc.](https://shopee.com/) | End User | Training/batch inference/data processes in AI platform test env | Customized job </br> RayJob </br> ... | [@denkensk](https://github.com/denkensk) |
+| [Mondoo, Inc.](https://mondoo.com) | End User | Helps power Mondoo's hosted security scanner | batch/job | [@jaym](https://github.com/jaym) |
+| [Google Cloud](https://cloud.google.com/) | Provider | Part of [kit for training ML workloads on TPUs][gcmldemo] | JobSet | [@mrozacki](https://github.com/mrozacki) |
+| [Onna Technologies, Inc](https://onna.com) | End User | Unstructured Data Management Platform | batch/job </br> | [@gitcarbs](https://github.com/gitcarbs) |

[gcmldemo]: https://cloud.google.com/blog/products/compute/the-worlds-largest-distributed-llm-training-job-on-tpu-v5e
site/content/en/docs/installation/_index.md (32 changes: 16 additions, 16 deletions)
@@ -243,22 +243,22 @@ spec:

The currently supported features are:

-| Feature | Default | Stage | Since | Until |
-|---------|---------|-------|-------|-------|
-| `FlavorFungibility` | `true` | Beta | 0.5 | |
-| `MultiKueue` | `false` | Alpha | 0.6 | |
-| `MultiKueueBatchJobWithManagedBy` | `false` | Alpha | 0.8 | |
-| `PartialAdmission` | `false` | Alpha | 0.4 | 0.4 |
-| `PartialAdmission` | `true` | Beta | 0.5 | |
-| `ProvisioningACC` | `false` | Alpha | 0.5 | 0.6 |
-| `ProvisioningACC` | `true` | Beta | 0.7 | |
-| `QueueVisibility` | `false` | Alpha | 0.5 | |
-| `VisibilityOnDemand` | `false` | Alpha | 0.6 | |
-| `PrioritySortingWithinCohort` | `true` | Beta | 0.6 | |
-| `LendingLimit` | `false` | Alpha | 0.6 | 0.8 |
-| `LendingLimit` | `true` | Beta | 0.9 | |
-| `MultiplePreemptions` | `false` | Alpha | 0.8 | 0.8 |
-| `MultiplePreemptions` | `true` | Beta | 0.9 | |
+| Feature | Default | Stage | Since | Until |
+|-----------------------------------|---------|-------|-------|-------|
+| `FlavorFungibility` | `true` | Beta | 0.5 | |
+| `MultiKueue` | `false` | Alpha | 0.6 | |
+| `MultiKueueBatchJobWithManagedBy` | `false` | Alpha | 0.8 | |
+| `PartialAdmission` | `false` | Alpha | 0.4 | 0.4 |
+| `PartialAdmission` | `true` | Beta | 0.5 | |
+| `ProvisioningACC` | `false` | Alpha | 0.5 | 0.6 |
+| `ProvisioningACC` | `true` | Beta | 0.7 | |
+| `QueueVisibility` | `false` | Alpha | 0.5 | |
+| `VisibilityOnDemand` | `false` | Alpha | 0.6 | |
+| `PrioritySortingWithinCohort` | `true` | Beta | 0.6 | |
+| `LendingLimit` | `false` | Alpha | 0.6 | 0.8 |
+| `LendingLimit` | `true` | Beta | 0.9 | |
+| `MultiplePreemptions` | `false` | Alpha | 0.8 | 0.8 |
+| `MultiplePreemptions` | `true` | Beta | 0.9 | |

## What's next

site/content/en/docs/reference/metrics.md (42 changes: 21 additions, 21 deletions)
@@ -13,34 +13,34 @@ of the system and the status of [ClusterQueues](/docs/concepts/cluster_queue).

Use the following metrics to monitor the health of the kueue controllers:

-| Metric name | Type | Description | Labels |
-| ----------- | ---- | ----------- | ------ |
-| `kueue_admission_attempts_total` | Counter | The total number of attempts to [admit](/docs/concepts#admission) workloads. Each admission attempt might try to admit more than one workload. | `result`: possible values are `success` or `inadmissible` |
-| `kueue_admission_attempt_duration_seconds` | Histogram | The latency of an admission attempt. | `result`: possible values are `success` or `inadmissible` |
+| Metric name | Type | Description | Labels |
+|--------------------------------------------|-----------|------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------|
+| `kueue_admission_attempts_total` | Counter | The total number of attempts to [admit](/docs/concepts#admission) workloads. Each admission attempt might try to admit more than one workload. | `result`: possible values are `success` or `inadmissible` |
+| `kueue_admission_attempt_duration_seconds` | Histogram | The latency of an admission attempt. | `result`: possible values are `success` or `inadmissible` |

## ClusterQueue status

Use the following metrics to monitor the status of your ClusterQueues:

-| Metric name | Type | Description | Labels |
-| ----------- | ---- | ----------- | ------ |
-| `kueue_pending_workloads` | Gauge | The number of pending workloads. | `cluster_queue`: the name of the ClusterQueue<br> `status`: possible values are `active` or `inadmissible` |
-| `kueue_quota_reserved_workloads_total` | Counter | The total number of quota reserved workloads. | `cluster_queue`: the name of the ClusterQueue |
-| `kueue_quota_reserved_wait_time_seconds` | Histogram | The time between a workload was created or requeued until it got quota reservation. | `cluster_queue`: the name of the ClusterQueue |
-| `kueue_admitted_workloads_total` | Counter | The total number of admitted workloads. | `cluster_queue`: the name of the ClusterQueue |
-| `kueue_evicted_workloads_total` | Counter | The total number of evicted workloads. | `cluster_queue`: the name of the ClusterQueue<br> `reason`: Possible values are `Preempted`, `PodsReadyTimeout`, `AdmissionCheck`, `ClusterQueueStopped` or `InactiveWorkload` |
-| `kueue_admission_wait_time_seconds` | Histogram | The time between a workload was created or requeued until admission. | `cluster_queue`: the name of the ClusterQueue |
-| `kueue_admission_checks_wait_time_seconds` | Histogram | The time from when a workload got the quota reservation until admission. | `cluster_queue`: the name of the ClusterQueue |
-| `kueue_admitted_active_workloads` | Gauge | The number of admitted Workloads that are active (unsuspended and not finished) | `cluster_queue`: the name of the ClusterQueue |
-| `kueue_cluster_queue_status` | Gauge | Reports the status of the ClusterQueue | `cluster_queue`: The name of the ClusterQueue<br> `status`: Possible values are `pending`, `active` or `terminated`. For a ClusterQueue, the metric only reports a value of 1 for one of the statuses. |
+| Metric name | Type | Description | Labels |
+|--------------------------------------------|-----------|-------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `kueue_pending_workloads` | Gauge | The number of pending workloads. | `cluster_queue`: the name of the ClusterQueue<br> `status`: possible values are `active` or `inadmissible` |
+| `kueue_quota_reserved_workloads_total` | Counter | The total number of quota reserved workloads. | `cluster_queue`: the name of the ClusterQueue |
+| `kueue_quota_reserved_wait_time_seconds` | Histogram | The time between a workload was created or requeued until it got quota reservation. | `cluster_queue`: the name of the ClusterQueue |
+| `kueue_admitted_workloads_total` | Counter | The total number of admitted workloads. | `cluster_queue`: the name of the ClusterQueue |
+| `kueue_evicted_workloads_total` | Counter | The total number of evicted workloads. | `cluster_queue`: the name of the ClusterQueue<br> `reason`: Possible values are `Preempted`, `PodsReadyTimeout`, `AdmissionCheck`, `ClusterQueueStopped` or `InactiveWorkload` |
+| `kueue_admission_wait_time_seconds` | Histogram | The time between a workload was created or requeued until admission. | `cluster_queue`: the name of the ClusterQueue |
+| `kueue_admission_checks_wait_time_seconds` | Histogram | The time from when a workload got the quota reservation until admission. | `cluster_queue`: the name of the ClusterQueue |
+| `kueue_admitted_active_workloads` | Gauge | The number of admitted Workloads that are active (unsuspended and not finished) | `cluster_queue`: the name of the ClusterQueue |
+| `kueue_cluster_queue_status` | Gauge | Reports the status of the ClusterQueue | `cluster_queue`: The name of the ClusterQueue<br> `status`: Possible values are `pending`, `active` or `terminated`. For a ClusterQueue, the metric only reports a value of 1 for one of the statuses. |

### Optional metrics

The following metrics are available only if `metrics.enableClusterQueueResources` is enabled in the [manager's configuration](/docs/installation/#install-a-custom-configured-released-version).

-| Metric name | Type | Description | Labels |
-| ----------- | ---- | ----------- | ------ |
-| `kueue_cluster_queue_resource_usage` | Gauge | Reports the ClusterQueue's total resource usage |`cohort`: The cohort in which the queue belongs<br> `cluster_queue`: The name of the ClusterQueue<br> `flavor`: referenced flavor<br> `resource`: The resource name|
-| `kueue_cluster_queue_nominal_quota` | Gauge | Reports the ClusterQueue's resource quota |`cohort`: The cohort in which the queue belongs<br> `cluster_queue`: The name of the ClusterQueue<br> `flavor`: referenced flavor<br> `resource`: The resource name|
-| `kueue_cluster_queue_borrowing_limit` | Gauge | Reports the ClusterQueue's resource borrowing limit |`cohort`: The cohort in which the queue belongs<br> `cluster_queue`: The name of the ClusterQueue<br> `flavor`: referenced flavor<br> `resource`: The resource name|
-| `kueue_cluster_queue_weighted_share` | Gauge | Reports a value that representing the maximum of the ratios of usage above nominal quota to the lendable resources in the cohort, among all the resources provided by the ClusterQueue. |`cluster_queue`: The name of the ClusterQueue|
+| Metric name | Type | Description | Labels |
+|---------------------------------------|--------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `kueue_cluster_queue_resource_usage` | Gauge | Reports the ClusterQueue's total resource usage | `cohort`: The cohort in which the queue belongs<br> `cluster_queue`: The name of the ClusterQueue<br> `flavor`: referenced flavor<br> `resource`: The resource name |
+| `kueue_cluster_queue_nominal_quota` | Gauge | Reports the ClusterQueue's resource quota | `cohort`: The cohort in which the queue belongs<br> `cluster_queue`: The name of the ClusterQueue<br> `flavor`: referenced flavor<br> `resource`: The resource name |
+| `kueue_cluster_queue_borrowing_limit` | Gauge | Reports the ClusterQueue's resource borrowing limit | `cohort`: The cohort in which the queue belongs<br> `cluster_queue`: The name of the ClusterQueue<br> `flavor`: referenced flavor<br> `resource`: The resource name |
+| `kueue_cluster_queue_weighted_share` | Gauge | Reports a value that representing the maximum of the ratios of usage above nominal quota to the lendable resources in the cohort, among all the resources provided by the ClusterQueue. | `cluster_queue`: The name of the ClusterQueue |
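
The reformatting in this PR appears to pad each cell to the width of its column's widest entry so that the pipes line up in the raw Markdown. For example, the feature-gate table's delimiter row now spans the width of `MultiKueueBatchJobWithManagedBy`. The PR does not say which tool produced the alignment, so what follows is only a minimal Python sketch of the technique. It assumes left-justified padding, one space on each side of a cell, and no escaped pipes inside cells. The centered columns in the adopters table may have been padded differently. `split_row`, `is_separator` and `format_table` are hypothetical helper names, not part of Kueue.

```python
import re

# Matches a delimiter-row cell such as ---, :---, ---: or :---:
SEPARATOR_CELL = re.compile(r":?-+:?")


def split_row(line: str) -> list[str]:
    """Split one pipe-delimited Markdown table row into trimmed cells.

    Assumption: no cell contains an escaped pipe.
    """
    s = line.strip()
    if s.startswith("|"):
        s = s[1:]
    if s.endswith("|"):
        s = s[:-1]
    return [cell.strip() for cell in s.split("|")]


def is_separator(cells: list[str]) -> bool:
    """True if every cell looks like part of the delimiter row."""
    return all(SEPARATOR_CELL.fullmatch(c) for c in cells)


def format_table(lines: list[str]) -> list[str]:
    """Pad every cell so that all rows of the table have the same width."""
    rows = [split_row(line) for line in lines]
    ncols = max(len(r) for r in rows)
    # Fill short rows with empty cells so every row has ncols cells.
    rows = [r + [""] * (ncols - len(r)) for r in rows]

    # Column width = widest non-delimiter cell, at least 3 characters.
    widths = [3] * ncols
    for r in rows:
        if not is_separator(r):
            for i, c in enumerate(r):
                widths[i] = max(widths[i], len(c))

    out = []
    for r in rows:
        if is_separator(r):
            # Keep any alignment colons; the dashes span the cell width
            # plus the single space of padding on each side.
            cells = [
                (":" if c.startswith(":") else "-")
                + "-" * widths[i]
                + (":" if c.endswith(":") else "-")
                for i, c in enumerate(r)
            ]
            out.append("|" + "|".join(cells) + "|")
        else:
            # Left-justify content with one space of padding on each side.
            padded = [c.ljust(widths[i]) for i, c in enumerate(r)]
            out.append("| " + " | ".join(padded) + " |")
    return out


if __name__ == "__main__":
    table = [
        "| Feature | Default | Stage | Since | Until |",
        "|---------|---------|-------|-------|-------|",
        "| `FlavorFungibility` | `true` | Beta | 0.5 | |",
        "| `MultiKueueBatchJobWithManagedBy` | `false` | Alpha | 0.8 | |",
    ]
    print("\n".join(format_table(table)))
```

Every output line has the same length, which is what makes the columns line up in a plain-text editor. GitHub Flavored Markdown trims the whitespace around cell contents, so this kind of padding leaves the rendered pages unchanged.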