Task Queues

Fenzo now supports task queues and introduces a scheduling loop that iterates over the queue, repeatedly trying to assign resources to pending tasks so they can be launched. Execution of the scheduling loop can be customized via its builder.

Existing use of Fenzo's TaskScheduler is unaffected; that is, all existing users of Fenzo continue to work unchanged. Using the new task queues, however, requires replacing direct use of TaskScheduler for scheduling with the new scheduling loop, summarized below.

While we work on more comprehensive documentation, here’s a brief overview.

Contents:

  1. Scheduling Loop
  2. Task Queues
  3. Tiered Queue
    1. Tier SLAs
  4. Possible future changes

Scheduling Loop

The scheduling loop in TaskSchedulingService simplifies the use of Fenzo. Frameworks can now hand tasks to Fenzo once, instead of repeatedly calling TaskScheduler.scheduleOnce(). Methods of the class can be called concurrently: new tasks can be added to the queue and new Mesos resource offers can be added to TaskSchedulingService at any time, concurrently with any ongoing scheduling iteration. Fenzo then repeatedly invokes a callback whenever successful assignments are available. Cluster autoscaling callbacks continue to work as they already do.
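
For illustration, here is a minimal sketch of setting up the scheduling loop in Java. The builder and service method names shown (withTaskQueue, withTaskScheduler, withLoopIntervalMillis, withSchedulingResultCallback, queueTask, addLeases) are our recollection of the Fenzo API and may differ in your version; treat this as an assumption-laden sketch rather than a definitive reference.

    import com.netflix.fenzo.SchedulingResult;
    import com.netflix.fenzo.TaskScheduler;
    import com.netflix.fenzo.TaskSchedulingService;
    import com.netflix.fenzo.queues.TaskQueue;
    import com.netflix.fenzo.queues.TaskQueues;

    public class SchedulingLoopSketch {

        // Called by the scheduling loop whenever an iteration completes.
        // Inspect the result for successful host assignments and launch those
        // tasks with your Mesos driver.
        private static void onScheduleResult(SchedulingResult result) {
            // result.getResultMap() maps host names to assignment results (assumed API).
        }

        public static void main(String[] args) {
            // A regular TaskScheduler is still required; the scheduling service drives it.
            TaskScheduler taskScheduler = new TaskScheduler.Builder()
                    .withLeaseOfferExpirySecs(10)
                    .withLeaseRejectAction(lease -> { /* decline the offer via your driver */ })
                    .build();

            // A two-tier queue; see the Tiered Queue section below.
            TaskQueue queue = TaskQueues.createTieredQueue(2);

            // Builder method names below are assumptions.
            TaskSchedulingService schedulingService = new TaskSchedulingService.Builder()
                    .withTaskQueue(queue)
                    .withTaskScheduler(taskScheduler)
                    .withLoopIntervalMillis(1000) // run a scheduling iteration every second
                    .withSchedulingResultCallback(SchedulingLoopSketch::onScheduleResult)
                    .build();

            schedulingService.start();

            // Thereafter, from any thread and at any time:
            //   schedulingService.queueTask(someQueuableTask); // hand a task to Fenzo once
            //   schedulingService.addLeases(newLeases);        // feed in new Mesos resource offers
        }
    }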

Task Queues

True to its customizable design, Fenzo allows multiple implementations of a queue. At this time Fenzo provides a tiered queue implementation. Other implementations, such as a simple FIFO queue, can be developed by implementing the TaskQueue interface. Below, we describe the current implementation, the tiered queue, which is a sophisticated multi-level queue.

Tiered Queue

A tiered queue divides the queue into multiple tiers, usually a handful of them. Each tier is in turn divided into multiple buckets, and there can be a large number of buckets in a tier. This is implemented in TieredQueue.

Tiers represent coarse-grained priority. Tasks in buckets of a higher tier are considered for resource assignment before tasks in buckets of lower tiers. Tiers are numbered 0 to N-1 for a queue of N tiers, with 0 being the highest priority tier and N-1 the lowest.

Within a tier, buckets are kept in a sorted order based on weighted dominant resource usage from all of the bucket’s tasks. Effectively, the bucket with the lowest usage is considered for resource assignment before other buckets. The weighting is based on any SLAs in the tier, as explained below.

Within a bucket, tasks are added in FIFO order. A task is placed into a bucket based on the queue attributes it provides in its QueuableTask definition, as in the sketch below.
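
For example, a task's tier and bucket are conveyed through its queue attributes. The following minimal sketch assumes the QAttributes interface and its QAttributesAdaptor convenience class behave as we recall; treat those names, and the tier and bucket values, as illustrative assumptions.

    import com.netflix.fenzo.queues.QAttributes;
    import com.netflix.fenzo.queues.QueuableTask;

    // Sketch only: a QueuableTask is a TaskRequest that additionally reports
    // the queue attributes used to place it into a tier and bucket.
    public abstract class MyQueuableTask implements QueuableTask {

        // Tier 0 (highest priority), bucket "B" as in the diagram below (assumed helper class).
        private final QAttributes attributes = new QAttributes.QAttributesAdaptor(0, "B");

        @Override
        public QAttributes getQAttributes() {
            return attributes;
        }

        // The remaining TaskRequest methods (getId(), getCPUs(), getMemory(), ...)
        // are left abstract here; a real task implements them as usual.
    }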

The following diagram shows a 2-tier queue with each tier having 4 buckets.

          +------------+-------+---+-----+
   tier 0 |  Bucket A  |  B    | C |  D  |
          +------------+-------+---+-----+

          +-----+----------+----+--------+
   tier 1 |  E  |     F    |  G |   H    |
          +-----+----------+----+--------+

Tier SLAs

Tiers, and buckets within each tier, can optionally be given SLAs (service level agreements). Currently, the SLAs accept resource allocation quotas for each bucket along with a capacity for each tier. These can be set dynamically by passing a new TieredQueueSlas object to the TaskQueue.setSla() method.

Each bucket in a tier can be allocated resources up to its quota. A bucket can be assigned resources beyond its quota if the tier has sufficient capacity that all other buckets' quotas can still be satisfied when needed. For example, if the tier's capacity is exactly equal to the sum of all buckets' quotas, then no bucket can use resources beyond its quota. The cluster autoscaler works in conjunction with this: a scale-up is not requested if tasks are waiting in the queue only because their bucket cannot be assigned additional resources due to its quota.

By default, SLAs are effectively absent; that is, any bucket can use resources up to the tier's capacity without limits.

Currently, a bucket's quota also serves as its weight for the weighted dominant resource fairness (DRF) based sorting.
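
As an illustration, tier capacities and bucket quotas might be set up along the following lines. This is a sketch under assumptions: the class and package names shown (ResAllocs, ResAllocsBuilder, TieredQueueSlas) and the exact shape of the TieredQueueSlas constructor (tier capacities plus per-tier bucket quotas) are our recollection of the Fenzo API and may differ in your version.

    import java.util.HashMap;
    import java.util.Map;

    import com.netflix.fenzo.queues.TaskQueue;
    import com.netflix.fenzo.queues.tiered.TieredQueueSlas;
    import com.netflix.fenzo.sla.ResAllocs;
    import com.netflix.fenzo.sla.ResAllocsBuilder;

    public class TierSlaSketch {

        // Hypothetical helper that builds a resource allocation of CPUs and memory (MB).
        static ResAllocs allocs(String name, double cpus, double memoryMB) {
            return new ResAllocsBuilder(name)
                    .withCores(cpus)
                    .withMemory(memoryMB)
                    .build();
        }

        static void applySlas(TaskQueue queue) throws Exception {
            // Overall capacity for tier 0.
            Map<Integer, ResAllocs> tierCapacities = new HashMap<>();
            tierCapacities.put(0, allocs("tier0", 100, 512000));

            // Quotas for two of tier 0's buckets; the quota also acts as the bucket's
            // weight in the weighted-DRF sorting described above.
            Map<String, ResAllocs> tier0Buckets = new HashMap<>();
            tier0Buckets.put("A", allocs("A", 40, 200000));
            tier0Buckets.put("B", allocs("B", 20, 100000));

            Map<Integer, Map<String, ResAllocs>> bucketQuotas = new HashMap<>();
            bucketQuotas.put(0, tier0Buckets);

            // Constructor shape is an assumption; the SLA is applied dynamically via setSla().
            queue.setSla(new TieredQueueSlas(tierCapacities, bucketQuotas));
        }
    }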

Possible future changes

We summarize a few possible upcoming changes here to share our thinking on how we see these queues being used in the longer term; this is not necessarily a confirmed list of enhancements at this time.

  • Finer-grained entities within buckets to support hierarchical weights for sub-grouping of queues.
  • Task evictions in order to launch tasks in a higher tier, if the cluster is running at capacity.
  • Task evictions in order to balance the usage across buckets of a tier.