Step Timeout

Rajan, Sarat edited this page Aug 15, 2019 · 1 revision

Overview

Currently, if a job hangs, it stays running indefinitely and takes up an executor slot. Hangs have become much more frequent while we're dealing with connection issues, but this is a general problem that should be addressed.

Design

We will introduce a time limit for each pipeline step execution.

The default limit will be 30 minutes per step, and any step can override it with timeoutInMinutes:

- name: my-step
  image: ...
  timeoutInMinutes: 120 # step may run for 2 hours
  commands:
    - ...
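The default-plus-override rule can be sketched as follows. This is a minimal Python illustration of the lookup, not the pipeline's actual code: the function name is hypothetical, and rejecting non-positive values is an assumption the design does not state.

```python
from typing import Any, Mapping

DEFAULT_TIMEOUT_MINUTES = 30  # default limit stated in the design


def effective_timeout_minutes(step: Mapping[str, Any]) -> int:
    """Return the step's own timeoutInMinutes, or the 30-minute default."""
    value = step.get("timeoutInMinutes")
    if value is None:
        return DEFAULT_TIMEOUT_MINUTES
    minutes = int(value)
    if minutes <= 0:
        # Assumption: non-positive limits are rejected; the design does not say.
        raise ValueError(f"timeoutInMinutes must be positive, got {value!r}")
    return minutes
```

A step without the field resolves to 30, while the my-step example above resolves to 120.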

Timeout behavior

A timeout will be treated the same as any other step failure: it will move the PIPELINE_STATE to FAILURE, and future jobs will not be executed unless they have conditions to run in the FAILURE state. Additionally, the overall job will be marked as FAILED.
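The propagation rule can be sketched in Python as below. The sketch reads "future jobs" as the later steps of the same pipeline, which is an assumption. The runIn field is a hypothetical stand-in for the real condition syntax, and the function and outcome names are also illustrative.

```python
from typing import Any, Callable, Dict, List


def run_steps(
    steps: List[Dict[str, Any]],
    execute: Callable[[Dict[str, Any]], str],
) -> Dict[str, Any]:
    """Run steps in order, skipping any that may not run in the current state."""
    pipeline_state = "SUCCESS"
    executed: List[str] = []
    for step in steps:
        # Hypothetical "runIn" field; by assumption a step runs only in
        # SUCCESS unless it lists other states.
        allowed = step.get("runIn", ["SUCCESS"])
        if pipeline_state not in allowed:
            continue
        executed.append(step["name"])
        outcome = execute(step)  # "SUCCESS", "FAILURE" or "TIMEOUT"
        if outcome != "SUCCESS":
            # A timeout moves the pipeline to FAILURE like any other failure.
            pipeline_state = "FAILURE"
    return {
        "pipeline_state": pipeline_state,
        "executed": executed,
        "job_failed": pipeline_state == "FAILURE",
    }
```

With steps build, test, deploy and a notify step that lists FAILURE in runIn, a timeout in test skips deploy, still runs notify, and marks the job as failed.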

The explanation field in the pipeline state will indicate that a timeout occurred. For example:

"stepStates": [
    {
        "commands": [
            "gradle --no-daemon clean test jacocoTestReport"
        ],
        "durationMillis": 1860000,
        "image": "gradle:5.0-jre8-alpine",
        "startTime": 1554239761551,
        "name": "test-pipeline",
        "status": "FAILURE",
        "explanation": "30 minute timeout exceeded"
    },
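A record like the one above could be assembled as in this Python sketch. The function name and signature are hypothetical. Only the field names and the explanation wording come from the example.

```python
from typing import Any, Dict, List


def timed_out_step_state(
    name: str,
    image: str,
    commands: List[str],
    start_time_ms: int,
    duration_ms: int,
    timeout_minutes: int,
) -> Dict[str, Any]:
    """Build a stepStates entry for a step that hit its time limit."""
    return {
        "commands": commands,
        "durationMillis": duration_ms,
        "image": image,
        "startTime": start_time_ms,
        "name": name,
        "status": "FAILURE",  # same status as any other step failure
        "explanation": f"{timeout_minutes} minute timeout exceeded",
    }
```

Deriving the explanation from the configured limit keeps the message accurate for steps that override the default.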

Implementation

The Jenkins workflow timeout step will provide the timer and interruption behavior.

We'll start the timer as soon as we enter the stage block, so the measured time will include pulling the image.

stage(step.name) {
    // Fall back to the 30-minute default when the step sets no timeoutInMinutes.
    timeout(time: step.timeoutInMinutes ?: 30, unit: 'MINUTES') {
        ...
    }
}
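Because the timer starts on stage entry, the image pull and the commands share one budget. The Python sketch below illustrates only that accounting. Unlike the Jenkins timeout step, it does not interrupt a running body, and StepTimeout, run_stage, and the injected clock are hypothetical names used for illustration.

```python
import time
from typing import Callable


class StepTimeout(Exception):
    """Hypothetical stand-in for the interruption Jenkins raises on timeout."""


def run_stage(
    timeout_minutes: int,
    pull_image: Callable[[], None],
    run_commands: Callable[[float], None],
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Run one stage under a single time budget and return elapsed seconds."""
    limit_seconds = timeout_minutes * 60
    entered = clock()  # the timer starts when the stage is entered

    pull_image()  # image pull time counts against the limit
    remaining = limit_seconds - (clock() - entered)
    if remaining <= 0:
        raise StepTimeout(f"{timeout_minutes} minute timeout exceeded")

    # The commands only get whatever budget the pull left over.
    run_commands(remaining)
    elapsed = clock() - entered
    if elapsed > limit_seconds:
        raise StepTimeout(f"{timeout_minutes} minute timeout exceeded")
    return elapsed
```

For a 30-minute limit, a 10-minute image pull leaves the commands 20 minutes, so a slow registry can fail a step whose commands would otherwise finish in time.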