Test job cancel when scheduled on Kubernetes #8

Open
wvengen opened this issue Jan 25, 2024 · 4 comments
Labels
k8s Kubernetes

Comments

wvengen (Member) commented Jan 25, 2024

Stopping jobs mostly works, but it has a number of cases to test.

  1. Just created, but not running yet -> remove job/container without stopping it (not tested)
  2. Running -> send signal (tested in PR Integration tests #21)
  3. Finished -> do nothing (tested in PR Integration tests #21)

Can you think of more corner-cases?
Especially in the first case, there may be various stages (e.g. on Kubernetes, waiting for resources, pulling the image).

See also the documentation on scrapyd's cancel endpoint.

Note that tests have been added in PR #21, including some for job cancellation. The main thing still missing is testing that a job is removed when it is cancelled before it has started. This issue is now about implementing that, including finding a way to test it reliably.
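
Putting the three cases above together, a rough sketch of how the cancel handling could dispatch on job state. This is not the actual scrapyd-k8s implementation; `_get_job_state`, `_delete_job` and `_signal_job` are hypothetical helpers used only for illustration:

```python
from signal import Signals

def cancel(self, project, job_id, signal='TERM'):
    # hypothetical helper returning 'pending', 'running' or 'finished'
    state = self._get_job_state(project, job_id)
    if state == 'pending':
        # case 1: scheduled but not running yet -> remove the job/container without signalling
        self._delete_job(project, job_id)
    elif state == 'running':
        # case 2: running -> forward the signal to the spider process
        self._signal_job(project, job_id, Signals['SIG' + signal].value)
    # case 3: finished -> nothing to do
    return state
```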

@wvengen wvengen changed the title from "Make stopping jobs more robust" to "Make job cancel on Kubernetes more robust" Jan 25, 2024
@wvengen wvengen added the k8s Kubernetes label Jan 31, 2024
wvengen (Member, Author) commented Feb 15, 2024

As part of #7 it appears that sending a signal to a running container doesn't work. PR #21 contains a fix, but killing the spider still doesn't seem to do anything.

An important reason is probably that the spider runs as the init process (PID 1) and so cannot be killed.
Update: enabled shareProcessNamespace in the pod spec, which adds a separate init process; as we only have one container in the pod, this has no other real side-effects.
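
For reference, a minimal sketch of what enabling shareProcessNamespace could look like with the Kubernetes Python client; the container name and image are placeholders, not the actual scrapyd-k8s code:

```python
from kubernetes import client

# With shareProcessNamespace enabled, the pause container becomes PID 1,
# so the spider no longer runs as the init process and can be signalled.
pod_spec = client.V1PodSpec(
    share_process_namespace=True,
    restart_policy='Never',
    containers=[
        client.V1Container(name='spider', image='example/spider:latest'),
    ],
)
```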

wvengen (Member, Author) commented Feb 16, 2024

Fixed behaviour during run in PR #21.
Testing for pending and finished jobs is still missing, and there could perhaps be race conditions (e.g. is the container always running when scrapyd-k8s thinks the job is running?).

wvengen (Member, Author) commented Feb 16, 2024

At the moment, we look at job.status.ready, and if it is set we assume that we can exec into the container. There could be a race condition where the job and pod are running, but the container is not. A solution is described here:

            # kill pod (retry is disabled, so there should be only one pod)
            pod = self._get_pod(project, job_id)
            if pod:  # if a pod has just ended, we're good already, don't kill
                # make sure the container is running - https://stackoverflow.com/a/74833787
                if all(c.state.running for c in pod.status.container_statuses):
                    # Signals comes from Python's standard signal module
                    self._k8s_kill(pod.metadata.name, Signals['SIG' + signal].value)
                else:
                    # TODO: refactor code to fall through to delete the job instead
                    pass
Not encountered yet, so not including this check for now.

@wvengen wvengen changed the title from "Make job cancel on Kubernetes more robust" to "Test job cancel when scheduled on Kubernetes" Feb 27, 2024
wvengen (Member, Author) commented Mar 13, 2024

One approach is to add an option to create a suspended job, e.g. a special query parameter on the schedule endpoint (one that wouldn't be used as a setting), or perhaps a special header. The test can then use it to exercise this scenario.
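
A sketch of what that could look like on the Kubernetes side, assuming the (hypothetical) test-only parameter makes the launcher set suspend on the job spec; Job suspension is GA since Kubernetes 1.24, and the names and image below are placeholders:

```python
from kubernetes import client

# A Job created with suspend=True does not start a pod until it is resumed,
# giving the test a reliable "scheduled but not yet running" state to cancel.
job = client.V1Job(
    api_version='batch/v1',
    kind='Job',
    metadata=client.V1ObjectMeta(name='example-spider-job'),
    spec=client.V1JobSpec(
        suspend=True,       # keep the job pending until resumed
        backoff_limit=0,    # retry disabled, so at most one pod
        template=client.V1PodTemplateSpec(
            spec=client.V1PodSpec(
                restart_policy='Never',
                containers=[
                    client.V1Container(name='spider', image='example/spider:latest'),
                ],
            ),
        ),
    ),
)
# client.BatchV1Api().create_namespaced_job(namespace='default', body=job)
```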
