
Wait for job does not work as expected #60

Closed
fdutton opened this issue Sep 27, 2022 · 12 comments
Comments


fdutton commented Sep 27, 2022

I was expecting this app to wait until a job completed successfully, but it only waited for the job to be ready. Am I misunderstanding something?

This is a portion of my deployment resource and I have verified that my job runs to completion and exits with a status code of 0.

initContainers:
  - name: data-migration-init
    image: 'groundnuty/k8s-wait-for:v1.7'
    args:
      - job
      - my-data-migration-job
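For reference, the behaviour I'm expecting from the init container boils down to roughly the following (a sketch of the intended semantics only, not the actual wait_for.sh code; the job name is the one from the args above):

# Block until the named job reports at least one succeeded pod.
until [ "$(kubectl get job my-data-migration-job \
           -o jsonpath='{.status.succeeded}')" = "1" ]; do
  sleep 5
done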

groundnuty commented Sep 27, 2022

I most definitely use it to wait for a job to be completed. For example:

        - name: wait-for-onezone
          image: {{ .Values.wait_for.image }}
          imagePullPolicy: {{ template "imagePullPolicy" dict "root" . "context" .Values.wait_for }}
          args:
            - "job"
            - "{{ template "onezone_name" . }}-ready-check"

Please try image groundnuty/k8s-wait-for:v1.5.1. I have not upgraded my production envs to the newest image. Maybe some bug got into it...


fdutton commented Sep 27, 2022

Will do. Thanks for the quick response.


fdutton commented Sep 27, 2022

Version 1.5.1 works as expected.

I'm not in production yet so I'm willing to help isolate the issue. I'll try a 1.6 version tomorrow and let you know the results.


groundnuty commented Sep 28, 2022

Had a hunch and it was right. Here's a diff of kubectl describe job <> output between kubectl v1.24.0 and v1.25.2:

< Start Time:     Wed, 21 Sep 2022 11:03:23 +0200
< Pods Statuses:  1 Active / 0 Succeeded / 0 Failed
---
> Start Time:     Wed, 21 Sep 2022 09:03:23 +0000
> Pods Statuses:  1 Running / 0 Succeeded / 0 Failed

They changed Running to Active... not sure how it could break the code yet, since it uses regexps that should be OK with that...
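For anyone who wants to reproduce the comparison, something along these lines should show the same difference (the two kubectl binary paths are placeholders for wherever the two versions are installed, and the job name is just an example):

# Compare describe output of the same job under two kubectl versions.
diff \
  <(/usr/local/bin/kubectl-1.24.0 describe job my-data-migration-job) \
  <(/usr/local/bin/kubectl-1.25.2 describe job my-data-migration-job)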


fdutton commented Sep 28, 2022

Version 1.6 does not work.

I diff'd wait_for.sh and don't see anything that would change its behavior.

v1.5.1 uses kubectl 1.21.0 and v1.6 uses kubectl 1.24.0, so there is probably a change there.
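If it helps, one quick way to confirm which kubectl a given image tag bundles (assuming kubectl is on the image's PATH, which I haven't verified for every tag):

docker run --rm --entrypoint kubectl groundnuty/k8s-wait-for:v1.5.1 version --client
docker run --rm --entrypoint kubectl groundnuty/k8s-wait-for:v1.6 version --client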

@stephenpope

noroot-v1.7 running on K8S 1.25 has the same issue and doesn't wait for the job to be successful.

Switched to v1.5.1 and it works as expected. Would be nice to be running the noroot version :)

anleib commented Nov 4, 2022

Got hit by this as well; switched to v1.5.1 and it works as expected now.

@DARB-CCM-S-20

Also got hit by this in v1.7. Is someone working on a fix?


groundnuty commented Nov 14, 2022

I found the problem. After all, the regexp was not working after k8s changed this:

Pods Statuses:    0 Running / 1 Succeeded / 0 Failed
Pods Statuses:    1 Active (0 Ready) / 0 Succeeded / 0 Failed

The change is connected with the feature gate JobReadyPods, which, as far as I can find, was introduced in k8s v1.23. It adds Ready info to JobStatus.

As far as I understand, Ready should always be <= Active, as Active still counts pods that are scheduled but not yet Succeeded/Failed, and Ready just gives extra info on which of them are actually running now.

Furthermore, it seems that v1.7 should work with k8s clusters < v1.23.
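To illustrate the kind of fix involved, here is a sketch (not the actual wait_for.sh patch; the job name is an example) of a check that tolerates both wordings of the Pods Statuses line, plus the alternative of skipping text parsing entirely:

JOB_NAME=my-data-migration-job   # example name from the report above

# Extract the Succeeded count; matches both
#   "0 Running / 1 Succeeded / 0 Failed"          (old wording)
#   "1 Active (0 Ready) / 0 Succeeded / 0 Failed"  (new wording)
succeeded=$(kubectl describe job "$JOB_NAME" \
  | sed -n '/Pods Statuses:/s/.*\/ *\([0-9][0-9]*\) Succeeded.*/\1/p')

# Or avoid parsing human-readable output altogether:
succeeded=$(kubectl get job "$JOB_NAME" -o jsonpath='{.status.succeeded}')

[ "${succeeded:-0}" -ge 1 ] && echo "job $JOB_NAME completed"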

@fdutton, @anleib, @DARB-CCM-S-20, @stephenpope: could you possibly share which k8s version you experienced your problems on, so that we can be sure that my conclusions here are correct?

@DARB-CCM-S-20

@groundnuty Great work! 1.24 for me. I've internalized v1.7 for now and changed to v1.21, which is working fine.

anleib commented Nov 22, 2022

I am on 1.24 K8s as well

@one-adam-nolan

> I am on 1.24 K8s as well

Running v1.24.14 and ended up having to use v1.5.1; newer versions just completed immediately.
