kubeflow-dashboard channels 1.8/edge and latest/edge are not functional when deployed with Juju 3.5 #188
Comments
Thank you for reporting your feedback! The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5651.
---
To gather more information, I've tested:
So this concludes that the problem is specific to the edge channels. It's worth noting that we are using the rock in those channels, but not in 1.8/stable. The Juju 3.5 release notes include the change juju/juju#17070; I'm curious whether this has to do with the errors we're seeing, and whether it relates to the rock that we are using.

---
Comparing the pod definitions when deploying with 3.4 and 3.5, the following security context is additional in 3.5:
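Judging from the reproduction pod specs later in this thread, the addition is the pod-level fsGroup/supplementalGroups block. A sketch of what it looks like (the exact GID is whatever Juju assigns for the deployment, not necessarily 170):

```yaml
# Sketch of the pod-level security context that appears with Juju 3.5,
# reconstructed from the reproduction specs below; GID 170 is illustrative.
securityContext:
  fsGroup: 170
  supplementalGroups:
  - 170
```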
---
I can reproduce with:
But in my case, the kubeflow-dashboard charm goes into error status because ops's pebble client raises an exception during charm execution:
---
I can also reproduce this behaviour directly with Kubernetes and the rock charmedkubeflow/kubeflow-central-dashboard (source rockcraft.yaml) by applying the pod spec below, where broken-dashboard-rock.yaml is:

apiVersion: v1
kind: Pod
metadata:
  name: kubeflow-dashboard-with-security-context
spec:
  automountServiceAccountToken: true
  securityContext:
    fsGroup: 170
    supplementalGroups:
    - 170
  containers:
  - name: kubeflow-dashboard
    image: centraldashboard:rock
    volumeMounts:
    - name: sec-ctx-vol
      mountPath: /data/demo
    - mountPath: /charm/bin/pebble
      name: charm-data
      readOnly: true
      subPath: charm/bin/pebble
    - mountPath: /charm/container
      name: charm-data
      subPath: charm/containers/kubeflow-dashboard
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-b4hms
      readOnly: true
    securityContext:
      runAsGroup: 0
      runAsUser: 0
  volumes:
  - name: sec-ctx-vol
    emptyDir: {}
  - emptyDir: {}
    name: charm-data
  - name: kube-api-access-b4hms
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace

(Note that I'm reusing the volume mounts and service account volumes from the Juju-created pod.) This raises the same errors as @NohaIhab mentioned above. If we remove the pod-level securityContext, the errors go away. Looking at the serviceaccount token that npm reports we cannot access, we see that it is in group 170, which makes sense because setting fsGroup makes Kubernetes change the group ownership of the pod's volume contents to that GID.

---
Interestingly, if we do the same example as above but use the upstream image instead of the rock, the dashboard comes up and we do not see the permission errors.

So for the above cases, the upstream image works regardless of the user it is executed as, while the rock does not.

A theory of what is going on

My guess at what is happening is that we're hitting trouble because rocks run their services via pebble: Kubernetes grants the fsGroup and supplemental groups to the container's entrypoint process (pebble), but those groups do not seem to be applied to the service that pebble spawns, so npm cannot read the files that are only accessible through the fsGroup.

Next steps
---
Tried deploying the rock pod with the following changes (full yaml given below):

- runAsUser, runAsGroup, fsGroup, and supplementalGroups all set to 584792 (the UID/GID used by the rock's _daemon_ user) instead of 0/170
- the kubeflow-dashboard service account attached to the pod

pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: kubeflow-dashboard-with-security-context-as-584792
spec:
  automountServiceAccountToken: true
  securityContext:
    fsGroup: 584792
    supplementalGroups:
    - 584792
  serviceAccount: kubeflow-dashboard
  serviceAccountName: kubeflow-dashboard
  containers:
  - name: kubeflow-dashboard
    image: kubeflow-central-dashboard:1.8
    volumeMounts:
    - name: sec-ctx-vol
      mountPath: /data/demo
    - mountPath: /charm/bin/pebble
      name: charm-data
      readOnly: true
      subPath: charm/bin/pebble
    - mountPath: /charm/container
      name: charm-data
      subPath: charm/containers/kubeflow-dashboard
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-b4hms
      readOnly: true
    securityContext:
      runAsGroup: 584792
      runAsUser: 584792
  volumes:
  - name: sec-ctx-vol
    emptyDir: {}
  - emptyDir: {}
    name: charm-data
  - name: kube-api-access-b4hms
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace

So I think this reinforces the theory above that Kubernetes natively grants the fsGroup to the entrypoint's process, but pebble does not do the same for its children.
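For context on that theory: a rock's service is declared as a Pebble layer in rockcraft.yaml, and pebble (the container entrypoint) starts the service as a child process with the user and group configured in that layer. Below is a hypothetical sketch of such a declaration; the names, command, and user are illustrative and not taken from the actual kubeflow-central-dashboard rock.

```yaml
# Hypothetical Pebble layer (rockcraft.yaml "services" section) for a
# dashboard-like rock; values are illustrative only.
services:
  kubeflow-dashboard:
    override: replace
    summary: Kubeflow central dashboard web app
    # Pebble runs as PID 1 in the container and forks this command as a
    # child process.
    command: npm start
    startup: enabled
    # The child runs as this user/group; if the fsGroup that Kubernetes
    # grants to the entrypoint is not propagated as a supplementary group
    # of the child, the child cannot read files owned by the fsGroup.
    user: _daemon_
    group: _daemon_
```

If that is what happens, it would also explain why the same workload works with the upstream image, where the entrypoint runs the server directly, but not with the rock, where pebble sits in between.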
---

Based on @NohaIhab's suggestion to use the upstream image instead of the rock, I have pushed two temporary patches (#191 and #192) to help alleviate the issue and unblock other workflows. Also, based on some comments from the Juju team, it seems like the issue is rather on their side, so there is not much more we can do on the charm and rock.

---
FYI, this issue has been reported in Juju; let's keep an eye on https://bugs.launchpad.net/juju/+bug/2066517.

---
…image (#178)" (#191) This reverts commit 2811dcb. This is a temporary workaround to help avoid #188. This change must be reverted again once the fix for it is available. Please note the fix depends on https://bugs.launchpad.net/juju/+bug/2066517.
…image (#178)" (#192) This reverts commit 2ad140e. This is a temporary workaround to help avoid #188. This change must be reverted again once the fix for it is available. Please note the fix depends on https://bugs.launchpad.net/juju/+bug/2066517.
When deploying the dashboard charm on Juju 3.5.1, it looks like we are no longer seeing the service account token permission errors. On Juju 3.5.1, when refreshing the charm to use the upstream image instead, we no longer see any error in the workload logs.

---
I have tested the charm (1.8/stable) with the rock image, and I see the same behaviour. This could potentially be an issue with the rock, as the upstream image in the same setup does not show the problem.

EDIT: I have filed canonical/kubeflow-rocks#104; it seems like this is a rocks issue rather than a Juju issue.

---
Bug Description
Note: This behavior is seen with 1.8/edge and latest/edge, but not 1.8/stable. This issue is blocking the CI of PR canonical/dex-auth-operator#187.

This came up during the CI updates to Juju 3.5, specifically in the dex-auth integration CI, where test_login is timing out when logging in to the dashboard. When trying to access the dashboard from the browser, it is unsuccessful: the response status is 503 Service Unavailable and the server is istio-envoy.

Observing the istio-ingressgateway workload logs, a delayed connection error 111 typically means the application port is not open. Observing the kubeflow-dashboard workload logs, we can see that the npm server is not able to start, and the pebble service is constantly restarting with permission denied errors when accessing the files.

The kubeflow dashboard is down in this case, but the charm status still shows up as active. This is problematic because the charm's state is not representative.

To Reproduce
Environment
juju 3.5/stable
microk8s 1.25-strict/stable
Relevant Log Output
Additional Context
No response