This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Workspace Resource Monitor Plugin #919

Merged: 29 commits, Jan 15, 2021

Conversation

svor
Contributor

@svor svor commented Nov 10, 2020

What does this PR do?

This PR adds a new built-in che-theia plugin, resource-monitor-plugin.
The plugin shows the resources used in the workspace pod, read through the k8s API. For now it reports only memory and CPU.

Note: Resource Monitor is displayed only if the Metrics Server is deployed and running on the cluster where Che runs. OpenShift 4 includes it by default; on Minishift and Minikube it has to be enabled manually.

General information is displayed in the Status Bar and shows how many resources the workspace uses in total:
screenshot-nimbus-capture-2020 11 10-12_40_43
To see detailed information about each container, click that element in the Status Bar:
screenshot-nimbus-capture-2020 11 10-12_42_13

Memory
For each container we can read the Used value (from the Metrics Server) and the Limit value (from the workspace Pod description).
CPU
Each container has a Used value (from the Metrics Server), but the Limit value cannot be read if it wasn't set in the workspace description. That's why the status bar shows only the used CPU resources.
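
As an illustration of how Used and Limit values could be combined into the workspace-wide numbers shown in the status bar, here is a minimal TypeScript sketch. It is not taken from the plugin's source: `parseQuantity`, `workspaceTotals`, the interfaces and the sample values are all hypothetical, and only a subset of the Kubernetes quantity suffixes is handled.

```typescript
// Multipliers for the Kubernetes quantity suffixes handled by this sketch.
// Binary suffixes (Ki, Mi, Gi, Ti) are powers of 1024; the others are decimal.
const QUANTITY_UNITS = new Map<string, number>([
  ["n", 1e-9], ["u", 1e-6], ["m", 1e-3], ["", 1],
  ["k", 1e3], ["M", 1e6], ["G", 1e9], ["T", 1e12],
  ["Ki", 1024], ["Mi", 1024 * 1024], ["Gi", 1024 * 1024 * 1024],
  ["Ti", 1024 * 1024 * 1024 * 1024],
]);

// Convert a quantity string such as "512Mi" or "250m" to a plain number
// (bytes for memory, cores for CPU).
function parseQuantity(quantity: string): number {
  const match = /^(\d+(?:\.\d+)?)([A-Za-z]*)$/.exec(quantity.trim());
  if (match === null) {
    throw new Error(`Unsupported quantity: ${quantity}`);
  }
  const multiplier = QUANTITY_UNITS.get(match[2] ?? "");
  if (multiplier === undefined) {
    throw new Error(`Unsupported suffix in quantity: ${quantity}`);
  }
  return Number(match[1]) * multiplier;
}

// Hypothetical per-container record: usage values as reported by the metrics
// API, memory limit as found in the pod description.
interface ContainerResources {
  name: string;
  memoryUsed: string;
  memoryLimit: string;
  cpuUsed: string;
}

interface WorkspaceTotals {
  memoryUsedBytes: number;
  memoryLimitBytes: number;
  cpuUsedCores: number;
}

// Sum the values of all containers. No CPU limit is summed because,
// as described above, it may not be set.
function workspaceTotals(containers: ContainerResources[]): WorkspaceTotals {
  const totals: WorkspaceTotals = { memoryUsedBytes: 0, memoryLimitBytes: 0, cpuUsedCores: 0 };
  for (const container of containers) {
    totals.memoryUsedBytes += parseQuantity(container.memoryUsed);
    totals.memoryLimitBytes += parseQuantity(container.memoryLimit);
    totals.cpuUsedCores += parseQuantity(container.cpuUsed);
  }
  return totals;
}

// Made-up sample values for the two containers of the tested devfile.
const exampleTotals = workspaceTotals([
  { name: "go-plugin", memoryUsed: "100Mi", memoryLimit: "512Mi", cpuUsed: "5m" },
  { name: "go-cli", memoryUsed: "300Mi", memoryLimit: "2Gi", cpuUsed: "20m" },
]);
```

With these sample values the workspace uses 400Mi of a combined 2.5Gi memory limit. A per-container view would keep the individual records instead of summing them.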

This PR depends on eclipse-che/che-operator#519, which customizes cluster roles so that the plugin can communicate with the Metrics Server through the k8s API.

Screenshot/screencast of this PR

ResMon
screenshot-che-che 192 168 99 180 nip io-2020 11 26-14_12_36
screenshot-che-che 192 168 99 182 nip io-2020 12 04-16_16_03
screenshot-che-che 172 17 0 2 nip io-2020 11 27-23_27_54

What issues does this PR fix or reference?

eclipse-che/che#17205

How to test this PR?

  1. Make sure that the Metrics Server is running in the cluster where Che will be deployed (OpenShift 4 provides it by default).
    For Minikube, it can be enabled with:
  • minikube addons enable metrics-server
  • kubectl -n kube-system rollout status deployment metrics-server
  2. Deploy Che using chectl with the changes from "Add metrics role and extend view role" (che-operator#519).
  3. Create a workspace using this devfile:
Tested Devfile
apiVersion: 1.0.0
metadata:
  name: golang-mrkd5
projects:
  - name: example
    source:
      location: 'https://github.com/golang/example.git'
      type: git
    clonePath: src/github.com/golang/example/
  - name: golang-echo-realworld-example-app
    source:
      location: 'https://github.com/xesina/golang-echo-realworld-example-app.git'
      type: git
    clonePath: src/github.com/xesina/golang-echo-realworld-example-app
components:
  - id: golang/go/latest
    memoryLimit: 512Mi
    preferences:
      go.lintFlags: '--fast'
      go.useLanguageServer: true
      go.lintTool: golangci-lint
    type: chePlugin
    alias: go-plugin
    env:
      - value: 'off'
        name: GO111MODULE
  - mountSources: true
    endpoints:
      - name: 8080-tcp
        port: 8080
    memoryLimit: 2Gi
    type: dockerimage
    image: 'quay.io/eclipse/che-golang-1.14:nightly'
    alias: go-cli
    env:
      - value: $(CHE_PROJECTS_ROOT)
        name: GOPATH
      - value: /tmp/.cache
        name: GOCACHE
  - reference: https://raw.githubusercontent.com/svor/che-plugin-registry/sv/testOperator/v3/plugins/eclipse/che-theia/next/meta.yaml
    type: cheEditor
commands:
  - name: 1.1 Run outyet
    actions:
      - workdir: '${CHE_PROJECTS_ROOT}/src/github.com/golang/example/outyet'
        type: exec
        command: go get -d && go run main.go
        component: go-cli
  - name: 1.2 Stop outyet
    actions:
      - type: exec
        command: kill $(pidof go)
        component: go-cli
  - name: 1.3 Test outyet
    actions:
      - workdir: '${CHE_PROJECTS_ROOT}/src/github.com/golang/example/outyet'
        type: exec
        command: go test
        component: go-cli
  - name: '2.1 xenisa :: install dependencies'
    actions:
      - workdir: '${GOPATH}/src/github.com/xesina/golang-echo-realworld-example-app'
        type: exec
        command: go mod download
        component: go-cli
  - name: '2.2 xenisa :: run'
    actions:
      - workdir: '${GOPATH}/src/github.com/xesina/golang-echo-realworld-example-app'
        type: exec
        command: go run main.go
        component: go-cli
  - name: '2.3 xenisa :: build'
    actions:
      - workdir: '${GOPATH}/src/github.com/xesina/golang-echo-realworld-example-app'
        type: exec
        command: go build
        component: go-cli
  - name: '2.4 xenisa :: test'
    actions:
      - workdir: '${GOPATH}/src/github.com/xesina/golang-echo-realworld-example-app'
        type: exec
        command: go test ./...
        component: go-cli
  - name: Run current file
    actions:
      - workdir: '${fileDirname}'
        type: exec
        command: 'go get -d && go run ${file}'
        component: go-cli
  - name: Debug current file
    actions:
      - referenceContent: |
          {
            "version": "0.2.0",
            "configurations": [
              {
                "name": "Debug current file",
                "type": "go",
                "request": "launch",
                "mode": "auto",
                "program": "${fileDirname}"
              }
            ]
          }
        type: vscode-launch

PR Checklist

As the author of this Pull Request I made sure that:

Reviewers

Reviewers, please comment how you tested the PR when approving it.

Happy Path Channel

HAPPY_PATH_CHANNEL=stable

@svor svor self-assigned this Nov 10, 2020
@svor svor changed the title Sv/resource monitor Workspace Resource Monitor Plugin Nov 10, 2020
@benoitf
Contributor

benoitf commented Nov 10, 2020

Hello, would it make sense to expose these metrics from the @eclipse/che-plugin namespace?
Other components might be interested in the memory/CPU information.

The main motivation is k8s API access. It should be provided by che-theia; otherwise each plug-in/extension will add 25Mi+ of dependencies.
With dev workspace, the k8s API will be a requirement as well, so it will end up in che-theia extensions in any case, and exposing it will make it easier for each extension/plug-in to use that API.

@svor
Contributor Author

svor commented Nov 10, 2020

@benoitf Do you think the k8s API will be needed by other plugins/extensions? Also, kubernetes-client/javascript doesn't implement the Metrics API yet, so this plugin uses the kubernetes-client lib only to get the cluster configuration and sends raw queries to the Kubernetes API.
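
To make the "raw query" idea concrete, here is a small TypeScript sketch of the two pure steps around such a request: building the path and reading the per-container usage out of the response. The path format matches the `kubectl get --raw` command quoted later in this conversation. The function names, the `ContainerUsage` interface and the sample body are hypothetical, and the HTTP call itself (with credentials from the cluster configuration) is deliberately left out.

```typescript
// Build the raw metrics API path for one pod.
function podMetricsPath(namespace: string, podName: string): string {
  return (
    "/apis/metrics.k8s.io/v1beta1/namespaces/" +
    encodeURIComponent(namespace) +
    "/pods/" +
    encodeURIComponent(podName)
  );
}

// Hypothetical shape used by this sketch for one container's usage.
interface ContainerUsage {
  name: string;
  cpu: string;
  memory: string;
}

// Extract per-container usage from a PodMetrics response body.
// Returns an empty list when the response has no "containers" array.
function parsePodMetrics(body: string): ContainerUsage[] {
  const parsed = JSON.parse(body) as {
    containers?: Array<{ name: string; usage: { cpu: string; memory: string } }>;
  };
  return (parsed.containers ?? []).map((container) => ({
    name: container.name,
    cpu: container.usage.cpu,
    memory: container.usage.memory,
  }));
}

// Made-up response body with the general shape of a PodMetrics object.
const samplePodMetricsBody = JSON.stringify({
  kind: "PodMetrics",
  apiVersion: "metrics.k8s.io/v1beta1",
  containers: [
    { name: "go-cli", usage: { cpu: "1500000n", memory: "204800Ki" } },
    { name: "go-plugin", usage: { cpu: "300000n", memory: "51200Ki" } },
  ],
});

const sampleUsage = parsePodMetrics(samplePodMetricsBody);
```

Keeping path construction and response parsing separate from the transport would also make it easier to expose the query API from che-theia later, as discussed in this thread.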

@benoitf
Contributor

benoitf commented Nov 10, 2020

@svor yes, basically with dev-workspace we'll do tons of stuff with the k8s API, as all the information is there,
so all workspace-client stuff will go through k8s. So the plug-in ext extension will require it.
I think the recommendation plug-in might also need the total memory available per pod (from k8s), and then there is the workspace resource plug-in.

I'm fine if we don't provide metrics for now, but IMHO the query API should be exposed, as I see many upcoming requests.

@vitaliy-guliy
Contributor

I'm thinking about the info in the status bar. Wouldn't it be better to display the status for only one container? Displaying the overall total for all containers is a bit confusing.
Looking at the status bar, I may think that I have 8GB of memory, which will be enough for everything. But some process/plugin in some container could still fail because that container has a limit of 256 MB.
Then the user will ask: why does it fail if I have 6GB of memory?

Signed-off-by: svor <[email protected]>
Signed-off-by: svor <[email protected]>
@benoitf
Contributor

benoitf commented Nov 18, 2020

I agree with @vitaliy-guliy that displaying the current/total memory is misleading, as you might think everything is OK while 99% of one container's memory is being used.

But then I don't know how to display something valuable with numbers like current/total. Maybe a color indicator with a label changing from Memory OK in green to Memory alert, with some pre-defined limits: if the current memory of one container goes above 80%, the indicator becomes orange; if one goes above 90%, it becomes red. I don't know.

In the detailed view, can we bring colors as well? And be able to sort by 'most available memory' or 'consuming the most CPU'?
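
The threshold idea above can be sketched in TypeScript as follows. This is only an illustration of the proposal, not the behavior that was merged: the 80% and 90% limits come from the comment, while the type names, function names and sample numbers are hypothetical. A container without a known limit is treated as "ok", which is an assumption of this sketch.

```typescript
type MemoryLevel = "ok" | "warning" | "alert";

interface ContainerMemory {
  name: string;
  usedBytes: number;
  limitBytes: number; // 0 means that no limit is known
}

const WARNING_RATIO = 0.8; // above this the indicator would turn orange
const ALERT_RATIO = 0.9; // above this the indicator would turn red

// Fraction of the limit in use; 0 when no limit is known.
function usageRatio(container: ContainerMemory): number {
  return container.limitBytes > 0 ? container.usedBytes / container.limitBytes : 0;
}

function levelFor(container: ContainerMemory): MemoryLevel {
  const ratio = usageRatio(container);
  if (ratio > ALERT_RATIO) {
    return "alert";
  }
  if (ratio > WARNING_RATIO) {
    return "warning";
  }
  return "ok";
}

// The status bar level is the worst level found among the containers.
function overallLevel(containers: ContainerMemory[]): MemoryLevel {
  const levels = containers.map(levelFor);
  if (levels.indexOf("alert") !== -1) {
    return "alert";
  }
  if (levels.indexOf("warning") !== -1) {
    return "warning";
  }
  return "ok";
}

// Order for the detailed view: containers closest to their limit first.
function sortByMemoryPressure(containers: ContainerMemory[]): ContainerMemory[] {
  return containers.slice().sort((a, b) => usageRatio(b) - usageRatio(a));
}

const MIB = 1024 * 1024;

// Made-up sample values; "theia-ide" is a hypothetical container name.
const sampleContainers: ContainerMemory[] = [
  { name: "go-plugin", usedBytes: 100 * MIB, limitBytes: 512 * MIB },
  { name: "go-cli", usedBytes: 1950 * MIB, limitBytes: 2048 * MIB },
  { name: "theia-ide", usedBytes: 450 * MIB, limitBytes: 512 * MIB },
];
```

Taking the worst per-container level addresses the concern raised earlier: a workspace with plenty of free memory overall still shows an alert when a single container is close to its own limit.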

Surely UX team could help there

cc @beaumorley @parvathyvr

@vitaliy-guliy
Contributor

I would display detailed info somewhere in MY WORKSPACE panel.

@svor
Contributor Author

svor commented Nov 18, 2020

I decided to display general information about the workspace as a whole, because:
1/ If we show information about only one container, it is not clear which container it should be.
2/ I don't want to add more text to the status bar to name the container; it would overload the status bar.

But I'm OK with displaying only a message like Memory OK/Memory alert with different colors in the status bar. Or I could just change the color of the current text when some container is running low on memory.

About the detailed info: as far as I know, we can't bring different colors into the QuickPick window that I use to present it, and the content of the window doesn't update in real time.

@benoitf
Contributor

benoitf commented Dec 4, 2020

@svor so it seems adding a text around the ban icon would help

@svor
Contributor Author

svor commented Dec 4, 2020

@benoitf @vitaliy-guliy what do you think about this, is it better?
screenshot-che-che 192 168 99 182 nip io-2020 12 04-16_16_03

@benoitf
Contributor

benoitf commented Dec 4, 2020

cc @parvathyvr is it better ?

@parvathyvr

cc @parvathyvr is it better ?

@benoitf yes, I think having the 'Resources' near the ban icon makes it clear what the icon is indicating! cc @svor

Contributor

@ericwill ericwill left a comment

Anything else needed here? Or can we merge now.

@benoitf
Contributor

benoitf commented Dec 7, 2020

probably merge after 7.23

@ericwill
Contributor

ericwill commented Dec 7, 2020

probably merge after 7.23

Any reason to wait? I'd like to show this at sprint review tomorrow.

@benoitf
Contributor

benoitf commented Dec 7, 2020

you can still show it at sprint demo.
I think this kind of new plugin is better introduced at the beginning of the cycle than one day before the tag (especially because it requires other dependencies in many other components and we don't know if it'll work on che.openshift.io).
Also, once merged, anyone using Che can switch the editor to next and see it live (no need to wait an extra 3 weeks).

@benoitf
Contributor

benoitf commented Dec 17, 2020

Hey, I thought this PR was going to be merged after 7.23.
Is it going to happen? There was a lot of feedback when it was demoed, but I think it can be addressed by iterating, as we have a few weeks without downstream releases.

Signed-off-by: svor <[email protected]>

Signed-off-by: svor <[email protected]>

@svor
Contributor Author

svor commented Dec 21, 2020

I tried to run these changes on che.openshift.io and for some reason the Resource Monitor is not displayed. I'm going to investigate why.

@svor
Contributor Author

svor commented Dec 21, 2020

Looks like a non-admin user is not allowed to read metrics:

kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/vsvydenk-che/pods/workspace0v3e2hcqhc8f8nvx.go-cli-66c57fcbb6-kwwjz
Error from server (Forbidden): pods.metrics.k8s.io "workspace0v3e2hcqhc8f8nvx.go-cli-66c57fcbb6-kwwjz" is forbidden: User "vsvydenk" cannot get pods.metrics.k8s.io in the namespace "vsvydenk-che": no RBAC policy matched

@che-bot
Contributor

che-bot commented Jan 4, 2021

✅ E2E Happy path tests succeed 🎉

Try Che-Theia editor only Try Che-Theia with Java/maven example Try Che-Theia with NodeJs example

See Details

name link
che-theia docker.io/maxura/che-theia:919
che-theia-endpoint-runtime-binary docker.io/maxura/che-theia-endpoint-runtime-binary:919

Tested with Eclipse Che Single User on K8S (minikube v1.1.1)

  • Use comment "[crw-ci-test]" to rerun happy path E2E test.
  • Use comment "[crw-ci-test --rebuild]" to re-build the images and rerun happy path E2E test.

@svor
Contributor Author

svor commented Jan 4, 2021

@ibuziuk do you have any idea how we can allow non-admin users to read k8s metrics, to avoid a message like:

{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods.metrics.k8s.io \"workspace5elv4ll9wy7k47zw.go-cli-9844ddbf9-dfqlw\" is forbidden: User \"system:serviceaccount:vsvydenk-che:che-workspace\" cannot get pods.metrics.k8s.io in the namespace \"vsvydenk-che\": no RBAC policy matched","reason":"Forbidden","details":{"name":"workspace5elv4ll9wy7k47zw.go-cli-9844ddbf9-dfqlw","group":"metrics.k8s.io","kind":"pods"},"code":403}

@svor
Contributor Author

svor commented Jan 14, 2021

After testing this plugin on the dogfooding instance, I hit a problem: the service account is not allowed to read pod information:
screenshot-che-dogfooding apps che-dev x6e0 p1 openshiftapps com-2021 01 14-15_06_19

code: 403. Error: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"workspacegdve9us4ad860vrl.go-cli-85c4d6d888-mb7bc\" is forbidden: User \"system:serviceaccount:vsvydenk-che:che-workspace\" cannot get resource \"pods\" in API group \"\" in the namespace \"vsvydenk-che\"","reason":"Forbidden","details":{"name":"workspacegdve9us4ad860vrl.go-cli-85c4d6d888-mb7bc","kind":"pods"},"code":403}

    at ResMon.<anonymous> (/tmp/theia-unpacked/eclipse_che_resource_monitor_plugin.theia/lib/resource-monitor-plugin.js:119:35)
    at step (/tmp/theia-unpacked/eclipse_che_resource_monitor_plugin.theia/lib/resource-monitor-plugin.js:42:23)
    at Object.next (/tmp/theia-unpacked/eclipse_che_resource_monitor_plugin.theia/lib/resource-monitor-plugin.js:23:53)
    at fulfilled (/tmp/theia-unpacked/eclipse_che_resource_monitor_plugin.theia/lib/resource-monitor-plugin.js:14:58)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)

Probably the service account needs more permissions, because it is possible to get Pod and Metrics information from the console where I am logged in as the vsvydenk user.

@ericwill @benoitf WDYT: should I continue investigating this problem in this PR, or is it better to merge it and open another issue?

@benoitf
Contributor

benoitf commented Jan 14, 2021

maybe if we don't have access we don't enable/display the plugin
so we can merge it quickly and users don't see errors on our hosted instances.
And we investigate in parallel

Signed-off-by: svor <[email protected]>
@che-bot
Contributor

che-bot commented Jan 14, 2021

✅ E2E Happy path tests succeed 🎉

Try Che-Theia editor only Try Che-Theia with Java/maven example Try Che-Theia with NodeJs example

See Details

name link
che-theia docker.io/maxura/che-theia:919
che-theia-endpoint-runtime-binary docker.io/maxura/che-theia-endpoint-runtime-binary:919

Tested with Eclipse Che Single User on K8S (minikube v1.1.1)

  • Use comment "[crw-ci-test]" to rerun happy path E2E test.
  • Use comment "[crw-ci-test --rebuild]" to re-build the images and rerun happy path E2E test.

@ericwill
Contributor

maybe if we don't have access we don't enable/display the plugin

Wouldn't it be better to load the plugin, but show the error message? Maybe the user wants to know about it.

so we can merge it quickly and users don't see errors on our hosted instances.
And we investigate in parallel

I am fine with merging now and iterating.

@svor
Contributor Author

svor commented Jan 14, 2021

I agree with Eric: having a message that resource information is not available (which we can see on the screenshot above) shouldn't be a problem for the user. It is just a warning message shown if you click on the ban icon.
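
One way to picture the agreed behavior is a small function that turns a Kubernetes API response into either "metrics available" or a warning for the status bar. The TypeScript sketch below is an assumption about how this could be done, not the plugin's actual code: `interpretResponse`, `MonitorState`, `KubeStatus` and the warning text are hypothetical, and only the fields of the `Status` object visible in the errors quoted above are used.

```typescript
// Subset of the Kubernetes Status object seen in the 403 responses above.
interface KubeStatus {
  kind?: string;
  message?: string;
  reason?: string;
  code?: number;
}

type MonitorState =
  | { kind: "available" }
  | { kind: "unavailable"; warning: string };

// Map an HTTP status code and response body to a monitor state.
// Any non-2xx response makes the monitor unavailable; the Status message,
// when present, is appended so the user can see why.
function interpretResponse(httpCode: number, body: string): MonitorState {
  if (httpCode >= 200 && httpCode < 300) {
    return { kind: "available" };
  }
  let detail = "HTTP " + httpCode;
  try {
    const status = JSON.parse(body) as KubeStatus;
    if (status.kind === "Status" && typeof status.message === "string") {
      detail = status.message;
    }
  } catch (_err) {
    // The body was not JSON; keep the generic HTTP detail.
  }
  return {
    kind: "unavailable",
    warning: "Resources information is not available: " + detail,
  };
}

// Shortened, made-up variant of the Forbidden response quoted above.
const forbiddenBody = JSON.stringify({
  kind: "Status",
  status: "Failure",
  message: 'pods "example-pod" is forbidden',
  reason: "Forbidden",
  code: 403,
});

const forbiddenState = interpretResponse(403, forbiddenBody);
```

With this shape, the plugin still loads when access is denied, and the warning is only shown when the user clicks the status bar element.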

7 participants