Report the volume of etcd writes via a diagnostic #14604

Merged
smarterclayton merged 1 commit into openshift:master on Jun 15, 2017

@smarterclayton smarterclayton commented Jun 12, 2017

New EtcdWriteVolume diagnostic measures the number of writes in a time
period to determine where significant write volume is going.

[test]

@derekwaynecarr @eparis

Will make debugging this easier next time:

$ ETCD_WRITE_VOLUME_DURATION=10s oadm diagnostics EtcdWriteVolume --master-config=openshift.local.config/master/master-config.yaml
[Note] Determining if client configuration exists for client/cluster diagnostics
debug: Reading client config at /Users/clayton/projects/origin/src/github.com/openshift/origin/openshift.local.config/master/admin.kubeconfig
Info:  Successfully read a client config file at '/Users/clayton/projects/origin/src/github.com/openshift/origin/openshift.local.config/master/admin.kubeconfig'

[Note] Running diagnostic: EtcdWriteVolume
       Description: Check the volume of writes against etcd and classify them by operation and key for 10s

Info:  Measured 0.2 writes/sec
       /                                                                          2 100.0%
       /v3:PUT                                                                    2 100.0%
       /v3:PUT/kubernetes.io                                                      2 100.0%
       /v3:PUT/kubernetes.io/events                                               1  50.0%
       /v3:PUT/kubernetes.io/events/default                                       1  50.0%
       /v3:PUT/kubernetes.io/events/default/datadir-mysql-0.14c770c1577b8f64      1  50.0%
       /v3:PUT/kubernetes.io/masterleases                                         1  50.0%
       /v3:PUT/kubernetes.io/masterleases/10.192.209.221                          1  50.0%

[Note] Summary of diagnostics execution (version v3.6.0-alpha.2+021fabc-135-dirty):
[Note] Completed with no errors or warnings seen.

@smarterclayton
Contributor Author

@sosiouxme I added the "default skip" behavior I asked about.

@smarterclayton smarterclayton added this to the 3.6.0 milestone Jun 12, 2017
@eparis
Member

eparis commented Jun 13, 2017

My version worked just fine, geez!
timeout 5m etcdctl --endpoints=https://ip-1-2-3-4.ec2.internal:2379,https://ip-1-2-3-4.ec2.internal:2379,https://ip-1-2-3-5.ec2.internal:2379 --ca-file=/etc/origin/master/master.etcd-ca.crt --cert-file=/etc/origin/master/master.etcd-client.crt --key-file=/etc/origin/master/master.etcd-client.key watch -r -f / | grep '.' | grep -v -i '"kind":"event"' | sort | uniq -c | sort -n

What makes yours so great?

@smarterclayton
Contributor Author

smarterclayton commented Jun 13, 2017 via email

@eparis
Member

eparis commented Jun 13, 2017

I do actually dislike inscrutable ENV vars. It doesn't look easy to make the timer a flag, but if you can find a way, it would make it a WHOLE lot more discoverable.

@smarterclayton
Contributor Author

Unfortunately, we don't have any infrastructure for that today in diagnostics. I went back and forth. The reason I went with an env var is that 90% of the time the default is enough. Without an override, though, anyone who wanted a shorter run (test cases) or a longer run (less bursty environments) would have to recompile. So the env var is more for the skilled user in that case. I agree that it's not ideal.

@smarterclayton
Contributor Author

Any other comments? I would like to have this tool available, and I agree that adding args to it in the future would be good.

@eparis
Member

eparis commented Jun 14, 2017

[merge]
[test]

@openshift-bot
Contributor

openshift-bot commented Jun 14, 2017

continuous-integration/openshift-jenkins/merge Waiting: You are in the build queue at position: 18

@openshift-bot
Contributor

Evaluated for origin merge up to f74b38b

@smarterclayton
Contributor Author

[severity:bug]

New EtcdWriteVolume diagnostic measures the number of writes in a time
period to determine where significant write volume is going.
@openshift-bot
Contributor

Evaluated for origin test up to 68e1ede

@openshift-bot
Contributor

continuous-integration/openshift-jenkins/test SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_origin/2249/) (Base Commit: 76c0850)

@smarterclayton smarterclayton merged commit 66adcf8 into openshift:master Jun 15, 2017