Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

openshift_checks: refactor find_ansible_mount #4944

Merged
merged 1 commit into from
Aug 11, 2017
Merged

openshift_checks: refactor find_ansible_mount #4944

merged 1 commit into from
Aug 11, 2017

Conversation

sosiouxme
Copy link
Member

Reuse the code for finding the ansible_mounts mount for a path.

Copy link
Contributor

@rhcarvalho rhcarvalho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, clearly removes duplication. Some comments for possible improvements or making sure the intentions are clear, no changes necessary.

@@ -34,7 +46,7 @@ def test_cannot_determine_available_disk(ansible_mounts, extra_words):
with pytest.raises(OpenShiftCheckException) as excinfo:
DiskAvailability(fake_execute_module, task_vars).run()

for word in 'determine disk availability'.split() + extra_words:
for word in extra_words:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're changing this, we could as well drop the extra_ prefix, and since they are not necessarily "words", perhaps s/extra_words/chunks, or something that better explains what it is? Naming... 🚲

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure :)

It's a band-aid anyway, the real goal is to base on #4913 and have the tests look for "named" exceptions.

@@ -56,7 +57,7 @@ def test_cannot_determine_available_mountpath(ansible_mounts, extra_words):
with pytest.raises(OpenShiftCheckException) as excinfo:
check.run()

for word in 'determine valid etcd mountpath'.split() + extra_words:
for word in ['Unable to determine mount point'] + extra_words:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure, the original was meant to have just some keywords, and look for them in no particular order, the new code looks for an exact sentence/chunk. The trouble of this kinds of tests is that a change in the message requires a change in the test, making it more maintenance burden. No objections if that was your intention.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the trouble, and while splitting into words makes the test less sensitive to whitespace changes, it's still the case that changing the wording is likely to require a change to the tests; and at the same time, it makes the test a little less accurate - you could potentially match a similar nearby message that had all the right words. I did find one false positive like this. So, it was my intent.


mount_point = path
while mount_point not in mount_for_path:
if mount_point in ["/", ""]: # "/" not in ansible_mounts???
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the comment mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incredulity that ansible_mounts would ever come back without / mapped :) i guess it's not a very illuminating comment.

}

mount_point = path
while mount_point not in mount_for_path:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while + condition + if + break == while + new_condition

Let's see...

while mount_point not in mount_for_path and mount_point not in ["/", ""]:
    mount_point = os.path.dirname(mount_point)

Simplifying:

while mount_point not in list(mount_for_path.keys()) + ["/", ""]:
    mount_point = os.path.dirname(mount_point)

Since the condition is re-evaluated in every iteration, and to make it shorter:

# NOTE: we add '/' and '' because those are the base paths that
# os.path.dirname might return, preventing an infinite loop.
mount_points = set(mount_for_path.keys()) | {'/', ''}
while mount_point not in mount_points:
    mount_point = os.path.dirname(mount_point)

Hmm, the single condition still sort of asks for an explanation why we need the extra paths :-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does make it read better. Thanks.


return free_bytes
raise OpenShiftCheckException(
'Unable to retrieve disk availability for "{}" due to a bug.\n'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, as a user, I do not like a software telling me about it's own bugs, specially without any clue as to what the bug is and what can I do about it...
Suggestion: s/due to a bug//, s/Ansible// (do not blame the tool :P)

'Unable to compute disk availability for "{}". Missing key "size_available" ... got "{}".... You can retry running the check, or inspect the output of ansible -m setup HOSTNAME and make sure Ansible can report... etc.'

The user should understand from the message:

  • What happened at a high level: "the disk availability check failed because it was unable to compute disk availability"
  • Some indication of why it failed: "we have information about total_size, etc etc for the mount path /foo/bar, but we don't have 'size_available'" -- and we should hope this actually never happens.
  • What can the user do next -- hmm, in this case maybe retry, if not there is the ansible -m setup... or report a bug...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, I would like the user to know what's going on at this level of detail. I get a little lazy when it seems like a really unlikely thing to encounter and complicated to explain accurately if it did happen (because if this is broken, then enough has changed that you probably need to re-evaluate your assumptions), and so stick to simple "something went wrong, and it's not your fault" messages. But you're right, a little further thought can help craft a more helpful error message, both for the user and for the developer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Communicating the errors is not easy. I did feel I got lengthy above... Oh well, better than perfect in one unlike case is to be consistent :-)
The short and sweet "it is not your fault" style is also good. We're somehow trying to fight the lengthy default Ansible output with something clear and helpful.

In this particular case we agree that things need to be really broken for the error condition to manifest! How about the concise message here, then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too late, already spent time expanding it.


def is_active(self):
etcd_hosts = self.get_var("groups", "etcd", default=[]) or self.get_var("groups", "masters", default=[]) or []
is_etcd_host = self.get_var("ansible_ssh_host") in etcd_hosts
return super(EtcdVolume, self).is_active() and is_etcd_host

def run(self):
mount_info = self._etcd_mount_info()
mount_info = self.find_ansible_mount(self.etcd_mount_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -12,7 +12,7 @@ class EtcdImageDataSize(OpenShiftCheck):
tags = ["etcd"]

def run(self):
etcd_mountpath = self._get_etcd_mountpath(self.get_var("ansible_mounts"))
etcd_mountpath = self.find_ansible_mount("/var/lib/etcd")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

I see this same path in two different checks :)

@@ -15,7 +16,7 @@ def test_cannot_determine_available_disk(ansible_mounts, extra_words):
with pytest.raises(OpenShiftCheckException) as excinfo:
EtcdVolume(fake_execute_module, task_vars).run()

for word in 'Unable to find etcd storage mount point'.split() + extra_words:
for word in ['Unable to determine mount point'] + extra_words:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for the other test file, note the difference word-by-word x whole sentence. If that's understood, no problem with the change.

@sosiouxme
Copy link
Member Author

aos-ci-test

@openshift-bot
Copy link

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 80ad75a (logs)

@openshift-bot
Copy link

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 80ad75a (logs)

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 7, 2017
@sosiouxme
Copy link
Member Author

aos-ci-test

@openshift-bot
Copy link

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 969447f (logs)

@openshift-bot
Copy link

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 969447f (logs)

@sosiouxme
Copy link
Member Author

bot, retest this

@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 7, 2017
@sosiouxme
Copy link
Member Author

bot, restest this please

@rhcarvalho
Copy link
Contributor

bot, retest this please

1 similar comment
@sosiouxme
Copy link
Member Author

bot, retest this please

@sosiouxme
Copy link
Member Author

@rhcarvalho
Copy link
Contributor

@sosiouxme
Copy link
Member Author

bot, retest this please

@rhcarvalho
Copy link
Contributor

[test]

@openshift-bot openshift-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 8, 2017
Reuse the code for finding the ansible_mounts mount for a path.
@sosiouxme
Copy link
Member Author

aos-ci-test

@openshift-bot
Copy link

Evaluated for openshift ansible test up to 3c71d00

@openshift-bot openshift-bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 8, 2017
@openshift-bot
Copy link

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pull_request_openshift_ansible/423/) (Base Commit: 566731d) (PR Branch Commit: 3c71d00)

@openshift-bot
Copy link

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 3c71d00 (logs)

@openshift-bot
Copy link

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 3c71d00 (logs)

@openshift-bot
Copy link

error: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 3c71d00 (logs)

@openshift-bot
Copy link

error: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 3c71d00 (logs)

@sosiouxme
Copy link
Member Author

sosiouxme commented Aug 9, 2017

^^^ schroedinger tests ... they succeeded and failed at the same time. Until the bot looked at them, then they collapsed to failed state.

Really don't know what to make of that. The logs look like nothing went wrong.

@sosiouxme
Copy link
Member Author

aos-ci-test

@openshift-bot
Copy link

success: "aos-ci-jenkins/OS_3.6_NOT_containerized, aos-ci-jenkins/OS_3.6_NOT_containerized_e2e_tests" for 3c71d00 (logs)

@openshift-bot
Copy link

success: "aos-ci-jenkins/OS_3.6_containerized, aos-ci-jenkins/OS_3.6_containerized_e2e_tests" for 3c71d00 (logs)

@sosiouxme
Copy link
Member Author

[merge]

@sosiouxme
Copy link
Member Author

yum flakes openshift/origin#10162 openshift/origin#8571
epel mirror is borked?

i imagine the tests aren't worth much in that state, can just wait until it's resolved.

@sosiouxme
Copy link
Member Author

[merge] again

@openshift-bot
Copy link

Evaluated for openshift ansible merge up to 3c71d00

@openshift-bot
Copy link

continuous-integration/openshift-jenkins/merge FAILURE (https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/834/) (Base Commit: 7c7b91c) (PR Branch Commit: 3c71d00)

@sosiouxme
Copy link
Member Author

https://ci.openshift.redhat.com/jenkins/job/merge_pull_request_openshift_ansible/834/ appears to have been Jenkins running out of space.

For example

FATAL: Unable to produce a script file
java.io.IOException: No space left on device
	at java.io.UnixFileSystem.createFileExclusively(Native Method)
	at java.io.File.createTempFile(File.java:2024)
	at hudson.FilePath$17.invoke(FilePath.java:1372)
Caused: java.io.IOException: Failed to create a temporary directory in /tmp
	at hudson.FilePath$17.invoke(FilePath.java:1374)
	at hudson.FilePath$17.invoke(FilePath.java:1362)
	at hudson.FilePath.act(FilePath.java:996)
	at hudson.FilePath.act(FilePath.java:974)
	at hudson.FilePath.createTextTempFile(FilePath.java:1362)
Caused: java.io.IOException: Failed to create a temp file on /var/lib/jenkins/jobs/test_pull_request_openshift_ansible_extended_conformance_install_with_status_check/workspace

So this test probably still didn't get very far. I don't think there's a flake issue for this one and don't really expect it to be a recurring thing... @sdodson would you mind either re-running the merge or just merging? The aos-ci-tests all passed long ago, just can't seem to get Rosie on board.

@sdodson sdodson merged commit 8daf3d9 into openshift:master Aug 11, 2017
@sosiouxme sosiouxme deleted the 20170728-refactor-ansible-mounts branch August 15, 2017 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants