Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixing issue #1961 with incorrect "All users spawned" log messages wh… #1977

Merged

Conversation

EzR1d3r
Copy link
Contributor

@EzR1d3r EzR1d3r commented Jan 23, 2022

Issue #1961 I saw this bug on a remote machine. I can't debug there, but I think the reason is that 100ms sometimes is not enough for workers (I guess, the last report from last worker) to send reports (or may be to receive and handle on master side). PR requires polish, but you can see the essence.

@@ -747,32 +747,33 @@ def start(self, user_count: int, spawn_rate: float, wait=False) -> None:
# when the user count is really at the desired value.
timeout = gevent.Timeout(self._wait_for_workers_report_after_ramp_up())
timeout.start()
msg_prefix = "All users spawned"
try:
while self.user_count != self.target_user_count:
gevent.sleep()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to change this to gevent.sleep(0.01) or something, to prevent busy waiting. What do you think @mboutet ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense

@cyberw
Copy link
Collaborator

cyberw commented Jan 24, 2022

Would this delay the logging (as well as spawning_complete.fire) by a whole second? That is a bit unfortunate, but maybe unavoidable..

@EzR1d3r
Copy link
Contributor Author

EzR1d3r commented Jan 24, 2022

Would this delay the logging (as well as spawning_complete.fire) by a whole second? That is a bit unfortunate, but maybe unavoidable..

The delay for a second will only be if the worker does not make a report on time. In most cases 100 ms is enough. But unfortunately, it is not known how much it takes when a delay occurs, whether the worker reports at all.

We might want to change this to gevent.sleep(0.01) or something, to prevent busy waiting.

Do you want to sleep(0.01) to reduce the CPU load while we wait?

@cyberw
Copy link
Collaborator

cyberw commented Jan 24, 2022

Would this delay the logging (as well as spawning_complete.fire) by a whole second? That is a bit unfortunate, but maybe unavoidable..

The delay for a second will only be if the worker does not make a report on time. In most cases 100 ms is enough. But unfortunately, it is not known how much it takes when a delay occurs, whether the worker reports at all.

Cool, then it is a minor issue.

We might want to change this to gevent.sleep(0.01) or something, to prevent busy waiting.

Do you want to sleep(0.01) to reduce the CPU load while we wait?

Yes please.

If you do that, and resolve the conflicts, I'd be happy to merge.

@EzR1d3r
Copy link
Contributor Author

EzR1d3r commented Jan 24, 2022

Of course, I'll add edits. I think the report troubles message will not be superfluous anyway. However, if the bug is repeated again, a deeper analysis will be needed, why the last report sometimes is missed.

@EzR1d3r EzR1d3r force-pushed the issue-1961-all-users-spawned-message-bug branch 3 times, most recently from db48725 to 7bfe320 Compare January 24, 2022 22:01
@EzR1d3r EzR1d3r marked this pull request as draft January 24, 2022 23:08
@EzR1d3r EzR1d3r force-pushed the issue-1961-all-users-spawned-message-bug branch 2 times, most recently from 88ab6f9 to aa256cf Compare January 26, 2022 21:21
@EzR1d3r
Copy link
Contributor Author

EzR1d3r commented Jan 26, 2022

@cyberw @mboutet
I'm not very familiar with gevent and I found a strange thing. In this place sleep takes 100ms instead of wanted 1 second.

gevent.sleep(HEARTBEAT_INTERVAL)

So the test falls because master miss all workers (for ~300 ms instead of 3 second HEARTBEAT_LIVENESS) in heartbeat_worker while he wait reports for 1 second (was 0.1 second before).

def test_last_worker_missing_stops_test(self):

@cyberw
Copy link
Collaborator

cyberw commented Jan 26, 2022

Hmm.. I am pretty sure there is no way for a sleep to take a shorter amount of time than specified...

@mboutet
Copy link
Contributor

mboutet commented Jan 26, 2022

Hmm.. I am pretty sure there is no way for a sleep to take a shorter amount of time than specified...

Yes, that's certainly not possible. It could take longer if the event loop is busy, but not less.

@EzR1d3r
Copy link
Contributor Author

EzR1d3r commented Jan 27, 2022

This is possible if several lines above make a patch. ))) Didn't guess to check it out yesterday.
I rightly understand - is this just to speed up the test?

@mock.patch("locust.runners.HEARTBEAT_INTERVAL", new=0.1)

@cyberw
Copy link
Collaborator

cyberw commented Jan 27, 2022

Yup!

@EzR1d3r EzR1d3r force-pushed the issue-1961-all-users-spawned-message-bug branch 2 times, most recently from f53c3b2 to 8e1e461 Compare January 27, 2022 22:46
…sages when running distributed by increasing wait time.

Now if the timeout of worker reports is expired, log the message about it.
Also mock LOCUST_WAIT_FOR_WORKERS_REPORT_AFTER_RAMP_UP with old timeout in a test,
in another case would have to increase sleep timers in the test.
@EzR1d3r EzR1d3r force-pushed the issue-1961-all-users-spawned-message-bug branch from 8e1e461 to 9fb91d4 Compare January 27, 2022 23:02
@EzR1d3r EzR1d3r marked this pull request as ready for review January 27, 2022 23:19
@cyberw
Copy link
Collaborator

cyberw commented Jan 28, 2022

Awesome. Thanks!

@cyberw cyberw merged commit 48cee13 into locustio:master Jan 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants