Event loop is regularly being blocked for longish periods #3302

Closed
ulope opened this issue Jan 22, 2019 · 5 comments
Labels
State / Investigating: For issues that are currently being looked into before labeling further
State / Meta: Meta issue, must be split into smaller issues

Comments

@ulope
Collaborator

ulope commented Jan 22, 2019

Problem Definition

Using the gevent monitoring thread (enabled by setting the environment variable GEVENT_MONITOR_THREAD_ENABLE=1) shows that we regularly block the event loop for longish periods. The maximum allowed blocking time, in seconds, can be controlled via GEVENT_MAX_BLOCKING_TIME; I used 0.1 and 0.5 as examples.
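For anyone unfamiliar with how such a monitor works, here is a minimal, stdlib-only sketch of the idea. It is an illustration under assumptions, not gevent's implementation: the loop records a timestamp whenever it regains control, and a monitor periodically checks how long ago that was. BlockingWatchdog, tick() and check() are hypothetical names. The real gevent monitor also prints the stack of the blocking greenlet, which is what makes its output useful for finding culprits.

```python
import io
import sys
import time
from typing import Callable, List, Optional, TextIO


class BlockingWatchdog:
    """Hypothetical sketch of an event-loop blocking detector (not gevent's code).

    The loop calls tick() each time control returns to it; a monitor, which in
    a real setup runs on a separate native thread, calls check() periodically.
    """

    def __init__(
        self,
        max_blocking_time: float = 0.1,
        stream: TextIO = sys.stderr,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_blocking_time = max_blocking_time  # seconds
        self.stream = stream  # where reports go; a log file keeps them separate
        self.clock = clock  # injectable so the logic can be tested deterministically
        self._last_tick = clock()

    def tick(self) -> None:
        # Record that the loop got control back.
        self._last_tick = self.clock()

    def check(self) -> Optional[float]:
        # Report and return the blocked duration if it exceeds the limit.
        blocked_for = self.clock() - self._last_tick
        if blocked_for > self.max_blocking_time:
            self.stream.write(f"event loop blocked for {blocked_for:.3f}s\n")
            return blocked_for
        return None


if __name__ == "__main__":
    fake_now: List[float] = [0.0]
    report = io.StringIO()  # stands in for a log file
    watchdog = BlockingWatchdog(0.1, stream=report, clock=lambda: fake_now[0])
    fake_now[0] = 0.5  # pretend the loop was stuck for half a second
    watchdog.check()
    print(report.getvalue(), end="")
```

Injecting the clock keeps the detection logic separate from real time, so the same class can be exercised without actually blocking anything.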

The main blocking culprits seem to be:

  • sqlite
  • deepcopy calls inside the state machine

Possible solutions

For sqlite there is gsqlite3, which on paper sounds like a solution.
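gsqlite3's internals are not covered here. As an assumption about the general technique, a common way to keep a blocking sqlite call off the event loop is to run it on a dedicated worker thread and only wait for its result. A stdlib-only sketch (SqliteWorker is a hypothetical name):

```python
import sqlite3
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class SqliteWorker:
    """Hypothetical helper: run every query on one dedicated worker thread."""

    def __init__(self, path: str = ":memory:") -> None:
        self._path = path
        self._local = threading.local()
        # A single worker, because by default an sqlite3 connection may only
        # be used from the thread that created it (check_same_thread=True).
        self._executor = ThreadPoolExecutor(max_workers=1, initializer=self._connect)

    def _connect(self) -> None:
        # Runs once, inside the worker thread, before its first task.
        self._local.conn = sqlite3.connect(self._path)

    def _run(self, sql: str, params: tuple) -> list:
        conn = self._local.conn
        with conn:  # commit on success, roll back on exception
            return conn.execute(sql, params).fetchall()

    def execute(self, sql: str, params: tuple = ()) -> "Future[list]":
        # Returns immediately; the caller decides when to wait on the result.
        return self._executor.submit(self._run, sql, params)

    def close(self) -> None:
        self._executor.submit(lambda: self._local.conn.close()).result()
        self._executor.shutdown(wait=True)
```

One caveat for Raiden: with gevent's monkey patching, threading (and therefore ThreadPoolExecutor) creates greenlets instead of OS threads, so this exact code would not help there; the gevent-side equivalent would go through the hub's native thread pool (gevent.get_hub().threadpool).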

For the state machine, we should look into using persistent / functional data structures to avoid having to use deepcopy. Pyrsistent looks promising.
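The reason persistent data structures help is that an update returns a new value sharing every unchanged part with the old one, so there is nothing to deepcopy defensively. As a stdlib-only stand-in (this is not Pyrsistent's API), frozen dataclasses plus dataclasses.replace show the same idea; ChannelState, ChainState and set_balance are hypothetical, not Raiden's actual state classes.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ChannelState:
    channel_id: int
    balance: int


@dataclass(frozen=True)
class ChainState:
    block_number: int
    channels: Tuple[ChannelState, ...]


def set_balance(state: ChainState, channel_id: int, balance: int) -> ChainState:
    """Return a new ChainState; the old one is left untouched.

    Only the changed channel and the outer tuple are rebuilt; every other
    ChannelState object is reused by reference instead of being copied.
    """
    channels = tuple(
        replace(ch, balance=balance) if ch.channel_id == channel_id else ch
        for ch in state.channels
    )
    return replace(state, channels=channels)
```

Rebuilding the tuple is still linear in the number of channels, which is the kind of cost dedicated persistent collections such as Pyrsistent's pvector and pmap aim to reduce.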

@ulope ulope added the labels State / Meta and State / Investigating on Jan 22, 2019
@rakanalh
Contributor

The main blocking culprits seem to be:

  • sqlite
  • deepcopy calls inside the state machine

Is there a way to construct a flame graph to support this?

@hackaugusto
Contributor

Actually, I tried pyflame with py3.7 and py3.6, and it doesn't work with either:

For py3.7, the ABI is not supported.
For py3.6, it seems to be a known bug: uber-archive/pyflame#129.
For py3.5, Raiden is not compatible.

I may reintroduce one of the profiling tools we used before.

@ulope
Collaborator Author

ulope commented Jan 24, 2019

Is there a way to construct a flame graph to support this?

Gevent already shows where the blocking occurs when running with the environment variables I mentioned above; it just needs someone to take the time to look through the generated output.
To make this a bit easier, the exception_stream variable can be set to a file, for example, so the output isn't intermingled with the Raiden logs.

For profiling there is also py-spy, which claims to support Python 3.3 to 3.7.
The gevent docs point to https://github.com/nylas/nylas-perftools
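On the flame graph question: sampling profilers periodically capture the running stack and count identical stacks, and flame graph tools render those counts. A minimal stdlib sketch of that idea, assuming CPython (sys._current_frames is a CPython-specific, private API); StackSampler, fold_stack and busy_work are hypothetical names, and this is not how py-spy or nylas-perftools are implemented:

```python
import sys
import threading
import time
from collections import Counter
from types import FrameType
from typing import Optional


def fold_stack(frame: Optional[FrameType]) -> str:
    """Render a stack root-first as 'outer;inner;leaf' (folded stack format)."""
    names = []
    while frame is not None:
        names.append(f"{frame.f_code.co_filename}:{frame.f_code.co_name}")
        frame = frame.f_back
    return ";".join(reversed(names))


class StackSampler:
    """Hypothetical sampler: records one thread's stack at a fixed interval."""

    def __init__(self, target_thread_id: int, interval: float = 0.005) -> None:
        self.target_thread_id = target_thread_id
        self.interval = interval
        self.counts: Counter = Counter()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        # Event.wait returns False on timeout, so this samples until stop().
        while not self._stop.wait(self.interval):
            frame = sys._current_frames().get(self.target_thread_id)
            if frame is not None:
                self.counts[fold_stack(frame)] += 1

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()

    def collapsed(self) -> str:
        # One 'stack count' line per distinct stack.
        return "\n".join(f"{stack} {n}" for stack, n in self.counts.most_common())


def busy_work(seconds: float) -> int:
    # CPU-bound loop standing in for a blocking call such as deepcopy.
    deadline = time.monotonic() + seconds
    n = 0
    while time.monotonic() < deadline:
        n += 1
    return n
```

The output of collapsed() uses the folded "stack count" line format that flamegraph.pl reads. Under gevent's monkey patching, threading.Thread starts a greenlet, which cannot run while the loop is blocked, so a real version would need a native thread or a signal-based timer.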

@hackaugusto
Contributor

We have done a lot of work on this issue, since blocking the event loop causes presence problems with the Matrix transport: nodes appeared to be offline, which led to transfer failures.

There are still some longish periods where the loop may block; however, they are not a source of problems at the moment. The next step should be to add benchmarks and create issues based on the results. For now, I'm closing this issue.

3 participants