
Moves writeback to outside of lock #274

Closed
wants to merge 1 commit

Conversation

marks-sortable (Contributor)

If all dispatchers are blocking on an `invlock` held by `aggregator_expire`, and enough metrics have been sent back into the system but not yet read, the system can deadlock: `aggregator_expire` blocks on the socket write, while the threads that could free space in the FIFO are blocked on `aggregator_expire`.

While `aggregator_expire` holds the lock, don't write metrics but instead accumulate them into a NUL-separated buffer. After releasing the lock, write the metrics to the fd one at a time so that we can properly track sent and dropped metrics.

The commit message: refactors the metric writing code of `aggregator_expire`. While we hold the lock, don't write metrics but instead accumulate them into a NUL-separated buffer. After releasing the lock, write the metrics to the fd one at a time so that we can properly track `sent` and `dropped` metrics.
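To make the proposed change concrete, here is a minimal sketch of the buffering approach in C; `invocations_lock`, `expire_and_write`, and the placeholder formatting loop are illustrative assumptions, not the actual carbon-c-relay code.

```c
/*
 * Minimal sketch of the buffering approach described above.  The names
 * (invocations_lock, expire_and_write, the placeholder bucket walk) are
 * illustrative assumptions, not the actual carbon-c-relay code.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t invocations_lock = PTHREAD_MUTEX_INITIALIZER;

/* Format expired aggregates into a growable, NUL-separated buffer while
 * holding the lock, then write them out one at a time after releasing it. */
static void expire_and_write(int fd)
{
    char *buf = NULL;
    size_t len = 0, cap = 0, off;
    size_t sent = 0, dropped = 0;

    pthread_mutex_lock(&invocations_lock);
    for (int i = 0; i < 3; i++) {  /* placeholder for the real bucket walk */
        char line[256];
        int n = snprintf(line, sizeof(line),
                "some.aggregate.metric %d %ld\n", i, (long)time(NULL));
        if (n < 0)
            continue;
        while (len + (size_t)n + 1 > cap) {
            char *nbuf;
            cap = cap ? cap * 2 : 4096;
            if ((nbuf = realloc(buf, cap)) == NULL)
                abort();  /* sketch: no graceful OOM handling */
            buf = nbuf;
        }
        memcpy(buf + len, line, (size_t)n + 1);  /* keep the NUL separator */
        len += (size_t)n + 1;
    }
    pthread_mutex_unlock(&invocations_lock);

    /* Writes happen with no lock held, so a blocking pipe can no longer
     * stall the dispatchers; sent/dropped are still tracked per metric
     * (partial writes ignored for brevity). */
    for (off = 0; off < len; off += strlen(buf + off) + 1) {
        if (write(fd, buf + off, strlen(buf + off)) < 0)
            dropped++;
        else
            sent++;
    }
    fprintf(stderr, "sent=%zu dropped=%zu\n", sent, dropped);
    free(buf);
}

int main(void)
{
    expire_and_write(STDOUT_FILENO);  /* demo: write to stdout instead of the pipe */
    return 0;
}
```

The key point is that `write()` is only called after `pthread_mutex_unlock()`, so a full pipe can no longer hold the lock hostage, while the per-metric loop still lets `sent` and `dropped` be counted accurately.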
grobian (Owner) commented Jun 10, 2017

The write was never supposed to block. It seems I forgot about this for the pipe.

grobian added a commit that referenced this pull request Jul 19, 2017
This is an alternative to PR #274, where instead of adding another queue (that can explode), all locks are released before issuing the write. This way, should the dispatchers be busy (and the write block), further aggregation work can continue.

While at it, some more locks were introduced to keep the critical sections short. Most work now runs under a read-lock, and the expiry thread does its modifications in one go after it has produced all aggregates. With this locking, it should be possible to run more than one expiry thread.
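For contrast, here is a minimal sketch of the locking scheme that commit describes, again with hypothetical names (`aggr_lock`, `dispatcher_update`, `aggregator_expire_once`) rather than the real code.

```c
/*
 * Minimal sketch, with hypothetical names, of the locking scheme the
 * referenced commit describes: dispatchers run under a read-lock, the
 * expiry thread applies its modifications in one short write-locked
 * section, and the (possibly blocking) write is issued with no lock held.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t aggr_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Dispatchers only need shared access to the buckets. */
static void dispatcher_update(void)
{
    pthread_rwlock_rdlock(&aggr_lock);
    /* ... add the incoming metric to the relevant bucket ... */
    pthread_rwlock_unlock(&aggr_lock);
}

/* The expiry thread produces all aggregates first, then takes the
 * write-lock once to prune expired invocations, and only writes after
 * every lock has been released. */
static void aggregator_expire_once(int fd)
{
    char out[4096];
    int outlen;

    pthread_rwlock_rdlock(&aggr_lock);
    /* ... compute the aggregates into out ... */
    outlen = snprintf(out, sizeof(out), "example.aggregate 1 0\n");
    pthread_rwlock_unlock(&aggr_lock);

    pthread_rwlock_wrlock(&aggr_lock);
    /* ... drop the expired invocations in one go ... */
    pthread_rwlock_unlock(&aggr_lock);

    /* No locks are held here: if the dispatchers are busy and this write
     * blocks on a full pipe, they can still make progress and drain it. */
    (void)write(fd, out, (size_t)outlen);
}

int main(void)
{
    dispatcher_update();
    aggregator_expire_once(STDOUT_FILENO);  /* demo: stdout instead of the pipe */
    return 0;
}
```

Because the expiry thread no longer sleeps on I/O while holding any lock, and dispatchers only take the read-lock, running more than one expiry thread becomes feasible, as the commit message notes.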
grobian (Owner) commented Jul 19, 2017

I don't like the buffering approach because it opens up another way to blow up memory-wise, so I instead changed the aggregator not to hold the lock while writing to the file, and to avoid exclusive locking more than it did before. I think this should fix your deadlock. Any feedback would be much appreciated.

grobian closed this Jul 19, 2017