
Moves writeback to outside of lock #274

Closed
wants to merge 1 commit

Conversation

marks-sortable (Contributor)

If all dispatchers are blocking on an `invlock` held by `aggregator_expire`, and enough metrics have been sent back into the system but not yet read, the system can deadlock: `aggregator_expire` blocks on the socket write, while the threads that could free space in the FIFO are blocked on `aggregator_expire`.

While `aggregator_expire` holds the lock, don't write metrics but instead accumulate them into a NUL-separated buffer. After releasing the lock, write the metrics to the fd one at a time so that we can properly track sent and dropped metrics.

The commit message: refactors the metric writing code of `aggregator_expire`. While we hold the lock, don't write metrics but instead accumulate them into a NUL-separated buffer. After releasing the lock, write the metrics to the fd one at a time so that we can properly track `sent` and `dropped` metrics.
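To make the proposed change concrete, here is a minimal sketch of the buffering approach in C; `invocations_lock`, `expire_and_write`, and the placeholder formatting loop are illustrative assumptions, not the actual carbon-c-relay code.

```c
/*
 * Minimal sketch of the buffering approach described above.  The names
 * (invocations_lock, expire_and_write, the placeholder bucket walk) are
 * illustrative assumptions, not the actual carbon-c-relay code.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static pthread_mutex_t invocations_lock = PTHREAD_MUTEX_INITIALIZER;

/* Format expired aggregates into a growable, NUL-separated buffer while
 * holding the lock, then write them out one at a time after releasing it. */
static void expire_and_write(int fd)
{
    char *buf = NULL;
    size_t len = 0, cap = 0, off;
    size_t sent = 0, dropped = 0;

    pthread_mutex_lock(&invocations_lock);
    for (int i = 0; i < 3; i++) {  /* placeholder for the real bucket walk */
        char line[256];
        int n = snprintf(line, sizeof(line),
                "some.aggregate.metric %d %ld\n", i, (long)time(NULL));
        if (n < 0)
            continue;
        while (len + (size_t)n + 1 > cap) {
            char *nbuf;
            cap = cap ? cap * 2 : 4096;
            if ((nbuf = realloc(buf, cap)) == NULL)
                abort();  /* sketch: no graceful OOM handling */
            buf = nbuf;
        }
        memcpy(buf + len, line, (size_t)n + 1);  /* keep the NUL separator */
        len += (size_t)n + 1;
    }
    pthread_mutex_unlock(&invocations_lock);

    /* Writes happen with no lock held, so a blocking pipe can no longer
     * stall the dispatchers; sent/dropped are still tracked per metric
     * (partial writes ignored for brevity). */
    for (off = 0; off < len; off += strlen(buf + off) + 1) {
        if (write(fd, buf + off, strlen(buf + off)) < 0)
            dropped++;
        else
            sent++;
    }
    fprintf(stderr, "sent=%zu dropped=%zu\n", sent, dropped);
    free(buf);
}

int main(void)
{
    expire_and_write(STDOUT_FILENO);  /* demo: write to stdout instead of the pipe */
    return 0;
}
```

The key point is that `write()` is only called after `pthread_mutex_unlock()`, so a full pipe can no longer hold the lock hostage, while the per-metric loop still lets `sent` and `dropped` be counted accurately.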
grobian (Owner) commented Jun 10, 2017

The write was never supposed to block. It seems I forgot about this for the pipe.

grobian added a commit that referenced this pull request Jul 19, 2017
This is an alternative to PR #274, where instead of adding another queue (that can explode), all locks are released before issuing the write. This way, should the dispatchers be busy (and the write block), further aggregation work can continue.

While at it, some more locks were introduced to keep the critical sections short. Most work now runs under a read-lock, and the expiry thread does its modifications in one go after it has produced all aggregates. With this locking, it should be possible to run more than one expiry thread.
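For contrast, here is a minimal sketch of the locking scheme that commit describes, again with hypothetical names (`aggr_lock`, `dispatcher_update`, `aggregator_expire_once`) rather than the real code.

```c
/*
 * Minimal sketch, with hypothetical names, of the locking scheme the
 * referenced commit describes: dispatchers run under a read-lock, the
 * expiry thread applies its modifications in one short write-locked
 * section, and the (possibly blocking) write is issued with no lock held.
 */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_rwlock_t aggr_lock = PTHREAD_RWLOCK_INITIALIZER;

/* Dispatchers only need shared access to the buckets. */
static void dispatcher_update(void)
{
    pthread_rwlock_rdlock(&aggr_lock);
    /* ... add the incoming metric to the relevant bucket ... */
    pthread_rwlock_unlock(&aggr_lock);
}

/* The expiry thread produces all aggregates first, then takes the
 * write-lock once to prune expired invocations, and only writes after
 * every lock has been released. */
static void aggregator_expire_once(int fd)
{
    char out[4096];
    int outlen;

    pthread_rwlock_rdlock(&aggr_lock);
    /* ... compute the aggregates into out ... */
    outlen = snprintf(out, sizeof(out), "example.aggregate 1 0\n");
    pthread_rwlock_unlock(&aggr_lock);

    pthread_rwlock_wrlock(&aggr_lock);
    /* ... drop the expired invocations in one go ... */
    pthread_rwlock_unlock(&aggr_lock);

    /* No locks are held here: if the dispatchers are busy and this write
     * blocks on a full pipe, they can still make progress and drain it. */
    (void)write(fd, out, (size_t)outlen);
}

int main(void)
{
    dispatcher_update();
    aggregator_expire_once(STDOUT_FILENO);  /* demo: stdout instead of the pipe */
    return 0;
}
```

Because the expiry thread no longer sleeps on I/O while holding any lock, and dispatchers only take the read-lock, running more than one expiry thread becomes feasible, as the commit message notes.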
grobian (Owner) commented Jul 19, 2017

I don't like the buffering approach because it opens up another way to blow up memory-wise, so I instead changed the aggregator not to hold the lock while writing to the file, and to avoid exclusive locking more than it did before. I think this should fix your deadlock. Any feedback would be much appreciated.

grobian closed this Jul 19, 2017