
Ipfs Performance Tweaks #404

Closed
wants to merge 1 commit into from

Conversation

whyrusleeping
Copy link
Member

This PR will be the staging ground for IPFS performance fixes. It will be nice to have all of these changes in a single place so we can easily test the performance of this branch against master.
My list of hopeful features is:

  • Blockstore agent
    • A blockstore agent would act to serialize access to the datastore, avoiding the lock death we've seen from leveldb
  • make handleOutgoingMessage not a goroutine
    • Or gate the number of goroutines with a semaphore
  • make an agent for the muxer's bandwidth accounting
    • instead of locking around the bandwidth variables, send updates over a channel to a receiver loop that keeps track of them (a minimal sketch follows this list)
  • Batch up bitswap wantlist sends
    • send wanted blocks in order, N blocks at a time; this will make streaming more likely to happen
    • example: if 100 blocks are requested, only add 10 to the wantlist at a time, so that blocks arrive in roughly file order and we can stream. Otherwise we risk receiving the blocks in (worst case) reverse order and being unable to stream.
  • msgio fixes
    • sink Pool into Reader (no max size, preallocated buffers issue)
    • fix func (rw *ReadWriter_) Close()
    • r&w threadsafety
  • take a look at levelDB's performance
    • Compare to other K/V stores.
  • profile for number of active goroutines at any given time
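
For the bandwidth-accounting item, here is a minimal sketch of the channel-based agent idea, assuming hypothetical names (bwAgent, Record, Totals) rather than the actual muxer code: updates go over a channel to a single receiver loop that owns the counters, so nothing needs to lock around the bandwidth variables.

```go
package main

import "fmt"

// bwUpdate carries one accounting event (bytes received / sent).
type bwUpdate struct {
	in, out uint64
}

// bwAgent owns the counters; only its loop goroutine ever touches them.
type bwAgent struct {
	updates chan bwUpdate
	queries chan chan [2]uint64
}

func newBWAgent() *bwAgent {
	a := &bwAgent{
		updates: make(chan bwUpdate, 64), // small buffer so callers rarely block
		queries: make(chan chan [2]uint64),
	}
	go a.loop()
	return a
}

// loop is the receiver loop: it applies updates and answers queries, so no
// mutex is needed around totalIn/totalOut.
func (a *bwAgent) loop() {
	var totalIn, totalOut uint64
	for {
		select {
		case u := <-a.updates:
			totalIn += u.in
			totalOut += u.out
		case resp := <-a.queries:
			resp <- [2]uint64{totalIn, totalOut}
		}
	}
}

// Record would be called from the muxer's read/write paths.
func (a *bwAgent) Record(in, out uint64) { a.updates <- bwUpdate{in, out} }

// Totals asks the loop for the counters it has accumulated so far.
func (a *bwAgent) Totals() (in, out uint64) {
	resp := make(chan [2]uint64)
	a.queries <- resp
	t := <-resp
	return t[0], t[1]
}

func main() {
	agent := newBWAgent()
	agent.Record(1024, 512)
	in, out := agent.Totals()
	fmt.Println("in:", in, "out:", out)
}
```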

@whyrusleeping whyrusleeping added the status/in-progress In progress label Dec 5, 2014
@jbenet
Copy link
Member

jbenet commented Dec 5, 2014

A blockservice agent would act to serialize access to the datastore, avoiding the lock death we've seen from leveldb

Are the leveldb locks actually more expensive than what this agent would do (either sync.Mutex or channels)? How granular are the locks? Full serialization is not what we want; we want to do as much as we possibly can in parallel.

make an agent for the muxer's bandwidth accounting

What does this one mean?

Batch up bitswap wantlist sends

Can you expand on this one too? It could mean a few different approaches, and I want to make sure we're on the same page.

msgio fixes:

  • sink Pool into Reader (no max size, preallocated buffers issue)
  • fix func (rw *ReadWriter_) Close()
  • r&w threadsafety

take a look at levelDB's performance

vague.

profile for number of active goroutines at any given time

+1

@whyrusleeping
Copy link
Member Author

My goal with the blockstore agent isn't so much to serialize the disk writes, but to avoid the goroutine buildup; at one point I had 3000 goroutines trying to write to the datastore.

@jbenet
Copy link
Member

jbenet commented Dec 5, 2014

My goal with the blockstore agent isn't so much to serialize the disk writes, but to avoid the goroutine buildup; at one point I had 3000 goroutines trying to write to the datastore.

What do you have in mind? The goroutines are created by clients, and we need to make sure the writes complete (consistency). Not sure we're talking about the same things.

@whyrusleeping
Copy link
Member Author

Hrm... thinking about it more, maybe that wouldn't help anything.

@btc
Copy link
Contributor

btc commented Dec 6, 2014

Before we begin a large performance effort, what specific concrete performance issues are we currently facing?

We may want to consider looking at network reliability (i.e. reachability, message delivery) before setting out to increase throughput and reduce latency.

send wanted blocks in order, N blocks at a time, this will make streaming more likely to happen

  1. This has a non-trivial adverse effect on latency.
  2. The local node has no actual control over the behavior of the network. With the proposed behavior, the contract provided to the client is no stronger than the current behavior.

Would this be more reliably addressed by re-ordering the blocks locally as a stage in the response pipeline?
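
To make the re-ordering suggestion concrete, a rough sketch (hypothetical types, not the actual bitswap pipeline) of a stage that buffers out-of-order blocks and releases them in sequence:

```go
package main

import "fmt"

// indexedBlock pairs a block with its position in the file.
type indexedBlock struct {
	index int
	data  []byte
}

// reorder reads blocks in whatever order the network delivers them and emits
// them strictly in index order, holding gaps in a map until they are filled.
func reorder(in <-chan indexedBlock, out chan<- []byte) {
	pending := make(map[int][]byte)
	next := 0
	for blk := range in {
		pending[blk.index] = blk.data
		for data, ok := pending[next]; ok; data, ok = pending[next] {
			out <- data
			delete(pending, next)
			next++
		}
	}
	close(out)
}

func main() {
	in := make(chan indexedBlock)
	out := make(chan []byte)
	go reorder(in, out)
	go func() {
		for _, i := range []int{2, 0, 1} { // simulate out-of-order arrival
			in <- indexedBlock{index: i, data: []byte{byte('a' + i)}}
		}
		close(in)
	}()
	for data := range out {
		fmt.Printf("%s", data) // prints "abc"
	}
	fmt.Println()
}
```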

My goal with the blockstore agent isn't so much to serialize the disk writes, but to avoid the goroutine buildup; at one point I had 3000 goroutines trying to write to the datastore.

Async disk writes, as @jbenet mentioned, inhibit consistency.

The root cause of this was overzealous goroutine generation in the network layer.

Don't we just need to apply backpressure when producers are working faster than consumers?
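
As a generic illustration of that point (not go-ipfs code): rather than spawning an unbounded goroutine per message, producers block on a bounded work queue drained by a fixed number of consumers, so the write path slows the network path down instead of piling up goroutines.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	work := make(chan []byte, 16) // bounded queue: sends block once it is full
	var wg sync.WaitGroup

	// A fixed pool of consumers, standing in for datastore writers.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for blk := range work {
				time.Sleep(time.Millisecond) // pretend this is a slow disk write
				_ = blk
			}
		}()
	}

	// The producer, standing in for the network layer, is throttled for free:
	// each send blocks until a consumer makes room in the queue.
	for i := 0; i < 100; i++ {
		work <- []byte{byte(i)}
	}
	close(work)
	wg.Wait()
	fmt.Println("all writes drained")
}
```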

take a look at levelDB's performance

We recently addressed LevelDB's write compaction issues. Afaik, this is no longer a bottleneck. Has something changed?

@whyrusleeping
Copy link
Member Author

We are currently completely unable to transfer a file larger than around 50MB due to quickly running out of memory.

Re: bitswap request batching: if done properly, it should significantly improve latency. Currently, no order is implied in the requesting of a block, so the time until you have usable data is very unpredictable, which in my opinion is very bad. If your bandwidth is at most X blocks per second, it won't make a difference to the overall speed of the operation if you only request X blocks at a time; I would prefer to receive blocks towards the front of the file first so we can stream, as opposed to random blocks throughout the file (rough sketch below).
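
A rough sketch of that windowed requesting, using hypothetical helpers rather than the real bitswap API: keep at most `window` keys outstanding, always taken from the front of the remaining list, and top the window up as blocks arrive.

```go
package main

import "fmt"

// fetchWindowed requests keys in order, `window` at a time. `want` stands in
// for adding a key to the wantlist; `received` delivers keys whose blocks
// have come back.
func fetchWindowed(keys []string, window int, want func(string), received <-chan string) {
	next, outstanding := 0, 0
	for next < len(keys) && outstanding < window {
		want(keys[next])
		next++
		outstanding++
	}
	for outstanding > 0 {
		<-received // a block we asked for has arrived
		outstanding--
		if next < len(keys) {
			want(keys[next]) // keep the window full, still in file order
			next++
			outstanding++
		}
	}
}

func main() {
	keys := []string{"k0", "k1", "k2", "k3", "k4"}
	received := make(chan string, len(keys))
	want := func(k string) {
		fmt.Println("want:", k)
		received <- k // simulate the block arriving immediately
	}
	fetchWindowed(keys, 2, want, received)
}
```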

As for the levelDB issues, I don't believe that was ever actually addressed; I think we just stopped hitting it so hard. I think what I'm seeing is network speeds greater than levelDB write speeds causing a chokepoint.

@btc
Copy link
Contributor

btc commented Dec 6, 2014

As for the levelDB issues, I don't believe that was ever actually addressed,

Yeah they were. The solution ended up being pretty simple. The culprit was unnecessary snappy compression during write compaction.

ff490a6

I think what I'm seeing is network speeds greater than levelDB write speeds causing a chokepoint

It's that producers on the network side are spewing goroutined requests up to bitswap faster than bitswap can retrieve from disk. This is the fault of the network layer for not responding to back-pressure.

@whyrusleeping
Copy link
Member Author

How was the compaction issue addressed?

@btc
Copy link
Contributor

btc commented Dec 6, 2014

How was the compaction issue addressed?

Compaction was a red herring. Compression made compaction take too long, and when compaction took too long, writes backed up.

We blamed it on compaction because it was an easy target and the internet seemed to back up that hypothesis, but the root cause was expensive compression during compaction.
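
For reference, the knob involved is goleveldb's compression option; here is a minimal sketch of opening a store with snappy compression turned off (this only shows the mechanism, not the contents of ff490a6):

```go
package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/opt"
)

func main() {
	// Disable snappy compression so write compaction does not pay for it.
	db, err := leveldb.OpenFile("/tmp/example.ldb", &opt.Options{
		Compression: opt.NoCompression,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if err := db.Put([]byte("key"), []byte("value"), nil); err != nil {
		log.Fatal(err)
	}
}
```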

We are currently completely unable to transfer a file larger than around 50MB due to quickly running out of memory.

What's our best guess as to why this is happening?

@btc
Copy link
Contributor

btc commented Dec 6, 2014

no order is implied in the requesting of a block

Keys at the front/head of the wantlist are deemed higher priority; priority is expressed through the order of the wantlist.

@whyrusleeping
Copy link
Member Author

So, how was the "not compaction" issue fixed?

@btc
Copy link
Contributor

btc commented Dec 6, 2014

So, how was the "not compaction" issue fixed?

I linked to it in a previous comment. Didn't realize you didn't see the link. ff490a6

@whyrusleeping
Copy link
Member Author

Ah, I totally missed that the link was the fix. As for the wantlist being high priority at the front, our wantlist is a map, which has random order.

@btc
Copy link
Contributor

btc commented Dec 6, 2014

The wantlist should be a hybrid that uses both a slice and a map: the map to prevent duplicates, the slice to preserve ordering. This is done in the BitSwapMessage:

https://github.com/jbenet/go-ipfs/blob/master/exchange/bitswap/message/message.go#L46

our wantlist is a map, which has random order.

Oops, this is a bug.
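
A sketch of that slice-plus-map hybrid with illustrative names (the real structure is the one linked above in message.go): the map de-duplicates, the slice preserves insertion order so priority survives.

```go
package main

import "fmt"

// Wantlist keeps a set for de-duplication and a slice for ordering.
type Wantlist struct {
	set  map[string]struct{} // prevents duplicate entries
	keys []string            // preserves the order keys were added in
}

func NewWantlist() *Wantlist {
	return &Wantlist{set: make(map[string]struct{})}
}

// Add appends a key only if it has not been seen before.
func (w *Wantlist) Add(k string) {
	if _, ok := w.set[k]; ok {
		return
	}
	w.set[k] = struct{}{}
	w.keys = append(w.keys, k)
}

// Keys returns the wanted keys in insertion order, highest priority first.
func (w *Wantlist) Keys() []string {
	return append([]string(nil), w.keys...)
}

func main() {
	w := NewWantlist()
	for _, k := range []string{"block-a", "block-b", "block-a", "block-c"} {
		w.Add(k)
	}
	fmt.Println(w.Keys()) // [block-a block-b block-c]
}
```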

@jbenet
Copy link
Member

jbenet commented Dec 6, 2014

@whyrusleeping wrote

We are currently completely unable to transfer a file larger than around 50MB due to quickly running out of memory.

Is this recent? I've done that many times before. It's 1GB+ that has been failing for me.

@maybebtc wrote

This is the fault of the network layer for not responding to back-pressure.

This is an important issue. It was just addressed, though, yeah?


FWIW, I think we all agree here that we want to make sure to measure things correctly and outline the perf issues before optimizing blindly. I think @whyrusleeping has good instincts he's following, and @maybebtc is right to want measurements before diving into something. Particularly since we have very little time to release this (I'll write at length about this tomorrow, but we're shipping this before the end of the year). Let's make sure we have concrete, measured bottlenecks before jumping into things (like memory usage and concurrency issues).

@whyrusleeping
Copy link
Member Author

@maybebtc for the wantlist, we should probably use that same wantlist object for the wantlists in ledger.

@jbenet I've been able to get a file around 50MB through over WAN, but above 100MB things fail. I think most of the failures will be fixed with the msgio changes, but I'm not super confident in that yet. We will see once that lands.

@btc
Copy link
Contributor

btc commented Dec 6, 2014

@maybebtc for the wantlist, we should probably use that same wantlist object for the wantlists in ledger.

Indeed. Low-pri, yeah.

@jbenet I've been able to get a file around 50MB through over WAN, but above 100MB things fail. I think most of the failures will be fixed with the msgio changes, but I'm not super confident in that yet. We will see once that lands.

A little over a month ago, I transferred 1.5GB to cryptix. This is an interesting regression. We may want to create an integration test that performs a large file transfer.

@jbenet jbenet self-assigned this Dec 6, 2014
@jbenet jbenet mentioned this pull request Dec 6, 2014
@whyrusleeping
Copy link
Member Author

Closing; the concerns have mostly been addressed and this PR is no longer needed.

@whyrusleeping whyrusleeping removed the status/in-progress In progress label Dec 7, 2014
@Kubuxu Kubuxu deleted the perf/ipfs branch February 27, 2017 20:40