
Transfer times doubled between [email protected] and [email protected] #1342

Closed
alxroyer opened this issue Aug 11, 2022 · 14 comments
Labels: kind/bug (A bug in existing code, including security flaws) · P2 (Medium: Good to have, but can wait until someone steps up)


alxroyer commented Aug 11, 2022

Severity:

Medium (performance issue)

Description:

According to my experiments, transfer times roughly doubled between [email protected] and [email protected].

If this is correct:

  • Is the reason for it known? (additional features? refactoring? ...)
  • What is the priority for the libp2p team to address this kind of issue?

Steps to reproduce the error:

See the following test scripts (one for each version of js-libp2p):

Those scripts make it possible to launch 2 nodes:

  • a sender, with the following options:
    • --send: path to the file whose data to send,
    • --size: size of the chunks for sending the data.
  • a receiver, with the following options:
    • --connect: the address printed out by the sender.

Timestamps are printed out when the sender starts sending the data, and when the receiver has finished receiving all of it.
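
For reference, here is a minimal sketch of the general shape of such a sender/receiver pair, written against the newer factory-style js-libp2p API. The module names, config keys and protocol id below are assumptions for illustration and differ from the attached scripts:

```js
// Minimal sketch (assumed factory-style API); the actual attached test scripts differ
import fs from 'node:fs'
import { createLibp2p } from 'libp2p'
import { tcp } from '@libp2p/tcp'
import { mplex } from '@libp2p/mplex'
import { noise } from '@chainsafe/libp2p-noise'
import { multiaddr } from '@multiformats/multiaddr'
import { pipe } from 'it-pipe'

const PROTOCOL = '/test/transfer/1.0.0' // hypothetical protocol id

async function createNode () {
  const node = await createLibp2p({
    addresses: { listen: ['/ip4/127.0.0.1/tcp/0'] },
    transports: [tcp()],
    streamMuxers: [mplex()],
    connectionEncryption: [noise()]
  })
  await node.start()
  return node
}

// Receiver: counts incoming bytes and prints a timestamp when the stream ends
export async function receiver () {
  const node = await createNode()
  await node.handle(PROTOCOL, async ({ stream }) => {
    let received = 0
    for await (const chunk of stream.source) {
      received += chunk.length
    }
    console.log('received', received, 'bytes at', Date.now())
  })
  console.log('listening on', node.getMultiaddrs().map(ma => ma.toString()))
}

// Sender: reads the file in --size sized chunks and prints a timestamp before sending
export async function sender (file, chunkSize, remoteAddr) {
  const node = await createNode()
  const stream = await node.dialProtocol(multiaddr(remoteAddr), PROTOCOL)
  console.log('sending at', Date.now())
  await pipe(
    fs.createReadStream(file, { highWaterMark: chunkSize }),
    stream
  )
}
```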

With a bit of scripting around it, I produced the following graphs:

p2p901-2022-08-08a
Note: Chunks of 100 bytes are not representative, inasmuch as they cause a lot of overhead around the useful data actually transferred (see #1343).

p2p901-2022-08-08b

p2p901-2022-08-08c

As the graphs show, [email protected] transfer times are about twice those of [email protected] for chunk sizes of 1Kb and above.

alxroyer added the need/triage (Needs initial labeling and prioritization) label on Aug 11, 2022

alxroyer commented Aug 11, 2022

Possibly related to #1303, but not sure #1303 explains all the performance problems.

wemeetagain (Member) commented:

@Alexis-ROYER I haven't looked at your case specifically, so I can't say 100% that the problem is #1303, but it very likely accounts for most of it.
Another notable regression is in the protobuf encoders/decoders.

We're actively working on resolving the performance issues, and most of the work has landed in our nightly version. This is our current focus (along with stabilizing the types) before pushing out a v0.38.0.

mpetrunic (Member) commented:

@Alexis-ROYER Could you try with the latest 0.38 version?

mpetrunic added the need/author-input (Needs input from the original author) label and removed the need/triage label on Aug 30, 2022
achingbrain (Member) commented:

Closing as the issue should be resolved - please re-open if you continue to observe it.


alxroyer commented Sep 9, 2022

I've updated my measurements with [email protected] and [email protected]:
p2p901-2022-09-09b
p2p901-2022-09-09c

Transfer times are still about the same as with [email protected], i.e. higher than with [email protected].

The following test script works for both [email protected] and [email protected]:
libp2p-node.0.38.x.mjs


alxroyer commented Sep 9, 2022

@achingbrain Could you please re-open the issue? I can't on my side.

wemeetagain reopened this on Sep 10, 2022
achingbrain (Member) commented:

I've put a test bed together here: https://github.com/ipfs-shipyard/js-libp2p-transfer-performance

It takes a slightly different approach from the two files in the OP - notably it doesn't configure a DHT, since it's not needed, and it uses the plaintext connection encrypter instead of noise, as it's simpler.

It transfers 100MiB of data between the two nodes using increasing chunk sizes, recording how long it takes.
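
In outline, that measurement is of this shape. This is only a sketch: the chunk sizes and the `transfer()` placeholder standing in for the dial-and-pipe logic are hypothetical, not code from the linked repo:

```js
// Hypothetical benchmark loop: send TOTAL bytes using increasing chunk sizes and time each run
const TOTAL = 100 * 1024 * 1024 // 100 MiB

// Placeholder for the dial-and-pipe logic between the two nodes:
// stream TOTAL bytes in chunkSize pieces and resolve once the receiver has read everything
async function transfer (total, chunkSize) {
  // ... dial the receiver, pipe the data, await the acknowledgement ...
}

for (const chunkSize of [32, 256, 1024, 4096, 16384, 65536]) {
  const start = Date.now()
  await transfer(TOTAL, chunkSize)
  console.log(`${TOTAL} B in ${chunkSize} B chunks in ${Date.now() - start} ms`)
}
```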

Since the thing doing all the work here is the stream multiplexer, it tests 0.36.x, 0.37.x, 0.38.x and 0.39.x with mplex and also 0.39.x with the brand-new yamux multiplexer. 0.38.x and 0.39.x use the same version of @libp2p/mplex so you'd expect their speeds to be comparable.

What I've found largely tallies with @Alexis-ROYER's investigation:

image

In the graph above I've ignored message sizes below 256b, as I don't think many protocols would send messages that small, but the values follow the general pattern.

0.36.x is definitely faster with small message sizes, but that gets less pronounced as messages increase in size - performance starts to converge at message sizes around 32kb.

We should look at the performance overhead for messages smaller than this to try to bring the gap down between the blue and green lines in the graph above.

Other takeaways: if you're using 0.37.x you should upgrade to 0.38.x or later, and yamux can probably be optimised somewhat (not a criticism, it's still very new).

mpetrunic (Member) commented:

I will run some profiling tools today to see what the hot paths are.

BigLep removed the need/author-input label on Sep 13, 2022
BigLep added the P2 (Medium: Good to have, but can wait until someone steps up) label on Sep 13, 2022

BigLep commented Sep 13, 2022

2022-09-13 conversation: @mpetrunic has done some profiling, but libp2p dependencies aren't showing up with clinic.

@Alexis-ROYER: thanks for reporting this, as clearly there is an issue. We're curious how you noticed this and whether it is having an impact on any applications/uses you have with js-libp2p. This will help us gauge where to prioritize this work. And certainly any help digging into the issue is welcome.

BigLep added the need/author-input (Needs input from the original author) label on Sep 13, 2022
alxroyer (Author) commented:

@BigLep My pleasure.

I was actually investigating performance issues in an app I'm working on.
Among other things, I'm tunnelling other protocols inside a libp2p connection between two peers.
Since it's a tunnelling process, the chunks of data are those of the embedded protocols, which are about 5-10kB.

With large amounts of data to transfer, things would get stuck once in a while with [email protected].
When I upgraded to [email protected], things seemed to get more stable, but still slow.
Taking a divide-and-conquer approach, I started by qualifying the transfer rate capabilities of libp2p outside of my own code, which led me to report this issue.

achingbrain pushed a commit to libp2p/js-libp2p-mplex that referenced this issue Sep 20, 2022
Related to libp2p/js-libp2p#1342
Shaves ~100ms when sending small chunks but we are still not close to 0.36 speed
github-actions (Contributor) commented:

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 7 days.

BigLep removed the need/author-input and kind/stale labels on Sep 23, 2022

BigLep commented Sep 23, 2022

I removed the relevant labels so this doesn't get auto closed.

BigLep added the need/author-input label on Sep 23, 2022

achingbrain commented Sep 23, 2022

What I have found is that between 0.36.x and 0.37.x we have started sending more chunks of data than before, and the chunks are a lot smaller.

In mplex we yield every encoded Uint8Array - in the current release that's one for the header and one (or more) for the data, the theory being that if you pump data into the socket opened by @libp2p/tcp as quickly as possible, the OS will figure out the best way to send the data. Turns out it's not quite that clever.

I have refactored this code locally to only ever yield one Uint8Array, and the time taken to send 105MB of data in 256b chunks decreases from about 5.5s to about 3s, which is good, but 0.36.x is just under 2s so there's still a way to go.
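
To illustrate the difference being described, here is a simplified sketch of the two strategies (not the actual @libp2p/mplex encoder code):

```js
// Simplified sketch of the two strategies (not the real @libp2p/mplex encoder)

// Current behaviour: the header and the payload are yielded as separate Uint8Arrays,
// so every mplex message turns into several small writes to the TCP socket
function * encodeSeparately (header, payload) {
  yield header
  yield payload
}

// Refactored behaviour: concatenate header and payload into a single Uint8Array,
// so every mplex message results in one write to the socket
function encodeConcatenated (header, payload) {
  const out = new Uint8Array(header.length + payload.length)
  out.set(header, 0)
  out.set(payload, header.length)
  return out
}
```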

Digging further, 0.36.x sends chunks of data that are ~5xx bytes in one go - this is because the array of messages that are encoded always has 2x members so they are encoded and transmitted at once.

0.37.x+ only ever has 1x member in the same array at the same point - I have a local refactor that concatenates all the buffers together in the same way as 0.36.x but it doesn't make much difference because there's still only ever one message.

The messages are added to this pushableV but only ever one at a time.

I've yet to ascertain why the older version yields 2x messages in one go - in theory the pushable will buffer input if it's not being consumed quickly enough - indeed, adding an artificial delay to the loop in the newer code can cause multiple messages to be yielded, which does increase performance a bit more, but it's not much of a solution.
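
For context, that batching behaviour is the point of it-pushable's pushableV: everything pushed since the last read is yielded as a single array. A minimal illustration with dummy one-byte messages:

```js
import { pushableV } from 'it-pushable'

const source = pushableV()

// Producer: push three messages before the consumer gets a chance to read
source.push(Uint8Array.from([1]))
source.push(Uint8Array.from([2]))
source.push(Uint8Array.from([3]))
source.end()

// Consumer: because all three values were pushed before the first read,
// they arrive together as one batch that could be encoded and written in a single go
for await (const batch of source) {
  console.log('batch of', batch.length, 'messages') // batch of 3 messages
}
```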

Once I figure out why the messages don't stack in recent releases the same way they do in 0.36.x it opens up more possibilities - what if we send more than 2x messages at once, could it be even faster?

I need to stop now but I'll keep digging on Monday.

@p-shahi p-shahi added kind/bug A bug in existing code (including security flaws) and removed need/author-input Needs input from the original author labels Sep 27, 2022
@mpetrunic mpetrunic assigned achingbrain and unassigned mpetrunic Nov 21, 2022
achingbrain added a commit that referenced this issue Nov 23, 2022
Instead of using `it-pipe` to tie the inputs and outputs of the muxer
and underlying connection together, pipe them in parallel.

When sending 105MB in 32b chunks:

```
testing 0.36.x
sender 3276810 messages 1638409 invocations  <-- how many mplex messages are sent in how many batches
sender 1638412 bufs 68 b                     <-- how many buffers are passed to the tcp socket and their average size
105 MB in 32 B chunks in 9238ms
```

```
testing 0.40.x-mplex
sender 3276811 messages 3276808 invocations
sender 3276811 bufs 34 b
105 MB in 32 B chunks in 15963ms
```

```
testing 0.40.x-mplex
sender 3276811 messages 1638408 invocations
1638411 bufs 68 b
105 MB in 32 B chunks in 8611ms
```

Fixes #1342
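
Conceptually, the change described in that commit looks roughly like this. This is only a sketch, not the actual #1491 diff; `connection` and `muxer` stand for duplex objects with `source`/`sink` properties:

```js
import { pipe } from 'it-pipe'

// Before: one it-pipe chain ties both directions together through the muxer
async function tied (connection, muxer) {
  await pipe(connection, muxer, connection)
}

// After: each direction is piped independently, so the two directions run in parallel
async function parallel (connection, muxer) {
  await Promise.all([
    pipe(connection.source, muxer.sink),   // bytes arriving from the network go into the muxer
    pipe(muxer.source, connection.sink)    // bytes produced by the muxer go out to the network
  ])
}
```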
achingbrain (Member) commented:

Good news: with #1491 and libp2p/js-libp2p-mplex#233 both applied, streaming performance is fixed. Not only that, but it's now faster than [email protected] at smaller message sizes for both mplex and yamux. This will all be in the next release.

image

achingbrain added a commit that referenced this issue Nov 24, 2022
Instead of using `it-pipe` to tie the inputs and outputs of the muxer and underlying connection together, pipe them in parallel.

When sending 105MB in 32b chunks:

## `[email protected]`

```
testing 0.36.x
sender 3276810 messages 1638409 invocations  <-- how many mplex messages are sent in how many batches
sender 1638412 bufs 68 b                     <-- how many buffers are passed to the tcp socket and their average size
105 MB in 32 B chunks in 9238ms
```

## `[email protected]`

```
testing 0.40.x-mplex
sender 3276811 messages 32 invocations
sender 6553636 bufs 17 b 27476 ms
105 MB in 32 B chunks in 27450ms
```

## With this patch

```
testing 0.40.x-mplex
sender 3276811 messages 17 invocations
sender 6553636 bufs 17 b 23781 ms
105 MB in 32 B chunks in 23753ms
```

## With this patch and libp2p/js-libp2p-mplex#233

```
testing 0.40.x
sender 3276811 messages 1638408 invocations
1638411 bufs 68 b
105 MB in 32 B chunks in 8611ms
```

Refs #1342
achingbrain added a commit to achingbrain/uint8arraylist that referenced this issue Nov 24, 2022
Related to: libp2p/js-libp2p#1342 but not showing a lot of ms there 😅 
Benchmarking against the current master shows a big improvement

Co-authored-by: Alex Potsides <[email protected]>