Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add autonat v2 spec #538
add autonat v2 spec #538
Changes from 23 commits
d663611
1db8613
0ff8ac6
f2a431c
62123df
0771bab
3e57202
f6def9a
b769d79
e4efaae
f28511c
d4da279
5b8d37d
05a0de2
8b52643
dd2750c
4e6ecaa
2af3309
6b1604b
f979fac
209b215
094089b
b4a856b
1c76613
03718ef
0195203
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have a message for this too, otherwise implementing the decoder for these messages is going to be quite annoying because you need to attempt deserializing them and only if that fails, consider it bare "data".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the protocol is if the client accepts the DialDataRequest, it starts sending that amount of data. If the client doesn't accept the DialDataRequest, it resets the stream. You do not need to deserialise anything after you send DialDataRequest here. Just read that many bytes.
Wrapping the entire data in a protobuf will make the protobuf size very large.
Do you think adding a
DataRequestAccepted
message before sending that amount of data as raw bytes will make things cleaner?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I am asking for this is because at least in
rust-libp2p
, we typically implement "codecs" which are modules that turn byte streams into typed streams (and sinks) of messages. Reading bare half-way through does not play that well with strong type systems.What do you mean it will make the protobuf large? Can't you just have a dynamically sized byte buffer in the protobuf? What is the difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we could unwrap the stream again and read a specified number of bytes.
Thoughts @mxinden?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to bring the whole serialised protobuf into memory before decoding. This will take memory of the order of 100kb. Not prohibitive, but I'd like to avoid this if we can.
If I understand correctly, you would prefer it to be a protobuf so you can have a typed stream which says first object received from the stream is of type A and the second object received is of type B, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That might work, but you're now hardcoding assumptions about the wire format of the Protobuf, outside of your Protobuf library. I'd like to avoid that.
Not sure I agree with this framing, but in the end, this boils down to a matter of taste.
How about repeatedly sending a smaller message (limited by our usual Protobuf size), until we've reached the required number of bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in the current proposal because the number of bytes that you need to read in the second message is not fixed and is equal to the number you asked for in the
DialDataRequest
this prevents you from writing a generic reader, right?If that's the case we can prefix the bytes sent with a unsigned-varint just like we prefix other protobuf messages sent. This will also be consistent with the other messages exchanged everywhere which I'd argue is the source of inconsistency in the design as opposed to bytes not being sent as a TLV object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After doing a POC implementation, I think Marten's suggestion here is the most suited to implementation. The problem from a go perspective is buffering in the protobuf msg reader on the server side. This can happen if someone does a pipelined implementation on the client side which sends data as soon as it has sent the request. see documentation for NewDelimitedReader for another case where this happens. While this situation is unlikely, it is simpler to just use protobufs for transferring data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that makes sense. Would that message be a regular protobuf with a static array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be "MUST NOT". and "The server MUST NOT dial any private address".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible to implement these safely though. Both the client and the server need to check that the peer is connected over a private IP.
client 192.168.0.100 -> server 192.168.0.10
In this case it's reasonable for the client to ask the server to test its private IP reachability.
This is an edge case I'm willing to ignore though. Happy to change this to MUST, just that keeping it SHOULD allows some implementation to provide this feature if they're willing to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you know that you are indeed on the same private network?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not completely sure.
If you see local connection address in private IP range and remote connection address in private IP range, is that enough to conclude that you're in some private network?
Note you cannot rely on https://datatracker.ietf.org/doc/html/rfc1918 subnet masks as you can make a private network from a collection of smaller private networks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if the local address and remote address are in the same private subnet, then it would be okay.
How about adding "The server SHOULD NOT dial any private address"? This leaves the door open in the spec.
I'm not sure the usefulness of doing this though, but maybe others might have a use for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mxinden mention that this design is takes inspiration from a different protocol but I forgot which one. Might be worth mentioning!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
E.g. related to the QUIC anti-amplification mechanism based on data size:
https://www.rfc-editor.org/rfc/rfc9000.html#section-8.1
I am not aware of an exact origin of the AutoNATv2 mechanism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QUIC is always a good source of inspiration :)
The specifics here is something that we came up with ourselves, in the discussion on the issue. If you're aware of any prior art (RFCs?), please let us know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should I mention QUIC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can simply shrug this off. This is called a reflection attack and has been a huge issue for open DNS resolvers.
Fixing the amplification side does go a long way, but paying a 5x bandwidth cost for a bunch of free IP addresses seems like a pretty reasonable tradeoff from an attacker's standpoint (especially because said attacker isn't paying for the bandwidth, but likely needs to compromise one machine per IP address).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also note: home NAT users likely don't need this feature. That is:
Being willing to dial other addresses does matter for, e.g., AWS and other special settings where there are separate ingress IP addresses. But, in that case, maybe the user should just configure their node correctly rather than relying on AutoNAT? AutoNAT specifically exists to enable home users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It simplifies client implementations as they don't need to worry about IPv4 peer vs IPv6 peer. Though the benefit isn't huge since most IPv4 servers won't have IPv6 connectivity so they any way cannot check the IPv6 address.
Can you elaborate here? why isn't the attacker paying for the bandwidth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* The stream timeout is 1 minute
* In the period, with a 1Gbps connection, you can send 60Gb ~= 6GB
* Maximum dial data requirement is 100kB
* So, in theory, you can run this with 60_000 peers in parallel.
* The servers have a random wait of up to 3 seconds precisely for this scenario. So in theory we can have 20k connections a second for 3 seconds to the target.
* See discussion around this comment: feat(autonatv2): Implement autonat v2 umgefahren/rust-libp2p#1 (comment)
@MarcoPolo @Stebalien @umgefahren thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The potential attack vector seems to be described correctly, however I'm not sure that rust-libp2p is affected. The whole process is allowed to take a maximum of 10 seconds:
https://github.com/libp2p/rust-libp2p/blob/8ceadaac5aec4b462463ef4082d6af577a3158b1/protocols/autonat/src/v2/server/handler/dial_request.rs#L66
However, we don't wait any time before dealing back. This mitigation is a quick fix, I can prepare.
Regarding IPv4 and IPv6 I stand with @thomaseizinger's comment on that matter: umgefahren/rust-libp2p#1 (comment)
So it correctly handles that case, in that we don't generate false positives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of an index, why not send the address itself? In my eyes just a small complexity reduction. Feel free to ignore. Based on intuition, the additional bytes (index < address) doesn't have a performance impact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your suggestion is right. I'll resolve this while taking up the implementation. This will help us avoid sharing the slice between the thread that makes the dial request and the thread that receives the response. Similar effect can be had by sharing the idx to the requesting thread, but this seems simpler to me. I'll resolve this while implementing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really sure how to achieve this, but it would be nice if it could be made clear that this is happening on a new connection (and a different 5-tuple), whereas the rest of the exchange happens on the same stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mentions
E_TRANSPORT_NOT_SUPPORTED
but that is missing from the protobufs?