add autonat v2 spec #538

Open
wants to merge 25 commits into
base: autonat-rename
Member

@sukunrt sukunrt commented Apr 11, 2023

First draft for autonat v2. #503

This protocol allows for testing reachability on exactly one address. This helps determine reachability at an address level. This also simplifies the protocol a lot.

I'll change the spec to reflect the discussion on dialing a different IP address from the node's observed IP address: #536

Discussion for nonce in message is here: libp2p/go-libp2p#1480
and this comment in particular libp2p/go-libp2p#1480 (comment)

@sukunrt sukunrt marked this pull request as ready for review April 11, 2023 12:05
Contributor

@marten-seemann marten-seemann left a comment

Very nice, this is a solid starting point for the spec!

What's the plan for resolving #536? Would you open a new PR that targets this PR here?

@sukunrt
Member Author

sukunrt commented Apr 11, 2023

> What's the plan for resolving #536? Would you open a new PR that targets this PR here?

Yes, I'll open a PR with the changes for #536.

@sukunrt sukunrt marked this pull request as draft April 11, 2023 16:51
@sukunrt sukunrt changed the base branch from master to autonat-rename April 12, 2023 08:29
Contributor

@thomaseizinger thomaseizinger left a comment

Exciting! Thanks for your work. Left some comments/questions :)

Sorry if they have already been answered somewhere.

@sukunrt
Member Author

sukunrt commented Apr 25, 2023

thanks for your review @thomaseizinger.
I'd like your opinion on these two issues

Proposal: use a list of addresses in priority order for autonat v2 dial requests #539
Proposal: allow AutoNAT to dial all IP addresses, without risking amplification attacks #536

Member

@mxinden mxinden left a comment

Great work @sukunrt. Thank you!

@thomaseizinger
Contributor

> thanks for your review @thomaseizinger. I'd like your opinion on these two issues
>
> Proposal: use a list of addresses in priority order for autonat v2 dial requests #539
> Proposal: allow AutoNAT to dial all IP addresses, without risking amplification attacks #536

I don't have anything to add to those at the moment :)

Contributor

@MarcoPolo MarcoPolo left a comment

On a brief skim, this looks good! I'm curious if we'll want to relax the "implementations MUST NOT dial any multiaddress unless it is based on the IP address the requesting node is observed as". Would it be useful to do this, and we can mitigate the amplification attack some other way?

It seems like there's a healthy discussion already going on, so I'll step back here and let other folks stay involved. If there's anything I can help with, please don't hesitate to ping.

@sukunrt
Member Author

sukunrt commented Apr 27, 2023

Thanks for your review @MarcoPolo

> It seems like there's a healthy discussion already going on, so I'll step back here and let other folks stay involved. If there's anything I can help with, please don't hesitate to ping.

The suggested strategy is discussed here: #536
Please check if we've made any errors there or overlooked something.

Here's the PR for those changes: #542
You can review it there, or here after I merge those changes.

@umgefahren

Another quick question, I probably missed something: When the server successfully dials the client and provides the nonce, the client closes the stream either way. How does the server know if the provided nonce was correct?

@umgefahren

In the spec it says that all private IP addresses should be excluded, but it also says it's just for checking reachability on the public internet. That said, we should exclude:

For IPv4:

  • addresses on "this network" (i.e. IPv4 address starting with a 0)
  • private IP addresses
  • shared IP addresses
  • loopback IP addresses (could create interesting behavior though)
  • link local IP addresses
  • reserved for future protocols
  • documentation
  • benchmarking
  • reserved
  • broadcast

For IPv6:

  • Unspecified
  • Loopback
  • unicast link local
  • unique local
  • documentation
  • IPv4 mapped
  • IPv4-IPv6 translation
  • Discard only
  • IETF Protocol assignments

In the rust-libp2p implementation there was a PR discussing those globally reachable IP addresses: libp2p/rust-libp2p#3814

Also, this list is probably neither complete nor formal, and it's only a small nitpick.
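
The categories listed above largely correspond to the ranges that Python's standard `ipaddress` module treats as not globally reachable. As a rough sketch only (this is not the filter used by any libp2p implementation, the name `is_publicly_dialable` is made up for illustration, and the exact set of ranges depends on the Python version), a server-side check could look like this:

```python
import ipaddress


def is_publicly_dialable(addr: str) -> bool:
    """Return True if addr looks like a globally reachable unicast IP.

    Hypothetical helper for illustration only. It relies on the standard
    library's notion of "global", which may not match the list above exactly.
    """
    ip = ipaddress.ip_address(addr)
    # is_global is False for private, shared (100.64.0.0/10), loopback,
    # link-local, documentation, benchmarking and reserved ranges.
    # Multicast is excluded separately because a dial-back needs unicast.
    return ip.is_global and not ip.is_multicast
```

For example, `is_publicly_dialable("10.0.0.1")` and `is_publicly_dialable("fe80::1")` are both rejected, while an ordinary public address such as `8.8.8.8` passes.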

@sukunrt
Member Author

sukunrt commented Nov 8, 2023

@umgefahren it'd be better if you can add the comments as a review, commenting on the specific section.

> In the spec it says that all private IP addresses should be excluded, but it also says it's just for checking reachability on the public internet. That said, we should exclude:

It means non public. Happy to change the wording to non public.

> Another quick question, I probably missed something: When the server successfully dials the client and provides the nonce, the client closes the stream either way. How does the server know if the provided nonce was correct?

A correct server will always provide the correct nonce, no? This issue should be easy enough to debug for implementors without signalling from the client. Is there any benefit to the server knowing that it provided an incorrect nonce?

@umgefahren

> @umgefahren it'd be better if you can add the comments as a review, commenting on the specific section.

I will do that the next time. I'm sorry.

> > In the spec it says that all private IP addresses should be excluded, but it also says it's just for checking reachability on the public internet. That said, we should exclude:
>
> It means non public. Happy to change the wording to non public.

Thanks for clarification.

> > Another quick question, I probably missed something: When the server successfully dials the client and provides the nonce, the client closes the stream either way. How does the server know if the provided nonce was correct?
>
> A correct server will always provide the correct nonce, no? This issue should be easy enough to debug for implementors without signalling from the client. Is there any benefit to the server knowing that it provided an incorrect nonce?

I think there is a benefit. There is a pathological example where the network configuration or a NAT forwards traffic to the wrong libp2p node: not the one that requested the dial back, but a different one. In that case the server would report reachability on that address, but it's actually not reaching the peer in question. I'm not enough of an expert here to think of any case where that might occur apart from a bad config or a malicious actor.

@sukunrt
Member Author

sukunrt commented Nov 8, 2023

In this case, the client sees that the server is reporting OK-Reachable but it has not received the nonce, so it should reject the response.
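
The rule described above can be sketched as a small client-side check. This is only an illustration: the `PendingProbe` class and its fields are invented here and do not come from the spec or from any implementation.

```python
from dataclasses import dataclass


@dataclass
class PendingProbe:
    """Client-side state for one outstanding reachability check (hypothetical)."""
    nonce: int
    dial_back_seen: bool = False

    def on_dial_back(self, received_nonce: int) -> None:
        # Only a dial-back carrying our own nonce counts as proof.
        if received_nonce == self.nonce:
            self.dial_back_seen = True

    def accept_response(self, status_ok: bool) -> bool:
        # Trust an OK response only if a matching dial-back was recorded.
        return status_ok and self.dial_back_seen
```

With this shape, a server that reports OK after reaching some other node never causes the address to be marked reachable, because `dial_back_seen` stays `False`.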

@thomaseizinger
Contributor

> > > Another quick question, I probably missed something: When the server successfully dials the client and provides the nonce, the client closes the stream either way. How does the server know if the provided nonce was correct?
>
> > A correct server will always provide the correct nonce, no? This issue should be easy enough to debug for implementors without signalling from the client. Is there any benefit to the server knowing that it provided an incorrect nonce?
>
> I think there is a benefit. There is a pathological example where the network configuration or a NAT forwards traffic to the wrong libp2p node.

If we dial the node with /p2p, the connection will never be fully established if we end up at a different node so you can't send the nonce over.

@umgefahren

So since that is not possible, the implementation doesn't need to handle this case, right?

But thanks for the clarification and sorry for the dumb questions.

@thomaseizinger
Contributor

> So since that is not possible, the implementation doesn't need to handle this case, right?

Yep I think you are right! We can assume that this will never happen. Feel free to use debug_assert if you want to be sure!

> But thanks for the clarification and sorry for the dumb questions.

No worries at all! I think your questions are pretty spot on actually :)


Cli -> Srv: [dial] DialRequest:{nonce: 0xabcd, addrs: (addr1, addr2, addr3)}
Srv -> Cli: [attempt]addr2 DialAttempt:{nonce: 0xabcd}
Srv -> Cli: [dial] DialResponse:{status: OK, dialStatuses:(E_TRANSPORT_NOT_SUPPORTED, OK)}
Contributor

This mentions E_TRANSPORT_NOT_SUPPORTED but that is missing from the protobufs?
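
One way to read the exchange above: the server walks the requested addresses in order, records a status for each one it looks at, and stops after the first address it actually attempts. The sketch below assumes that reading. The callbacks `can_dial` and `try_dial` and the status names `E_DIAL_ERROR` and `E_DIAL_REFUSED` are placeholders for illustration, not names taken from the trace.

```python
from typing import Callable, List, Sequence, Tuple


def handle_dial_request(
    addrs: Sequence[str],
    can_dial: Callable[[str], bool],
    try_dial: Callable[[str], bool],
) -> Tuple[str, List[str]]:
    """Return (overall status, per-address statuses) for one DialRequest.

    Assumption: addresses are examined in order and the server stops after
    the first address it actually attempts to dial.
    """
    statuses: List[str] = []
    for addr in addrs:
        if not can_dial(addr):
            # No transport for this address: note it and try the next one.
            statuses.append("E_TRANSPORT_NOT_SUPPORTED")
            continue
        ok = try_dial(addr)
        # "E_DIAL_ERROR" is a placeholder name, not taken from the trace.
        result = "OK" if ok else "E_DIAL_ERROR"
        statuses.append(result)
        return result, statuses
    # Nothing could be attempted at all (placeholder overall status).
    return "E_DIAL_REFUSED", statuses
```

Called with three addresses where only the first is unsupported, this yields the same `(E_TRANSPORT_NOT_SUPPORTED, OK)` pair shown in the trace.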

@umgefahren

While doing the rust-libp2p implementation, we discovered a race condition, which we are now circumventing with a 100ms delay. You can read the final comment by @thomaseizinger here: umgefahren/rust-libp2p#1 (comment)

It happens when the server successfully performs a dial back and then sends the confirmation of the address back to the client. However, the client hasn't progressed far enough to have been notified of that successful dial back when it receives the confirmation. In that case the client wrongly assumed an address was confirmed where no dial back occurred.

@thomaseizinger
Contributor

> In that case the client wrongly assumed an address was confirmed where no dial back occurred.

Minor correction here: The behaviour is usually that the client discards the "successful" confirmation because it has not yet processed the dial-back so it thinks the server is sending it a confirmation without having actually done the dial.

I think the correct way to solve this would be to add an ACK message from the client back to the server for the dial-back where the client can say: "Yes I've processed your dial-back". The server can then proceed to respond on the other stream and thus guarantee that we don't have a race condition between the two streams.
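
The ordering proposed above can be sketched with two concurrent tasks standing in for the two streams. This is a toy model, not libp2p code: the `asyncio.Event` plays the role of the suggested ACK, and all names are invented for illustration.

```python
import asyncio


async def run_probe() -> bool:
    """Toy model: returns True if the client would accept the server's OK."""
    ack = asyncio.Event()  # stands in for the proposed ACK message
    client_state = {"dial_back_processed": False}

    async def client_dial_back_stream() -> None:
        # Simulate the client being slow to process the incoming dial-back.
        await asyncio.sleep(0.05)
        client_state["dial_back_processed"] = True
        ack.set()  # "Yes, I've processed your dial-back"

    async def server_dial_stream() -> bool:
        # The server withholds its response until the ACK has arrived,
        # so the client cannot see the response before the dial-back.
        await ack.wait()
        return client_state["dial_back_processed"]

    accepted, _ = await asyncio.gather(
        server_dial_stream(), client_dial_back_stream()
    )
    return accepted
```

Without the `await ack.wait()` line, the server task would return while `dial_back_processed` is still `False`, which is the race described above.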

@sukunrt
Member Author

sukunrt commented Jan 29, 2024

You can read the closing of the stream as the ACK. See: https://github.com/libp2p/go-libp2p/blob/sukun/autonat-v2-2/p2p/protocol/autonatv2/server.go#L251-L257

The spec also dictates closing the stream: https://github.com/libp2p/specs/blame/autonat-v2/autonat/autonat-v2.md#L87

Do you think an explicit ACK is better?

@thomaseizinger
Contributor

> You can read the closing of the stream as the ACK. See: libp2p/go-libp2p@sukun/autonat-v2-2/p2p/protocol/autonatv2/server.go#L251-L257
>
> The spec also dictates closing the stream: autonat-v2/autonat/autonat-v2.md#L87 (blame)
>
> Do you think an explicit ACK is better?

Yeah I think so. I associate closing a stream with "I have no more data to write". The client never writes data so why wouldn't it immediately close the stream? Also, reading a stream and waiting for that to fail because it has been closed is also somewhat odd 🤷‍♂️

@sukunrt
Member Author

sukunrt commented Jan 29, 2024

> The client never writes data so why wouldn't it immediately close the stream?

That's a fair point. I'll add an ACK.

@sukunrt
Member Author

sukunrt commented Feb 5, 2024

Updated the specs with a DialBackResponse

Contributor

@thomaseizinger thomaseizinger left a comment

Nice, thank you!

mergify bot pushed a commit to libp2p/rust-libp2p that referenced this pull request Aug 8, 2024
Closes: #4524

This is the implementation of the evolved AutoNAT protocol, named AutonatV2 as defined in the [spec](https://github.com/libp2p/specs/blob/03718ef0f2dea4a756a85ba716ee33f97e4a6d6c/autonat/autonat-v2.md).
The stabilization PR for the spec can be found under libp2p/specs#538.

The work on the Rust implementation can be found in the PR to my fork: umgefahren#1.

The implementation has been smoke-tested with the Go implementation (PR: libp2p/go-libp2p#2469).

The new protocol addresses shortcomings of the original AutoNAT protocol:

- Since the server now always dials back over a newly allocated port, this made #4568 necessary; the client can be sure of the reachability state for other peers, even if the connection to the server was made through a hole punch.
- The server can now test addresses different from the observed address (i.e., the connection to the server was made through a `p2p-circuit`). To mitigate against DDoS attacks, the client has to send more data to the server than the dial-back costs.

Pull-Request: #5526.
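
The last point can be illustrated with a small check. This is a sketch under assumptions, not the logic of either implementation: the function name `may_dial` and its parameters are invented, and the dial-back cost is treated as a plain byte count supplied by the caller.

```python
import ipaddress


def may_dial(
    observed_ip: str,
    requested_ip: str,
    bytes_received: int,
    dial_back_cost_bytes: int,
) -> bool:
    """Decide whether the server should perform the dial-back (hypothetical).

    Dialing the IP the client was observed on cannot be redirected at a
    third party, so it is always allowed. Any other IP is dialed only after
    the client has sent more data than the dial-back costs the server.
    """
    same_ip = ipaddress.ip_address(observed_ip) == ipaddress.ip_address(requested_ip)
    if same_ip:
        return True
    return bytes_received > dial_back_cost_bytes
```

For instance, a client observed on `203.0.113.7` that asks for a dial to `198.51.100.9` is refused until it has sent more than `dial_back_cost_bytes`.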
same key repeatedly. The only benefit of going via the server to do this attack
is not spending bandwidth required for a handshake. So the prevention mechanism
only focuses on bandwidth costs. There is a minor benefit of bypassing IP
blocklists, but that's made unattractive by the fact that servers may ask 5x
Member

I don't think we can simply shrug this off. This is called a reflection attack and has been a huge issue for open DNS resolvers.

Fixing the amplification side does go a long way, but paying a 5x bandwidth cost for a bunch of free IP addresses seems like a pretty reasonable tradeoff from an attacker's standpoint (especially because said attacker isn't paying for the bandwidth, but likely needs to compromise one machine per IP address).

Member

Also note: home NAT users likely don't need this feature. That is:

  1. They likely only need 1 dialable address.
  2. They likely don't care which one.
  3. Their outbound and inbound IPs are likely identical.

Being willing to dial other addresses does matter for, e.g., AWS and other special settings where there are separate ingress IP addresses. But, in that case, maybe the user should just configure their node correctly rather than relying on AutoNAT? AutoNAT specifically exists to enable home users.

Member Author

It simplifies client implementations as they don't need to worry about IPv4 peer vs IPv6 peer. Though the benefit isn't huge, since most IPv4 servers won't have IPv6 connectivity, so they cannot check the IPv6 address anyway.

> Fixing the amplification side does go a long way, but paying a 5x bandwidth cost for a bunch of free IP addresses seems like a pretty reasonable tradeoff from an attacker's standpoint (especially because said attacker isn't paying for the bandwidth, but likely needs to compromise one machine per IP address).

Can you elaborate here? Why isn't the attacker paying for the bandwidth?

@p-shahi
Member

p-shahi commented Sep 10, 2024

@sukunrt given that AutoNatv2 is merged in two reference implementations libp2p/rust-libp2p#5526 (released in 0.13.0) and libp2p/go-libp2p#2469 (released in 0.36.1)

Are there any outstanding comments that need to be addressed before this pull request can be merged? - if there are any that are non-blocking, can they be addressed in follow up PRs?
Also, the maturity should be either a Recommendation (I believe there is demonstrated interop between Go and Rust impls?)

@umgefahren

> Also, the maturity should be either a Recommendation (I believe there is demonstrated interop between Go and Rust impls?)

@sukunrt and I did interop testing and successfully verified that they are working together.

@sukunrt
Member Author

sukunrt commented Sep 11, 2024

The implementation is not used in go-libp2p yet. We should merge this after we start inferring reachability in go-libp2p.

TimTinkers pushed a commit to unattended-backpack/rust-libp2p that referenced this pull request Sep 14, 2024
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Proposal: AutoNAT v2
AutoNAT: Network ReachabilityPublic distinguishes between IPv6 and IPv4
9 participants