Add spec on addressing & use of multiaddr #191

Merged
merged 17 commits into from
Jul 24, 2021
Changes from 2 commits
2 changes: 2 additions & 0 deletions README.md
@@ -58,6 +58,7 @@ expected lifecycle and document formatting.
These specs define abstractions and data types that form the "core" of libp2p
and are used throughout the system.

- [Addressing][spec_addressing] - Working with addresses in libp2p.
- [Connections and Upgrading][spec_connections] - Establishing secure,
multiplexed connections between peers, possibly over insecure, single stream transports.
- [Peer Ids and Keys][spec_peerids] - Public key types & encodings, peer id calculation, and
@@ -112,3 +113,4 @@ you feel an issue isn't the appropriate place for your topic, please join our
[spec_peerids]: ./peer-ids/peer-ids.md
[spec_connections]: ./connections/README.md
[spec_plaintext]: ./plaintext/README.md
[spec_addressing]: ./addressing/README.md
329 changes: 329 additions & 0 deletions addressing/README.md
@@ -0,0 +1,329 @@
# Addressing in libp2p
> How network addresses are encoded and used in libp2p

| Lifecycle Stage | Maturity | Status | Latest Revision |
|-----------------|---------------|--------|-----------------|
| 1A | Working Draft | Active | r0, 2019-05-27 |


Authors: [@yusefnapora]

Interest Group: TBD

[@yusefnapora]: https://github.com/yusefnapora

See the [lifecycle document][lifecycle-spec] for context about maturity level
and spec status.

[lifecycle-spec]: https://github.com/libp2p/specs/blob/master/00-framework-01-spec-lifecycle.md

## Table of Contents

- [Addressing in libp2p](#addressing-in-libp2p)
  - [Table of Contents](#table-of-contents)
  - [Overview](#overview)
  - [multiaddr in libp2p](#multiaddr-in-libp2p)
    - [multiaddr basics](#multiaddr-basics)
    - [Composing multiaddrs](#composing-multiaddrs)
      - [Encapsulation](#encapsulation)
      - [Decapsulation](#decapsulation)
    - [The p2p multiaddr](#the-p2p-multiaddr)
      - [Historical Note: the `ipfs` multiaddr Protocol](#historical-note-the-ipfs-multiaddr-protocol)
  - [Transport multiaddrs](#transport-multiaddrs)
    - [IP and Name Resolution](#ip-and-name-resolution)
      - [dnsaddr Links](#dnsaddr-links)
    - [TCP](#tcp)
    - [WebSockets](#websockets)
    - [QUIC](#quic)
    - [`p2p-circuit` Relay Addresses](#p2p-circuit-relay-addresses)

## Overview

libp2p makes a distinction between a peer's **identity** and its **location**.
A peer's identity is stable, verifiable, and valid for the entire lifetime of
the peer (whatever that may be for a given application). Peer identities are
derived from public keys as described in the [peer id spec][peer-id-spec].

On a particular network, at a specific point in time, a peer may have one or
more locations, which can be represented using addresses. For example, I may be
reachable via the global IPv4 address of 7.7.7.7 on TCP port 1234.

In a system that only supported TCP/IP or UDP over IP, we could easily write our
addresses with the familiar `<ip>:<port>` notation and store them as tuples of
address and port. However, libp2p was designed to be transport agnostic, which
means that we can't assume that we'll even be using an IP-backed network at all.

To support a growing set of transport protocols without special-casing each
addressing scheme, libp2p uses [multiaddr][multiaddr-repo] to encode network
addresses for all supported transport protocols.

This document covers [how multiaddr is used in libp2p](#multiaddr-in-libp2p).
For more information on other use cases, or to find links to multiaddr
implementations in various languages, see the [multiaddr
repository][multiaddr-repo].

## multiaddr in libp2p

multiaddr is used throughout libp2p for encoding network addresses, and
addresses are generally exchanged over the wire as binary-encoded multiaddrs in
libp2p's core protocols.

When exchanging addresses, peers send a multiaddr containing both their network
address and peer id, as described in [the section on the `p2p`
multiaddr](#the-p2p-multiaddr).

> Review comment (Member): Nit: on the wire, we usually don't include the peer-id part (it's usually implicit). I'm not sure how to convey this.
>
> Review comment (Member): When sending self addresses over the wire, we omit the identity part, as it's considered superfluous because libp2p connections are authenticated by principle. If the other party needs to relay our addresses to a third party, it should add the identity part to form a fully qualified address.
>
> Review comment (Member): Revisiting this...
>
> 1. "network" address is usually referred to as the "transport" address (not critical, but it could involve relays and multiple network addresses).
> 2. We usually send AddrInfos (AddrInfo{ID: <peer-id>, Addrs: <transport-addrs>}).
>
> But this section is really fine as-is, this is just me nit picking.

### multiaddr basics

A multiaddr generally represents a path through a stack of successively
"higher-level" protocols that can be traversed to some destination.

For example, the `/ip4/7.7.7.7/tcp/1234` multiaddr starts with `ip4`, which is
the lowest-level protocol that requires an address. The `tcp` protocol is
encapsulated within `ip4`, so it comes next.

The multiaddr above consists of two components: the `/ip4/7.7.7.7` component
and the `/tcp/1234` component. It's not possible to split either one further;
`/ip4` alone is an invalid multiaddr, because the `ip4` protocol was defined to
require a 32-bit address. Similarly, `tcp` requires a 16-bit port number.

Although we referred to `/ip4/7.7.7.7` and `/tcp/1234` as "components" of a
larger TCP/IP address, each is actually a valid multiaddr according to the
multiaddr spec. However, not every valid multiaddr describes a complete path
through the network. As we've seen, even a simple TCP/IP connection requires
composing two multiaddrs into one. See the section on [composing
multiaddrs](#composing-multiaddrs) for information on how multiaddrs can be
combined, and the [Transport multiaddrs section](#transport-multiaddrs) for the
combinations that describe valid transport addresses.

The [multiaddr protocol table][multiaddr-proto-table] contains all currently
defined protocols and the length of their address components.
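
As a rough illustration of how the protocol table drives the binary form, the sketch below encodes `/ip4/7.7.7.7/tcp/1234`. It hard-codes only two table entries (`ip4`, code 4, 32-bit value; `tcp`, code 6, 16-bit value) and assumes the multiaddr convention of prefixing each value with its protocol code as an unsigned varint. The names `encode_varint` and `to_bytes` are made up for this example and are not taken from any multiaddr library.

```python
import ipaddress

# Tiny excerpt of the protocol table: name -> (code, value size in bits).
PROTOCOLS = {
    "ip4": (4, 32),
    "tcp": (6, 16),
}


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (LEB128 style)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def to_bytes(text: str) -> bytes:
    """Encode a textual multiaddr built only from ip4/tcp components."""
    parts = text.strip("/").split("/")
    if len(parts) % 2:
        raise ValueError("every protocol in this sketch needs a value")
    out = bytearray()
    for name, value in zip(parts[0::2], parts[1::2]):
        code, bits = PROTOCOLS[name]
        out += encode_varint(code)
        if name == "ip4":
            out += ipaddress.IPv4Address(value).packed  # 4 bytes
        else:
            out += int(value).to_bytes(bits // 8, "big")  # 2-byte port
    return bytes(out)


encoded = to_bytes("/ip4/7.7.7.7/tcp/1234")
print(encoded.hex())
```

The fixed value sizes are what make `/ip4` or `/tcp` without a value impossible to encode: there would be nothing to fill the 4 or 2 bytes that follow the protocol code.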

Some multiaddr protocols do not require any additional addressing information.
For example, WebSockets are by definition encapsulated within TCP/IP, so the
`/ws` multiaddr protocol is encapsulated within a TCP/IP multiaddr:
`/ip4/7.7.7.7/tcp/1234/ws`. This address is composed of three distinct
multiaddrs, `/ip4/7.7.7.7`, `/tcp/1234`, and `/ws`.
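
To make the component structure concrete, here is a minimal sketch that splits a textual multiaddr into its component multiaddrs. The `TAKES_VALUE` table is a hand-written excerpt for illustration only (the real protocol table is much larger), and `split_components` is a hypothetical helper rather than a function from a multiaddr implementation.

```python
from typing import List

# Illustrative excerpt: True means the protocol takes a value,
# False means it stands alone (like /ws).
TAKES_VALUE = {"ip4": True, "ip6": True, "tcp": True, "udp": True, "ws": False}


def split_components(text: str) -> List[str]:
    """Split a textual multiaddr into its component multiaddrs."""
    if not text.startswith("/"):
        raise ValueError("a multiaddr must start with '/'")
    parts = text[1:].split("/")
    components = []
    i = 0
    while i < len(parts):
        name = parts[i]
        if name not in TAKES_VALUE:
            raise ValueError(f"unknown protocol: {name!r}")
        if TAKES_VALUE[name]:
            # A value-taking protocol consumes the next path segment.
            if i + 1 >= len(parts) or parts[i + 1] == "":
                raise ValueError(f"protocol {name!r} requires a value")
            components.append(f"/{name}/{parts[i + 1]}")
            i += 2
        else:
            components.append(f"/{name}")
            i += 1
    return components


print(split_components("/ip4/7.7.7.7/tcp/1234/ws"))
```

With this table, `/ip4` on its own is rejected because `ip4` requires a value, while `/ws` is accepted as a complete component.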

### Composing multiaddrs

As shown above, protocol addresses can be composed within a multiaddr in a way
that mirrors the composition of protocols within a networking stack.

The terms generally used to describe composition of multiaddrs are
"encapsulation" and "decapsulation", and they essentially refer to adding and
removing protocol components from a multiaddr, respectively.

#### Encapsulation

A protocol is said to be "encapsulated within" another protocol when data from
an "inner" protocol is wrapped by another "outer" protocol, often by re-framing
the data from the inner protocol into the type of packets, frames or datagrams
used by the outer protocol.

Some examples of protocol encapsulation are HTTP requests encapsulated within
TCP/IP streams, or TCP segments themselves encapsulated within IP datagrams.

The multiaddr format was designed so that addresses encapsulate each other in
the same manner as the protocols that they describe. The result is an address
that begins with the "outermost" layer of the network stack and works
progressively "inward". For example, in the address `/ip4/7.7.7.7/tcp/80/ws`,
the outermost protocol is IPv4, which encapsulates TCP streams, which in turn
encapsulate WebSockets.

All multiaddr implementations provide an `Encapsulate` method, which combines
two multiaddrs into a composite. For example, `/ip4/7.7.7.7` can encapsulate
`/tcp/42` to become `/ip4/7.7.7.7/tcp/42`.

Note that no "sanity checking" is performed when encapsulating multiaddrs, and
it is possible to create valid but useless multiaddrs like `/tcp/42/udp/42`
through encapsulation.
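
On the textual form, encapsulation amounts to appending one address to another. The sketch below is an assumption-laden stand-in for a library's `Encapsulate` method: the free function `encapsulate` is hypothetical, operates on plain strings, and deliberately performs no semantic checks, mirroring the behavior described above.

```python
def encapsulate(outer: str, inner: str) -> str:
    """Return `outer` with `inner` appended; no semantic validation is done."""
    for addr in (outer, inner):
        if not addr.startswith("/"):
            raise ValueError(f"not a multiaddr: {addr!r}")
    return outer.rstrip("/") + inner


tcp_over_ip = encapsulate("/ip4/7.7.7.7", "/tcp/42")
# Nothing stops us from building an address that no transport can use.
nonsense = encapsulate("/tcp/42", "/udp/42")
print(tcp_over_ip, nonsense)
```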

#### Decapsulation

Decapsulation takes a composite multiaddr and removes an "inner"
multiaddr from it, returning the result.

For example, if we start with `/ip4/7.7.7.7/tcp/1234/ws` and decapsulate `/ws`,
the result is `/ip4/7.7.7.7/tcp/1234`.

It's important to note that decapsulation returns the original multiaddr up
to the last occurrence of the decapsulated multiaddr. This may remove more
than just the decapsulated component itself if there are more protocols
encapsulated within it. Using our example above, decapsulating either
`/tcp/1234/ws` _or_ `/tcp/1234` from `/ip4/7.7.7.7/tcp/1234/ws` will result in
`/ip4/7.7.7.7`. This is unsurprising if you consider the utility of the
`/ip4/7.7.7.7/ws` address that would result from simply removing the `tcp`
component.
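
The "up to the last occurrence" rule can be sketched on the textual form as follows. This is an illustration, not a library API: `decapsulate` is a hypothetical helper that compares `/`-separated segments, and returning the address unchanged when the inner multiaddr is absent is a choice made for this sketch (an implementation could equally raise an error).

```python
def decapsulate(addr: str, inner: str) -> str:
    """Drop the last occurrence of `inner` and everything after it."""
    tokens = addr.strip("/").split("/")
    needle = inner.strip("/").split("/")
    # Scan from the right so that the last occurrence wins.
    for start in range(len(tokens) - len(needle), -1, -1):
        if tokens[start:start + len(needle)] == needle:
            head = tokens[:start]
            # An empty head means nothing is left of the address.
            return "/" + "/".join(head) if head else ""
    return addr  # `inner` not present: leave the address unchanged


full = "/ip4/7.7.7.7/tcp/1234/ws"
print(decapsulate(full, "/ws"))
print(decapsulate(full, "/tcp/1234"))
```

Because everything from the match onward is dropped, removing `/tcp/1234` also removes the `/ws` component that was encapsulated within it.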

### The p2p multiaddr

libp2p defines the `p2p` multiaddr protocol, whose address component is the
[peer id][peer-id-spec] of a libp2p peer. The text representation of a `p2p`
multiaddr looks like this:

```
/p2p/QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N
```

Where `QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N` is the base58-encoded
multihash of a peer's public key, also known as their peer id.

By itself, a `p2p` address does not give you enough addressing information to
locate a peer on the network; it is not a transport address. However, like the
`ws` protocol for WebSockets, a `p2p` address can be [encapsulated
within](#encapsulation) another multiaddr.

For example, the above `p2p` address can be combined with the transport address
on which the node is listening:

```
/ip4/7.7.7.7/tcp/1234/p2p/QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N
```

This combination of transport address plus `p2p` address is the format in which
peers exchange addresses over the wire in the [identify protocol][identify-spec]
and other core libp2p protocols.
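
A receiver of such an address often wants the transport part and the peer id separately. The following sketch shows one way to split a trailing `/p2p/<peer-id>` component from the textual form; `split_peer_id` is a hypothetical helper, and it only looks at the final component, so it does not attempt to interpret relay addresses.

```python
from typing import Optional, Tuple


def split_peer_id(addr: str) -> Tuple[str, Optional[str]]:
    """Split a trailing /p2p/<peer-id> component off a textual multiaddr."""
    tokens = addr.strip("/").split("/")
    if len(tokens) >= 2 and tokens[-2] == "p2p":
        transport = "/" + "/".join(tokens[:-2]) if len(tokens) > 2 else ""
        return transport, tokens[-1]
    return addr, None  # no peer id present


transport, peer_id = split_peer_id(
    "/ip4/7.7.7.7/tcp/1234/p2p/QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N"
)
print(transport, peer_id)
```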

#### Historical Note: the `ipfs` multiaddr Protocol

The `p2p` multiaddr protocol was originally named `ipfs`, and may be printed as
`/ipfs/<peer-id>` instead of `/p2p/<peer-id>` depending on the implementation in
use. Both names resolve to the same protocol code, and they are equivalent in the
binary form.
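
A small sketch of what this equivalence means in practice: both names map to the same protocol code (421 in the protocol table), so only the textual form differs. The helper `canonical_p2p` is hypothetical and, as a simplifying assumption, only rewrites a trailing `/ipfs/<peer-id>` component.

```python
P2P_CODE = 421  # shared by both names in the protocol table
NAME_TO_CODE = {"p2p": P2P_CODE, "ipfs": P2P_CODE}


def canonical_p2p(addr: str) -> str:
    """Rewrite a trailing legacy /ipfs/<peer-id> component to /p2p/<peer-id>."""
    tokens = addr.strip("/").split("/")
    if len(tokens) >= 2 and tokens[-2] == "ipfs":
        tokens[-2] = "p2p"
    return "/" + "/".join(tokens)


legacy = "/ip4/7.7.7.7/tcp/1234/ipfs/QmYyQSo1c1Ym7orWxLYvCrM2EmxFTANf8wXmmE7DWjhx5N"
print(canonical_p2p(legacy))
```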


## Transport multiaddrs

Because multiaddr is an open and extensible format, it's not possible to
guarantee that any valid multiaddr is semantically meaningful or usable in a
particular network. For example, the `/tcp/42` multiaddr, while valid, is not
useful on its own as a locator.

This section covers the types of multiaddr supported by libp2p transports. It's
possible that this section will go out of date as new transport modules are
developed, at which point pull requests to update this document will be greatly
appreciated.

### IP and Name Resolution

Most libp2p transports use the IP protocol as a foundational layer, and as a
result, most transport multiaddrs will begin with a component that represents an
IPv4 or IPv6 address.

This may be an actual address, such as `/ip4/7.7.7.7` or
`/ip6/fe80::883:a581:fff1:833`, or it could be something that resolves to an IP
address, like a domain name.

libp2p will attempt to resolve "name-based" addresses into IP addresses. The
current [multiaddr protocol table][multiaddr-proto-table] defines four
resolvable or "name-based" protocols:

| protocol | description |
|-----------|--------------------------------------------------------------------|
| `dns` | Resolves DNS A and AAAA records into both IPv4 and IPv6 addresses. |
| `dns4` | Resolves DNS A records into IPv4 addresses. |
| `dns6` | Resolves DNS AAAA records into IPv6 addresses. |
| `dnsaddr` | Resolves multiaddrs from a special TXT record. |


When the `/dns` protocol is used, the lookup may result in both IPv4 and IPv6
addresses, in which case IPv6 will be preferred. To explicitly resolve to IPv4
or IPv6 addresses, use the `/dns4` or `/dns6` protocols, respectively.

Note that in some restricted environments, such as inside a web browser, libp2p
may not have access to the resolved IP addresses at all, in which case the
runtime will determine what IP version is used.

When a name-based multiaddr encapsulates another multiaddr, only the name-based
component is affected by the lookup process. For example, if `example.com`
resolves to `1.2.3.4`, libp2p will resolve the address
`/dns4/example.com/tcp/42` to `/ip4/1.2.3.4/tcp/42`.
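
The substitution can be sketched as below. To keep the example free of network access, the DNS lookup is injected as a function; `resolve` and `fake_lookup` are hypothetical names, the record data is invented for the example, and only `/dns4` and `/dns6` are handled (the `/dns` and `/dnsaddr` cases are left out for brevity).

```python
from typing import Callable, List

# Name-based protocol -> IP protocol it resolves to (excerpt for this sketch).
NAME_PROTOCOLS = {"dns4": "ip4", "dns6": "ip6"}


def resolve(addr: str, lookup: Callable[[str, str], List[str]]) -> List[str]:
    """Replace a leading /dns4 or /dns6 component with resolved IP addresses.

    `lookup(name, ip_protocol)` is supplied by the caller and returns the
    addresses for `name`; everything after the name is kept as-is.
    """
    tokens = addr.strip("/").split("/")
    if len(tokens) < 2 or tokens[0] not in NAME_PROTOCOLS:
        return [addr]  # nothing to resolve
    ip_proto = NAME_PROTOCOLS[tokens[0]]
    rest = tokens[2:]
    return ["/" + "/".join([ip_proto, ip] + rest)
            for ip in lookup(tokens[1], ip_proto)]


def fake_lookup(name: str, ip_proto: str) -> List[str]:
    """Stand-in for a real DNS query, using made-up records."""
    records = {("example.com", "ip4"): ["1.2.3.4"]}
    return records.get((name, ip_proto), [])


print(resolve("/dns4/example.com/tcp/42", fake_lookup))
```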

#### dnsaddr Links

A libp2p-specific DNS-backed format, `/dnsaddr` resolves addresses from a `TXT`
record associated with the `_dnsaddr` subdomain of a given domain.

For example, resolving `/dnsaddr/libp2p.io` will perform a `TXT` lookup for
`_dnsaddr.libp2p.io`. If the result contains an entry of the form
`dnsaddr=<multiaddr>`, the embedded multiaddr will be parsed and used.

The `dnsaddr` lookup serves a similar purpose to a standard A-record DNS lookup;
however, there are differences that can be important for some use cases. The most
significant is that the `dnsaddr` entry contains a full multiaddr, which may
include a port number or other information that an A-record lacks, and it may
even specify a non-IP transport. Also, there are cases in which the A-record
already serves a useful purpose; using `dnsaddr` allows a second "namespace" for
libp2p registrations.
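
The two steps of a `dnsaddr` lookup (building the query name, then extracting multiaddrs from the `TXT` entries) might look like the sketch below. The DNS query itself is omitted; the function names are hypothetical and the sample `TXT` records are invented for illustration.

```python
from typing import List

DNSADDR_PREFIX = "dnsaddr="


def dnsaddr_query_name(addr: str) -> str:
    """Return the domain to query for TXT records, e.g. _dnsaddr.libp2p.io."""
    tokens = addr.strip("/").split("/")
    if len(tokens) < 2 or tokens[0] != "dnsaddr":
        raise ValueError(f"not a /dnsaddr multiaddr: {addr!r}")
    return "_dnsaddr." + tokens[1]


def multiaddrs_from_txt(records: List[str]) -> List[str]:
    """Keep only TXT entries of the form dnsaddr=<multiaddr>."""
    return [r[len(DNSADDR_PREFIX):] for r in records
            if r.startswith(DNSADDR_PREFIX)]


# Invented TXT entries; unrelated records are ignored.
sample_txt = ["dnsaddr=/ip4/7.7.7.7/tcp/1234", "v=spf1 -all"]
print(dnsaddr_query_name("/dnsaddr/libp2p.io"), multiaddrs_from_txt(sample_txt))
```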

### TCP

The libp2p TCP transport is supported in all implementations and can be used
wherever TCP/IP sockets are accessible.

Addresses for the TCP transport are of the form `<ip-multiaddr>/tcp/<tcp-port>`,
where `<ip-multiaddr>` is a multiaddr that resolves to an IP address, as
described in the [IP and Name Resolution section](#ip-and-name-resolution).
The `<tcp-port>` argument must be a 16-bit unsigned integer.

### WebSockets

WebSocket connections are encapsulated within TCP/IP sockets, and the WebSocket
multiaddr format mirrors this arrangement.

A libp2p WebSocket multiaddr is of the form `<tcp-multiaddr>/ws`, where
`<tcp-multiaddr>` is a valid multiaddr for the TCP transport, as [described
above](#tcp).

### QUIC

QUIC sessions are encapsulated within UDP datagrams, and the libp2p QUIC
multiaddr format mirrors this arrangement.

A libp2p QUIC multiaddr is of the form `<ip-multiaddr>/udp/<udp-port>/quic`,
where `<ip-multiaddr>` is a multiaddr that resolves to an IP address, as
described in the [IP and Name Resolution section](#ip-and-name-resolution).
The `<udp-port>` argument must be a 16-bit unsigned integer.
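
The TCP, WebSocket, and QUIC address shapes described in this document can be checked with a few lines of code. The sketch below is illustrative only: `classify` and `is_port` are hypothetical helpers, the set of IP-resolving protocols is an assumption based on the name resolution section, and addresses carrying a trailing `/p2p/<peer-id>` are not handled.

```python
# Protocols whose component is (or resolves to) an IP address.
IP_PROTOCOLS = {"ip4", "ip6", "dns", "dns4", "dns6"}


def is_port(value: str) -> bool:
    """True if `value` is a decimal 16-bit unsigned integer."""
    return value.isascii() and value.isdigit() and int(value) <= 0xFFFF


def classify(addr: str) -> str:
    """Return 'tcp', 'websocket', 'quic', or 'unknown' for a textual multiaddr."""
    tokens = addr.strip("/").split("/")
    if len(tokens) < 4 or tokens[0] not in IP_PROTOCOLS or not is_port(tokens[3]):
        return "unknown"
    l4, rest = tokens[2], tokens[4:]
    if l4 == "tcp" and rest == []:
        return "tcp"          # <ip-multiaddr>/tcp/<tcp-port>
    if l4 == "tcp" and rest == ["ws"]:
        return "websocket"    # <tcp-multiaddr>/ws
    if l4 == "udp" and rest == ["quic"]:
        return "quic"         # <ip-multiaddr>/udp/<udp-port>/quic
    return "unknown"


for example in ("/ip4/7.7.7.7/tcp/1234", "/ip4/7.7.7.7/tcp/1234/ws",
                "/ip6/::1/udp/4001/quic", "/tcp/42"):
    print(example, "->", classify(example))
```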


### `p2p-circuit` Relay Addresses

The libp2p [circuit relay protocol][relay-spec] allows a libp2p peer to relay
traffic between two peers who could otherwise not communicate directly.

Once a relay connection is established, peers can accept incoming connections
through the relay, using a `p2p-circuit` address.

Like the `ws` WebSocket multiaddr protocol, the `p2p-circuit` multiaddr does not
carry any additional address information. Instead, it is composed with two other
multiaddrs to describe a relay circuit.

A full `p2p-circuit` address that describes a relay circuit is of the form:
`<relay-multiaddr>/p2p-circuit/<destination-multiaddr>`.

`<relay-multiaddr>` is the full address for the peer relaying the traffic (the
"relay node"), including both the transport address and the `p2p` address
containing the relay node's peer id.

The details of the transport connection between the relay node and the
destination peer are usually not relevant to other peers in the network, so
`<destination-multiaddr>` generally only contains the `p2p` address of the
destination peer.

A full example would be:

```
/ip4/127.0.0.1/tcp/5002/p2p/QmdPU7PfRyKehdrP5A3WqmjyD6bhVpU1mLGKppa2FjGDjZ/p2p-circuit/p2p/QmVT6GYwjeeAF5TR485Yc58S3xRF5EFsZ5YAF4VcP3URHt
```

Here, the destination peer has the peer id
`QmVT6GYwjeeAF5TR485Yc58S3xRF5EFsZ5YAF4VcP3URHt` and is reachable through a
relay node with peer id `QmdPU7PfRyKehdrP5A3WqmjyD6bhVpU1mLGKppa2FjGDjZ` running
on TCP port 5002 of the IPv4 loopback interface.
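
Taking a relay address apart into its `<relay-multiaddr>` and `<destination-multiaddr>` halves can be sketched as follows. `split_circuit` is a hypothetical helper working on the textual form; it splits at the first `p2p-circuit` segment and does not validate either half.

```python
from typing import Tuple


def split_circuit(addr: str) -> Tuple[str, str]:
    """Split a relay address into (relay multiaddr, destination multiaddr)."""
    tokens = addr.strip("/").split("/")
    if "p2p-circuit" not in tokens:
        raise ValueError("not a p2p-circuit address")
    i = tokens.index("p2p-circuit")
    relay = "/" + "/".join(tokens[:i]) if tokens[:i] else ""
    destination = "/" + "/".join(tokens[i + 1:]) if tokens[i + 1:] else ""
    return relay, destination


relay, destination = split_circuit(
    "/ip4/127.0.0.1/tcp/5002/p2p/QmdPU7PfRyKehdrP5A3WqmjyD6bhVpU1mLGKppa2FjGDjZ"
    "/p2p-circuit/p2p/QmVT6GYwjeeAF5TR485Yc58S3xRF5EFsZ5YAF4VcP3URHt"
)
print(relay)
print(destination)
```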


[peer-id-spec]: ../peer-ids/peer-ids.md
[identify-spec]: ../identify/README.md
[multiaddr-repo]: https://github.com/multiformats/multiaddr
[multiaddr-proto-table]: https://github.com/multiformats/multiaddr/blob/master/protocols.csv
[relay-spec]: ../relay/README.md