fduncanh edited this page Aug 30, 2024 · 36 revisions

AirPlay2 "Legacy Pairing" mode: a full account (both client and server)

  • "Legacy Pairing" is activated by bit 27 of the AirPlay "features" code. If this bit is "off", and if the server does not support HomeKit/CoreUtils pairing (features bits 38, 43, 46, 48 are "off"), the "Pairing" part of the client-server initialization is omitted, and (if the server uses FairPlay authentication, features bits 12 and 14 are "on") the client proceeds immediately to "fp-setup" after the initial GET /info request.

See https://openairplay.github.io/airplay-spec/features.html
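
The bit tests described above can be sketched as follows. This is only an illustration: it assumes the features code is held as a single 64-bit integer with bit 0 as the least significant bit, and the function names are hypothetical.

```python
# Bit positions are taken from the text above.
# Assumption: bit 0 is the least significant bit of the 64-bit features value.
LEGACY_PAIRING_BIT = 27
HOMEKIT_PAIRING_BITS = (38, 43, 46, 48)
FAIRPLAY_BITS = (12, 14)


def bit_is_set(features: int, bit: int) -> bool:
    """Return True if the given bit of the features code is on."""
    return (features >> bit) & 1 == 1


def pairing_step_expected(features: int) -> bool:
    """True if the client is expected to run a pairing step before fp-setup."""
    legacy = bit_is_set(features, LEGACY_PAIRING_BIT)
    homekit = any(bit_is_set(features, b) for b in HOMEKIT_PAIRING_BITS)
    return legacy or homekit


def uses_fairplay(features: int) -> bool:
    """True if both FairPlay-related bits (12 and 14) are on."""
    return all(bit_is_set(features, b) for b in FAIRPLAY_BITS)
```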

"Legacy Pairing" has two parts:

The first part is secure pairing:

  • pair-setup-pin (required if the DNS_SD "TXT Record" of the server's AirPlay and AirTunes service advertisement specify pw=true)

or transient pairing

  • pair-setup (if pw=true is not specified).

Note that corporately-managed AirPlay clients with MDM (Mobile Device Management) may require pair-setup-pin even if the AirPlay server does not specify pw=true.

The second part is

  • pair-verify

which sets up an encrypted AES CTR 128 communication channel each time a pairing session starts.

In the examples below, the client is "User-Agent: AirPlay/745.13.4" and the Server is "Server: AirTunes/220.68".

(While it has nothing to do with pairing, note also that the iPad client here always specifies its DACP-ID, and that it supports an Active-Remote (a remote control), giving two code numbers for access; these numbers change each time a new connection starts.)

Initial contact by the client

When the client discovers the AirPlay service with DNS_SD Service Discovery, it sends an initial GET /info request:

GET /info RTSP/1.0
X-Apple-ProtocolVersion: 1
Content-Length: 70
Content-Type: application/x-apple-binary-plist
CSeq: 0
DACP-ID: 50379D516B134549
Active-Remote: 1681514420
User-Agent: AirPlay/745.13.4

The 70-byte content of the message is a binary "plist" (Apple "Information Property List") with a single entry "qualifier", whose value is an array containing the string "txtAirPlay":

<dict>
        <key>qualifier</key>
        <array>
                <string>txtAirPlay</string>
        </array>
</dict>
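
A request like the one above could be assembled as in the following sketch. It uses Python's standard plistlib for the binary plist; the function name is hypothetical, the DACP-ID and Active-Remote headers are left out for brevity, and CRLF line endings are assumed.

```python
import plistlib


def build_info_request(cseq: int = 0) -> bytes:
    """Build a GET /info request with a binary-plist body (illustrative only)."""
    # Body: one key, "qualifier", whose value is an array holding one string.
    body = plistlib.dumps({"qualifier": ["txtAirPlay"]}, fmt=plistlib.FMT_BINARY)
    head = (
        "GET /info RTSP/1.0\r\n"
        "X-Apple-ProtocolVersion: 1\r\n"
        f"Content-Length: {len(body)}\r\n"
        "Content-Type: application/x-apple-binary-plist\r\n"
        f"CSeq: {cseq}\r\n"
        "User-Agent: AirPlay/745.13.4\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body
```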

The server responds

RTSP/1.0 200 OK 
CSeq: 0 
Server: AirTunes/220.68 
Content-Type: application/x-apple-binary-plist 
Content-Length: 1071 

The content (1071 bytes in the above example) is a plist of various server properties, including

  1. Its txtAirPlay data
  2. Its "features" code (a 64 bit bitstring code)
  3. Its name
  4. Its DeviceID
  5. Its 32-byte public key pk (that was also sent in its DNS_SD service advertisement)

Other information includes what model Apple device it is, the operating system version, the audio format supported, the video capabilities etc. If bit 27 of the features code is set, the client learns that the server "supports Legacy Pairing". (For details of features, see https://openairplay.github.io/airplay-spec/features.html.) If bit 30 of features is set, RAOP protocol is used instead of "AirTunes", which is assumed here.

Assuming pw=true ("requires password") was advertised by the server, the client sends the pair-pin-start request:

POST /pair-pin-start RTSP/1.0
CSeq: 1
DACP-ID: 50379D516B134549
Active-Remote: 1681514420
User-Agent: AirPlay/745.13.4

The server now displays a 4-digit "pin" code on its display, and sends the reply

RTSP/1.0 200 OK 
CSeq: 1 
Server: AirTunes/220.68 

and also closes the initial connection from the client. This starts the pair-setup-pin process.

  • The 4-digit pin must be manually entered in the "AirPlay Password" box that then opens on the client display.

The "pair-setup-pin" process.

This part of the protocol uses a modified version of the SRP6a (Secure Remote Password) protocol, see SRP Protocol Design. This allows two parties ("User" and "Host") to securely create a "shared secret" called a "Session key" S in the SRP6a description. In the process, the User sends their "Username" and a (weak) cleartext password. In Apple's version, the "Username" is the DeviceID (generally the "true" MAC address of the hardware inside the client device, which is not the "fake" MAC address the client discloses to the network to prevent tracking of the client). The "cleartext password" is the 4-digit pin that the Server displays on its screen, and must be entered in the "password" box shown on the client. While SRP6a is designed to maintain a secure password database, Apple only uses it for an initial "Client-Server" pairing that creates the "shared secret" Session key S, and its (modified) hash K, and it then discards all other SRP data.

SRP6a uses a large prime N (which has an associated generator, called its "group" g here), and a hash algorithm H(). In "Legacy pairing" Apple uses SRP6a with the large prime N = NG_2048 (256 bytes), with group g = 2, and H() is the SHA1 hash algorithm with a 20-byte "digest length". In SRP6a, the important form of the shared secret "Session key" is its 20-byte hash K = H(S). Apple modifies standard SRP6a so that K instead is the 40-byte concatenation of two hashes derived from S:

K1 = H( S | {0,0,0,0} )    (20 bytes)
K2 = H( S | {0,0,0,1} )    (20 bytes)
K =  K1 | K2               (40 bytes)

where X|Y means the concatenation of X with Y. (Here S is concatenated with four extra trailing bytes before the 20-byte hashes K1 and K2 are computed.) All calculations are carried out with an SRP library modified for Apple's 40-byte construction of K from S, instead of the SRP6a standard 20-byte calculation of K from S.
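
The 40-byte construction of K can be written out in a few lines. The sketch below assumes S is hashed as a big-endian byte string without leading zero bytes (SRP libraries differ on this point, so check yours); the function name is hypothetical.

```python
import hashlib


def apple_srp_session_key(S: int) -> bytes:
    """Return K = H(S | 00 00 00 00) | H(S | 00 00 00 01) with H = SHA1 (40 bytes)."""
    # Assumption: S is serialized big-endian with no leading zero bytes.
    s_bytes = S.to_bytes(max(1, (S.bit_length() + 7) // 8), "big")
    k1 = hashlib.sha1(s_bytes + bytes([0, 0, 0, 0])).digest()  # 20 bytes
    k2 = hashlib.sha1(s_bytes + bytes([0, 0, 0, 1])).digest()  # 20 bytes
    return k1 + k2
```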

Since in Apple's variant, the server is actually telling the client which password to use, the order in which things are done is not quite the same as in standard SRP6a. Both User and Host (i.e., client and server) create 32-byte random "ephemeral secrets" called a and b respectively in the SRP description. From these they will use the large prime to each create public 256-byte "BIGNUM" numbers A and B which they will exchange. In standard SRP, the Host (server) only creates its "SRP keypair" b and B after receiving A from the User (client), but Apple requires the User (client) to receive B from the Host (server) before it sends A, so they must be created (and stored for later use) earlier in the procedure.

Here is the protocol:

Client to Server:

POST /pair-setup-pin RTSP/1.0
Content-Length: 86
Content-Type: application/x-apple-binary-plist
CSeq: 1
DACP-ID: 50379D516B134549
Active-Remote: 1681514420
User-Agent: AirPlay/745.13.4

The content of the message is a "plist" (Apple "Information Property List") containing two strings:

<dict>
        <key>method</key>
        <string>pin</string>
        <key>user</key>
        <string>60:4C:2b:54:D1:73</string>
</dict>

The first string, described as method, is "pin" (three characters), specifying the requested authentication method, and the second string (17 characters), described as user, is the SRP "Username", here the 6-octet true (immutable) hardware MAC address or "DeviceID" of the client.

On receiving this, the server uses the SRP library to create a "salted verification key" out of "username" (client DeviceID), "password" (the 4-digit pin), and a random "salt" that it generates, using the SRP6 algorithm for this. Apple uses a 16-byte salt s. It then creates its "ephemeral 32 byte SRP secret" (SRP private key) b and uses the large prime to create B (SRP public key, 256 bytes when NG_2048 is the prime). The server's return message to the client is a plist containing the salt s and the server's SRP public key B:

RTSP/1.0 200 OK 
CSeq: 1 
Server: AirTunes/220.68 
Content-Type: application/x-apple-binary-plist 
Content-Length: 342

The content is labeled pk (i.e., B, 256 bytes) and salt (s, 16 bytes). Both are encoded as base64 character strings.

<dict>
        <key>pk</key>
        <data>
        eBpIk7pzrtc6XrnYClwcA4+ZVVpEfEVuQTAxBO4isiLVAtnvr/8AFXAaTXy98xWMJl7I
        qHcI9eQH5VKJXwOD5mz+t6r8uRCcjlLjOQB4S6Uhj6sk4A2JiTWJDYF1H3kuL2lNXTkr
        6tM0r/Fz64DPMk+nf0cD8rVzQ/UkAGUylF/0S2py5EXNfiKOBXqushY1cMXyy7eKVprq
        teKH/b1rJOa0vKbTwpq17bFeGoGByptdlZEoWBjrBJ3by2MN3EjJmpWtQaQazKhhcIN0
        pL32Hi1EMesgBGldgC2n69h2jyhqbjf15jwlDLZy4L/A5jv1Sqo4Be+iPKgbfSk8syb9
        Wg==
        </data>
        <key>salt</key>
        <data>
        WFfo6KUzxbiW2sSkKFpHag==
        </data>
</dict>
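
A client might unpack this reply as in the sketch below. In the binary plist on the wire the data values are raw bytes (base64 is only how the XML rendering displays them), so plistlib returns them as bytes objects. The function name is hypothetical and the length checks simply mirror the sizes quoted in the text.

```python
import plistlib
from typing import Tuple


def parse_pin_setup_reply(body: bytes) -> Tuple[bytes, bytes]:
    """Extract the server SRP public key B ("pk") and the salt from the reply plist."""
    reply = plistlib.loads(body)
    pk, salt = reply["pk"], reply["salt"]
    if len(pk) != 256 or len(salt) != 16:
        raise ValueError("unexpected pk or salt length")
    return pk, salt


# Usage with a placeholder reply (all-zero bytes stand in for real values).
sample = plistlib.dumps({"pk": bytes(256), "salt": bytes(16)}, fmt=plistlib.FMT_BINARY)
B_bytes, salt = parse_pin_setup_reply(sample)
```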

The client creates its own 32-byte ephemeral secret a and generates its 256 byte SRP public key A from it. Using a and B it creates a shared secret that the server will also be able to recreate using A and b, once the server has received A from the client. Using the combination of salt s, password (pin), and Username (client DeviceID) plus the server SRP public key B, and its own SRP private key a (used in combination with B to obtain the shared-secret K), the client uses the SRP algorithm to compute a 20 byte proof M1.

X =  H(N)^H(g)       (N = NG_2048, g is its "group" g = 2; ^ means bytewise XOR of the two 20-byte hashes, as in standard SRP6a)

M1 =   H( X | H(I) | s | PAD(A) | PAD(B) | K )      I = Username, s = salt, K = K1 |  K2

PAD(Y) means the "padded" value of Y, defined in the SRP algorithm. Make sure that your SRP library does NOT replace g by PAD(g) when computing X (apparently some do). Here K is the 40-byte Apple-variant shared-secret hash. The client then sends A, labeled pk, and M1, labeled proof, to the server in a plist:

POST /pair-setup-pin RTSP/1.0
Content-Length: 347
Content-Type: application/x-apple-binary-plist
CSeq: 2
DACP-ID: B241D9C8708F90B4
Active-Remote: 939854658
User-Agent: AirPlay/745.13.4

The content is

<dict>
        <key>pk</key>
        <data>
        kB38RMpBgwXBEGwjVlZlqDIRjeRmsJfcO/De8zaFZPbgk6PxQoV66OdsReIMv8OwOaLy
        mOprsIVVMtYjd1TeJolwcBBYH9dr2+Vo+64FMB6LqRLBV5bf4V1z/D0/1pgJihQCvQqn
        s4jeeN9w2jMYzjPC9jMgfG+RfpUio/fuygeesrHLVyLXFHdc1bgGM1aQ2Bhz/QxPmLkZ
        FKMhac5iCUzLdltZNfu2l2pOrMntQQiOxBNPdKY22eF6Ahaa8z8TkDX//qNol5uYcVsw
        +6JS3IO1VnQx/K5+aCcCq7Ky10APoSRyZZj/GGg2BgoGJiNxg1muM7TZo5j/9F2Gggfa
        5Q==
        </data>
        <key>proof</key>
        <data>
        +s4WYyrYuztJ1giWzGQBDJ9lvI0=
        </data>
</dict>
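
The proof M1 carried in this request could be computed along the following lines. This is a sketch under stated assumptions, not a tested interoperable implementation: it reads the ^ in the formula for X as the bytewise XOR used by standard SRP6a, hashes the username as UTF-8 text, and pads A and B to the byte length of N. The helper names are hypothetical, and the NG_2048 constant is not reproduced here, so N is passed in as a parameter.

```python
import hashlib


def sha1(*parts: bytes) -> bytes:
    """SHA1 over the concatenation of the given byte strings."""
    h = hashlib.sha1()
    for p in parts:
        h.update(p)
    return h.digest()


def to_bytes(n: int) -> bytes:
    """Minimal big-endian encoding of a non-negative integer."""
    return n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")


def pad(n: int, width: int) -> bytes:
    """PAD(): left-pad with zero bytes to the byte length of N."""
    return n.to_bytes(width, "big")


def compute_m1(N: int, g: int, username: str, salt: bytes,
               A: int, B: int, K: bytes) -> bytes:
    """M1 = H( X | H(I) | s | PAD(A) | PAD(B) | K ) with X = H(N) xor H(g)."""
    width = len(to_bytes(N))
    # g is hashed unpadded, as the text requires.
    X = bytes(x ^ y for x, y in zip(sha1(to_bytes(N)), sha1(to_bytes(g))))
    return sha1(X, sha1(username.encode("utf-8")), salt,
                pad(A, width), pad(B, width), K)
```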

Since K can either be calculated from a and B, or from A and b, the server can now also calculate K and replicate the calculation to reproduce M1, to verify it gets the same answer, proving to the server that both client and server possess K. If the server's calculation of M1 does not match the one received from the client, an Apple TV 3rd Gen. sends

RTSP/1.0 470 Connection Authorization Required
CSeq: 2
Server: AirTunes/220.68

and terminates the connection (an Apple TV 4K Gen. 1 sends "500 Internal Server Error" as the error code).

If the two calculations of M1 match, the server then computes another 20-byte hashed quantity M2 = H(A,M1,K), and sends that to the client, again labeled as proof in a plist:

RTSP/1.0 200 OK 
CSeq: 2 
Server: AirTunes/220.68 
Content-Type: application/x-apple-binary-plist 
Content-Length: 75 

content is

<dict>
        <key>proof</key>
        <data>
        3tZvdrEmj3gV/KSKsPeYi0HJS8M=
        </data>
</dict>

The client now repeats the calculation of M2 with its K; getting the same answer proves to the client that the server also knows K. (If the calculations of M2 do not match, the client terminates the connection.)

At this point, the SRP6a part of pair-setup-pin is done. No further use of the SRP6a library is made.

The only things that the Server and client need for the next step are their copies of the 40-byte SRP shared secret hash K. All other SRP data can now be discarded.

The 40-byte shared secret hash K is now used by both server and client to create an initial key and iv for AES GCM 128 authenticated encryption.

  • The AES key is the first 16 bytes of the (64-byte) SHA512 digest of the string "Pair-Setup-AES-Key" concatenated with K.
  • The AES iv is (initially) the first 16 bytes of the (64-byte) SHA512 digest of the string "Pair-Setup-AES-IV" concatenated with K.

The strings must be in UTF-8 characters. After building this key and iv, K can be discarded.
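
The derivation of the key and initial iv can be sketched with the standard hashlib module. Two assumptions are made here and should be checked against a working implementation: the label bytes come first and K second in the hashed data, and both labels are hyphenated throughout. The function name is hypothetical.

```python
import hashlib
from typing import Tuple


def derive_setup_key_iv(K: bytes) -> Tuple[bytes, bytes]:
    """Derive the AES GCM 128 key and initial iv from the 40-byte SRP hash K."""
    # Assumption: digest input is label followed by K, label encoded as UTF-8.
    key = hashlib.sha512("Pair-Setup-AES-Key".encode("utf-8") + K).digest()[:16]
    iv = hashlib.sha512("Pair-Setup-AES-IV".encode("utf-8") + K).digest()[:16]
    return key, iv
```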

In the AES GCM 128 protocol used by Apple, the iv is updated by adding 1 ("iv[15]++;") to the last byte iv[15] of "unsigned char iv[16]" each time a new message is sent between the client and server. (It is not clear what should happen after 255 messages have been exchanged.)
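
In Python the same update could look like the sketch below. Wrapping from 255 back to 0 mirrors what "iv[15]++" does on a C unsigned char; whether real devices behave that way is, as noted, not known, so treat the wrap as an assumption. The function name is hypothetical.

```python
def next_iv(iv: bytes) -> bytes:
    """Return a copy of the 16-byte iv with 1 added to its last byte."""
    if len(iv) != 16:
        raise ValueError("iv must be 16 bytes")
    out = bytearray(iv)
    # Assumption: wrap modulo 256, like an unsigned char in C; no carry into iv[14].
    out[15] = (out[15] + 1) & 0xFF
    return bytes(out)
```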

Both client and server must have 32-byte public/private key pairs created with the Ed25519 algorithm (these have no relation to the SRP6 public/private key pairs (a,A) and (b,B) which have already been discarded.) The client and server Ed25519 key pairs appear to be persistent, and should be maintained unchanged. The server has already provided its own Ed25519 public key in the initial plist it sent in response to the client's GET /info request.

The client now updates the initial iv by adding 1 to the last byte (iv[15]) of the AES iv, and uses key and iv to encrypt its 32-byte Ed25519 public key with AES GCM 128. This produces a 32-byte "encrypted public key" epk plus an authTag that can be used for verifying decryption: Apple's protocol uses a 16-byte "authTag" (the maximum allowed size).

The client sends epk and authTag in a plist to the server:

POST /pair-setup-pin RTSP/1.0
Content-Length: 116
Content-Type: application/x-apple-binary-plist
CSeq: 3
DACP-ID: B241D9C8708F90B4
Active-Remote: 939854658
User-Agent: AirPlay/745.13.4

with content

<dict>
        <key>epk</key>
        <data>
        KCL6yz8MzGWJwUbnPYnLp23FLSlKh7XfSopFiYP6fNA=
        </data>
        <key>authTag</key>
        <data>
        vNdynglVcqqGfOeAC1KVpA==
        </data>
</dict>

The server has also built the AES key and initial iv using the SRP shared secret K. When it receives the client epk and authTag, it adds 1 to the last byte of its initial iv, (iv[15]++; in C), and decrypts epk with authentication by authTag, which verifies that the server decrypted epk with the same key and iv used to encrypt it (if the server key and iv were incorrect, it would get a different set of 32 random bytes, and would have no way to detect the error without using the authTag).

If the decryption were invalid, the server would send an error message and disconnect (this should not happen because K has already been verified as known to both parties).

Assuming the decryption is valid, the server now knows the client's Ed25519 public key. It now adds 1 to the last byte of the iv used for this decryption, and uses the AES key and updated iv to encrypt the server's Ed25519 public key to get an epk and authTag which it sends to the client.

RTSP/1.0 200 OK 
CSeq: 3 
Server: AirTunes/220.68 
Content-Type: application/x-apple-binary-plist 
Content-Length: 116

with content

<dict>
        <key>epk</key>
        <data>
        rY4Mv1SiEW+LGBHHJdPQBhQoFuVHXwiNzil1al0gO6I=
        </data>
        <key>authTag</key>
        <data>
        L7uO5nhidxShhy4AIDDowQ==
        </data>
</dict>

The client now adds 1 to the last byte of the iv it used when encrypting its own epk, and uses the key and updated iv to decrypt the server's public key, with authentication. If authentication fails, it should disconnect (true Apple clients do disconnect, when interacting with a non-Apple server that does not correctly update the iv before encrypting its own epk). When all is coded correctly, authentication could only fail if there was a transmission error in sending the messages. It seems that most non-Apple open-source clients do not bother to check decryption of the server epk, since they already know the unencrypted server pk from its response to their initial GET /info request.

When both server and client have decrypted and authenticated the other's epk, the AES key and iv can be discarded, and no more SRP-derived data remains.

After all this both server and client know each other's DeviceID and public key, and know that they have "paired" securely using information (the pin displayed on the server screen, which could only be entered into the client by someone watching the screen) which cannot be faked.

It appears (from code in pair_ap) that in this type (non-HomeKit) of pairing, only the public key (and not the DeviceID) of the pairing partner is saved by Apple devices after pair setup. The data should be saved to permanent storage. Each stored Ed25519 public key is 32 bytes. Pairing of client and server will be persistent provided neither changes its Ed25519 keypair.

The "pair-setup" process ("transient", without pin)

This mode is used by the client if the server's DNS_SD service announcement does not include pw=true. After the server has responded to the "GET /info" request, the client knows the server's DeviceID and public key. It sends its 32-byte Ed25519 public key (as a binary payload, not a plist):

POST /pair-setup RTSP/1.0
Content-Length: 32
Content-Type: application/octet-stream
CSeq: 1
DACP-ID: CBF5C44A7802FAED
Active-Remote: 3304431466
User-Agent: AirPlay/745.13.4

The server replies with its own Ed25519 public key, as a binary payload:

RTSP/1.0 200 OK 
CSeq: 1 
Server: AirTunes/220.68 
Content-Type: application/octet-stream 
Content-Length: 32 

The outcome is the same as with the pin pairing, except there is no trust established, and the server does not yet know the client's DeviceID. The server will generally support a finite number of simultaneous client connections. A non-Apple implementation that has been analysed uses 16 as this limit; it appears to assign a "connection context" labeled by pairSessionId to the client, and store the public key of a paired client in that context.

Transient pairing does not seem to survive termination of a connection. Each time the client reconnects, it sends a new no-pin pair-setup request.

Pair verify (needed each time an RTSP session is started)

This happens immediately after the client has successfully paired with the server using pair-setup-pin or pair-setup, or if the client has just sent a GET /info request that showed the server reports an Ed25519 public key that is on the client's list of public keys of servers it has previously securely pin-paired with. (Tests with Apple clients show that it does not matter what DeviceID the server reports, confirming that this is not saved in legacy-pairing pairing records.)

The client first uses the curve-25519 Elliptic Curve Diffie-Hellman (ECDH) algorithm to build a new public/private ECDH keypair (just for this pairing session) and concatenates 4 bytes {1,0,0,0} with its ECDH public key (32 bytes) and its Ed25519 public key (32 bytes) to form a 68 byte sequence {1,0,0,0} | ECDH_PK(client) | Ed25519_PK(client) and sends it to the server as binary data (not a plist):

POST /pair-verify RTSP/1.0
X-Apple-PD: 1
X-Apple-AbsoluteTime: 723748471
Content-Length: 68
Content-Type: application/octet-stream
CSeq: 1
DACP-ID: 946EB6B8C5F155F4
Active-Remote: 2963712443
User-Agent: AirPlay/745.13.4
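
The 68-byte body of this request can be assembled as in the sketch below. Python's standard library has no Curve25519 or Ed25519 support, so random bytes stand in for the two public keys; the function name is hypothetical.

```python
import os


def build_verify_start(ecdh_pk: bytes, ed25519_pk: bytes) -> bytes:
    """Return {1,0,0,0} | ECDH_PK(client) | Ed25519_PK(client), 68 bytes in total."""
    if len(ecdh_pk) != 32 or len(ed25519_pk) != 32:
        raise ValueError("both public keys must be 32 bytes")
    return bytes([1, 0, 0, 0]) + ecdh_pk + ed25519_pk


# Placeholder keys only: real keys come from a Curve25519/Ed25519 library.
payload = build_verify_start(os.urandom(32), os.urandom(32))
```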

The server (optionally) checks that the client Ed25519 public key is on its list of Ed25519 public keys of previously securely-paired (with pin) or transiently-paired (without pin) clients, and if it is not found, it sends an error message and disconnects.

If the server decides to allow the client to pair-verify (with either a checked or unchecked Ed25519 public key), it then also generates a new ECDH keypair. With either of the combinations (private_key-ECDH(client) + public_key-ECDH(server)) or (public_key-ECDH(client) + private_key-ECDH(server)), a shared "ECDH secret" can be derived. Once they have the other party's public ECDH key (PK_ECDH), both the client and the server can build an AES key and iv for AES CTR 128 encryption. Unlike in the previous AES GCM 128 mode encryption, the iv will stay fixed, without updates, as messages are passed.

  • The new AES Key is the first 16 bytes of the 64-byte SHA512 digest of the string "Pair-Verify-AES-Key" with the ECDH shared secret.
  • The new AES iv is the first 16 bytes of the SHA512 digest of "Pair-Verify-AES-IV" with the ECDH shared secret.
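
As a sketch, the two values in this list could be derived as follows, assuming the label bytes are hashed first and the 32-byte ECDH shared secret second (that order is an assumption); the function name is hypothetical.

```python
import hashlib
from typing import Tuple


def derive_verify_key_iv(ecdh_secret: bytes) -> Tuple[bytes, bytes]:
    """Derive the AES CTR 128 key and (fixed) iv from the ECDH shared secret."""
    # Assumption: digest input is label followed by the shared secret.
    key = hashlib.sha512(b"Pair-Verify-AES-Key" + ecdh_secret).digest()[:16]
    iv = hashlib.sha512(b"Pair-Verify-AES-IV" + ecdh_secret).digest()[:16]
    return key, iv
```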

The server now builds a "signature". First form a "message" to sign, which is the 64-byte concatenation of the two 32-byte ECDH public keys: PK_ECDH(server)|PK_ECDH(client). (This is a "message" that the client will be able to independently reproduce to check the signature.) Now sign it with the Ed25519 signature algorithm using the server's Ed25519 keypair, which creates a 64-byte signature. The server then uses the AES key and iv to initialize the AES CTR 128 cipher algorithm, and encrypts the signature, to produce a 64-byte encrypted-signature.

It then forms the 96-byte concatenation PK_ECDH(server)|encrypted-signature, and sends this to the client (as a binary payload, not a plist):

RTSP/1.0 200 OK 
CSeq: 1 
Server: AirTunes/220.68 
Content-Type: application/octet-stream 
Content-Length: 96 
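
On the client side, splitting this 96-byte reply and rebuilding the message that the server signed might look like the sketch below. Only the byte layout is shown; the AES CTR and Ed25519 steps need a cryptography library and are not included. The function names are hypothetical.

```python
from typing import Tuple


def split_verify_reply(reply: bytes) -> Tuple[bytes, bytes]:
    """Split PK_ECDH(server) | encrypted-signature into its two parts."""
    if len(reply) != 96:
        raise ValueError("expected a 96-byte pair-verify reply")
    return reply[:32], reply[32:]


def server_signed_message(server_ecdh_pk: bytes, client_ecdh_pk: bytes) -> bytes:
    """The 64-byte message the server signs: PK_ECDH(server) | PK_ECDH(client)."""
    return server_ecdh_pk + client_ecdh_pk
```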

The client now has the server's public ECDH key, and can combine it with its own ECDH private key to build the shared ECDH secret, and then the AES key and iv.

It now uses the key and iv to initialize AES CTR 128, and encrypts encrypted-signature. The symmetric nature of AES CTR means that (with the identical key, iv, and counter value) encrypted-(encrypted-signature) = signature. Since the client can construct the "message" that was signed, and has the public part of the Ed25519 server keypair that was used to produce the signature, it can validate the signature, and know that it was correctly decrypted. This confirms to the client that the AES CTR 128 communication channel is working.

The client then forms the corresponding 64 byte concatenation PK_ECDH(client)|PK_ECDH(server), signs it with its own Ed25519 keypair, and encrypts it with AES CTR 128 to create another 64-byte encrypted-signature. It now adds a 4-byte header to form a 68-byte payload {0,0,0,0} | encrypted-signature, and sends it to the server as a binary payload:

POST /pair-verify RTSP/1.0
X-Apple-PD: 1
X-Apple-AbsoluteTime: 723748472
Content-Length: 68
Content-Type: application/octet-stream
CSeq: 2
DACP-ID: 946EB6B8C5F155F4
Active-Remote: 2963712443
User-Agent: AirPlay/745.13.4

The server now encrypts encrypted-signature to retrieve the unencrypted signature, and uses the client's public Ed25519 key and the "message", PK_ECDH(client)|PK_ECDH(server), to validate the signature. Assuming it is valid, this proves to the server that a secure communication channel has been opened.

Then pair-verify has been accomplished, and the server sends the empty reply.

RTSP/1.0 200 OK 
CSeq: 2 
Server: AirTunes/220.68 
Content-Type: application/octet-stream 
Content-Length: 0

EDIT 2024-08-30: It now seems that the reply should not be empty, to trigger the start of encryption: see https://github.com/openairplay/ap2-sender/blob/master/pairing.txt (but what should the binary bit(s) sent be?)

Encrypted client-server communication.

  • All is now in place for a switch to encrypted server-client communication using AES CTR 128, but it is not yet known what the server must do to trigger the client to start using it: perhaps the final server RTSP response (above) should be sent encrypted? The "CSeq" numbers in the message headers appear to indicate the AES CTR counter number when encryption is being used.

(from pyatv docs:)

Encryption

After verification has finished, all following messages are encrypted using the derived shared key. Chacha20Poly1305 is used for encryption (just like HAP) with the following attributes:

Salt: empty string
Info: ServerEncrypt-main for decrypting (incoming), ClientEncrypt-main for encrypting (outgoing)

Sequence number (starting from zero) is used as nonce, incremented by one for each sent or received message and encoded as little endian (12 bytes). Individual counters are used for each direction. AAD should be set to the frame header. Do note that encrypting data will add a 16 byte authentication tag at the end, increasing the size by 16 bytes.
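
The nonce handling in the quoted description can be illustrated with a small sketch. It covers only the per-direction counters and the 12-byte little-endian encoding; the Chacha20Poly1305 cipher itself is not in Python's standard library and is left out. The class name is hypothetical.

```python
class NonceCounters:
    """Separate message counters for each direction, encoded as 12-byte nonces."""

    def __init__(self) -> None:
        self.outgoing = 0
        self.incoming = 0

    def next_outgoing(self) -> bytes:
        """Nonce for the next message sent; the counter then advances by one."""
        nonce = self.outgoing.to_bytes(12, "little")
        self.outgoing += 1
        return nonce

    def next_incoming(self) -> bytes:
        """Nonce for the next message received; the counter then advances by one."""
        nonce = self.incoming.to_bytes(12, "little")
        self.incoming += 1
        return nonce
```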