
AirPlay2 Protocol

Stefano Bono edited this page Nov 8, 2021 · 2 revisions

AirPlay is a proprietary protocol stack/suite developed by Apple Inc. that allows wireless streaming between devices of audio, video, device screens, and photos, together with related metadata. Originally implemented only in Apple's software and devices, it was called AirTunes and used for audio only.

-- Wikipedia

Disclaimer

This guide is for educational purposes only.
You don't need to follow this guide to use AirPlay with compatible devices.
This guide is not meant to incite hacking.

Knowledge info

Protocols used   Enc/Dec algorithm used   Audio and video foundation used
mDNS             ed25519                  H264
HTTP             AES CBC                  AAC
RTP              AES CTR                  ALAC
RTSP             curve25519               PCM
NTP              -                        -

Service Discovery

AirPlay finds devices using the mDNS protocol. On a local network, the receiving device advertises two services (the AirTunes service and the AirPlay service) by publishing 'A', 'TXT', 'PTR' and 'SRV' records.
The caller, on the other hand, sends an IP multicast query message to identify the receiver.

The AirTunes service is used to exchange information between devices.
The AirPlay service is used to send/receive audio and video streams.

How to find services

If we sniff the local network traffic with Wireshark, we can find the DNS records published by the receiver.
Filter used in Wireshark: ip.src == 192.168.1.197 && mdns

PTR: retrieve info about available receiver's services

[Screenshot 2021-03-18 at 22:31:27]

TXT: retrieve info about functionality available on receiver (ex: supported encryption types and other similar metadata)

[Screenshot 2021-03-18 at 22:32:21]

SRV: retrieve info about services ports

[Screenshot 2021-03-18 at 22:32:59]

Wireshark Capture Analysis

TXT records contain information about the receiver's functionality. Here is a list of possible values:

TXT Records

Service Key Value Description
AirTunes txtvers 1 TXT record version
AirTunes ch 2 number of audio channels
AirTunes cn 0,1,2,3 audio codecs
AirTunes et 0,3,5 supported encryption types
AirTunes md 0,1,2 supported metadata types
AirTunes pw false speaker requires a password
AirTunes sr 44100 audio sample rate
AirTunes ss 16 audio sample size
AirTunes da true ????
AirTunes sv false ????
AirTunes ft 0x5A7FFFF7,0x1E,0x4A7FFFF7 available features
AirTunes am AppleTV5,3 device model
AirTunes pk hex string public key
AirTunes sf 0x4 ????
AirTunes tp UDP supported transport (UDP, TCP)
AirTunes vn 65537 ????
AirTunes vs 220.68 receiver version
AirTunes vv 2 ????
AirPlay deviceid 00:00:00:00:00 mac address
AirPlay features 0x5A7FFFF7,0x1E,0x4A7FFFF7 available features
AirPlay flags 20 bit hex number bitfield of status flags
AirPlay model AppleTV5,3 device model
AirPlay pk hex string public key
AirPlay pi aa072a95-0318-4ec3-b042-4992495877d3 PublicCUAirPlayPairingIdentifier
AirPlay srcvers 220.68 receiver version
AirPlay vv 2 ????
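
On the wire, a DNS TXT record is a sequence of strings, each prefixed by one length byte and usually written as key=value. The following is a minimal sketch of decoding such a record into a dictionary. The function name and the sample bytes are hypothetical and built by hand; they are not taken from a real capture.

```python
def parse_txt_record(raw: bytes) -> dict:
    """Decode a DNS TXT record: repeated <length byte><key=value> strings."""
    entries = {}
    pos = 0
    while pos < len(raw):
        length = raw[pos]                      # one length byte per entry
        item = raw[pos + 1 : pos + 1 + length].decode("utf-8")
        pos += 1 + length
        key, _, value = item.partition("=")    # split at the first '=' only
        entries[key] = value
    return entries


# Hand-built sample with two entries (hypothetical, not a real capture).
sample = b"\x05acl=0" + b"\x0bflags=0x244"
print(parse_txt_record(sample))
```

Splitting at the first '=' only keeps values that themselves contain '=' intact.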

Audio codecs

value description
0 PCM
1 Apple Lossless (ALAC)
2 AAC
3 ELD (Enhanced Low Delay)

Encryption Types

value description
0 no encryption
1 RSA (AirPort Express)
3 FairPlay
4 MFiSAP (3rd-party devices)
5 FairPlay SAPv2.5

Metadata Types

value description
0 text
1 artwork
2 progress

Features bit values (source)

bit name description
0 Video video supported
1 Photo photo supported
2 VideoFairPlay video protected with FairPlay DRM
3 VideoVolumeControl volume control supported for videos
4 VideoHTTPLiveStreams http live streaming supported
5 Slideshow slideshow supported
7 Screen mirroring supported
8 ScreenRotate screen rotation supported
9 Audio audio supported
11 AudioRedundant audio packet redundancy supported
12 FPSAPv2pt5_AES_GCM FairPlay secure auth supported
13 PhotoCaching photo preloading supported
14 Authentication4 Authentication type 4. FairPlay authentication
15 MetadataFeature1 bit 1 of MetadataFeatures. Artwork.
16 MetadataFeature2 bit 2 of MetadataFeatures. Progress.
17 MetadataFeature0 bit 0 of MetadataFeatures. Text.
18 AudioFormat1 support for audio format 1
19 AudioFormat2 support for audio format 2. This bit must be set for AirPlay 2 connection to work
20 AudioFormat3 support for audio format 3. This bit must be set for AirPlay 2 connection to work
21 AudioFormat4 support for audio format 4
23 Authentication1 Authentication type 1. RSA Authentication
26 HasUnifiedAdvertiserInfo
27 SupportsLegacyPairing
30 RAOP RAOP is supported on this port. With this bit set you don't need the AirTunes service
32 IsCarPlay / SupportsVolume Don’t read key from pk record it is known
33 SupportsAirPlayVideoPlayQueue
34 SupportsAirPlayFromCloud
38 SupportsCoreUtilsPairingAndEncryption SupportsHKPairingAndAccessControl, SupportsSystemPairing and SupportsTransientPairing implies SupportsCoreUtilsPairingAndEncryption
40 SupportsBufferedAudio Bit needed for device to show as supporting multi-room audio
41 SupportsPTP Bit needed for device to show as supporting multi-room audio
42 SupportsScreenMultiCodec
43 SupportsSystemPairing
46 SupportsHKPairingAndAccessControl
48 SupportsTransientPairing SupportsSystemPairing implies SupportsTransientPairing
50 MetadataFeature4 bit 4 of MetadataFeatures. binary plist.
51 SupportsUnifiedPairSetupAndMFi Authentication type 8. MFi authentication
52 SupportsSetPeersExtendedMessage
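
The bit table above can be applied to an advertised features value with a few lines of code. This is only a sketch: it assumes that a comma-separated value such as 0x5A7FFFF7,0x1E means "low 32 bits, high 32 bits" of one 64-bit number, which is my reading and is not stated above. The helper names and the small FEATURE_NAMES subset are hypothetical.

```python
# Small subset of the features table above, for illustration only.
FEATURE_NAMES = {
    0: "Video",
    7: "Screen",
    9: "Audio",
    30: "RAOP",
    40: "SupportsBufferedAudio",
    41: "SupportsPTP",
}


def parse_features(text: str) -> int:
    """Combine 'low' or 'low,high' hexadecimal parts into one integer.

    ASSUMPTION: the first part is the low 32 bits, the second the high 32 bits.
    """
    parts = [int(part, 16) for part in text.split(",")]
    low = parts[0]
    high = parts[1] if len(parts) > 1 else 0
    return (high << 32) | low


def set_bits(value: int) -> list:
    """Return the positions of all bits set in value, lowest first."""
    return [bit for bit in range(value.bit_length()) if (value >> bit) & 1]


features = parse_features("0x5A7FFFF7,0x1E")
for bit in set_bits(features):
    print(bit, FEATURE_NAMES.get(bit, "(not in this subset)"))
```

The same set_bits helper can be used on any bitfield value, since it only reports which bit positions are set.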

Flag bit values (source)

bit name description
0 Problem has been detected Defined in CarPlay section of MFi spec. Not seen set anywhere
1 Device is not configured Defined in CarPlay section of MFi spec. Not seen set anywhere
2 Audio cable is attached Defined in CarPlay section of MFi spec. Seen on AppleTV, Denon AVR, HomePod, Airport Express
3 PINRequired
6 SupportsAirPlayFromCloud
7 PasswordRequired
9 OneTimePairingRequired
10 DeviceWasSetupForHKAccessControl
11 DeviceSupportsRelay Shows in logs as relayable. When set iOS will connect to the device to get currently playing track.
12 SilentPrimary
13 TightSyncIsGroupLeader
14 TightSyncBuddyNotReachable
15 IsAppleMusicSubscriber Shows in logs as music
16 CloudLibraryIsOn Shows in logs as iCML
17 ReceiverSessionIsActive Shows in logs as airplay-receiving. Set when Apple TV is receiving anything via AirPlay.

Services

AirTunes - The Handshake

Filter used in Wireshark: (ip.src==CLIENT_IP && ip.dst==RECEIVER_IP)
When you find the correct request/response, you can click Analyze->Follow->TCP Stream to see the full request/response in Wireshark.

The requests and responses below are listed in chronological order.
You can also see this from the CSeq header, which is incremental.

------ REQUEST GET /info ------
  
GET /info RTSP/1.0
X-Apple-ProtocolVersion: 1
Content-Length: 70
Content-Type: application/x-apple-binary-plist
CSeq: 0
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
  
bplist00...Yqualifier..ZtxtAirPlay..................................."

Request is 'x-apple-binary-plist', after decoding:

<plist version="1.0">
  <dict>
    <key>qualifier</key>
    <array>
      <string>txtAirPlay</string>
    </array>
  </dict>
</plist>

Useful information:
'x-apple-binary-plist' is Apple's binary property list (plist) format.
To understand how to decode the 'x-apple-binary-plist' format, read this amazing article by Christos Karaiskos.
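
If you only need to read or write these bodies, Python's standard plistlib module understands the binary plist format. The sketch below is not how the original capture was decoded. It builds a binary plist equivalent to the decoded /info request shown above and reads it back; the function name is hypothetical, and the encoded bytes need not match the captured body byte for byte.

```python
import plistlib


def decode_bplist(body: bytes) -> dict:
    """Decode an 'application/x-apple-binary-plist' body into Python objects."""
    # plistlib.loads detects the binary format from the 'bplist00' magic.
    return plistlib.loads(body)


# Re-create a body equivalent to the decoded /info request.
request_body = plistlib.dumps({"qualifier": ["txtAirPlay"]}, fmt=plistlib.FMT_BINARY)
print(request_body[:8])            # starts with the 'bplist00' magic
print(decode_bplist(request_body))
```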

------ RESPONSE GET /info ------
  
RTSP/1.0 200 OK
Content-Length: 1689
Content-Type: application/x-apple-binary-plist
Server: AirTunes/220.68
CSeq: 0
  
bplist00.......YaudioType........
.....$&(*.... .
  
...%')+TtypeXdisplaysTuuid_..audioInputFormatsXfeatures[refreshRate.. "..!!._..aa:54:01:af:c3:c1...dUmodel.<VheightZAppleTV2,1]sourceVersion_..keepAliveLowPower.-/123456(9;<.0!!!0.78:!=]widthPhysicalV220.68.......[overscanned[widthPixelsO. .w'...n....R^....R..h?.!....$eT.ZmacAddress...,.....\audioFormatsTname.Rvv.....Z..._..inputLatencyMicros[statusFlagsWAppleTV.. "..!!.Wdefault_.$2e388006-13ba-4041-9a67-25dd4a43d536......._..outputLatencyMicros^audioLatenciesXrotation..\heightPixelsVmaxFPSXdeviceID_..audioOutputFormats_.$e0ff8a27-6738-3d56-8a16-cc53aacee925_..keepAliveSendStatsAsBody^heightPhysical.eUwidthRpiRpk..#..8............R.C...".d...j.N.....g.....W.T...+.
.:...M...............v.i...v... .....a.m.?.P.....................j...........H.@...............>................

Response is 'x-apple-binary-plist', after decoding:

<plist version="1.0">
  <dict>
    <key>sdk</key>
      <string>AirPlay;2.1.1-f.1</string>
      <key>sourceVersion</key>
      <string>377.17.24.6</string>
      <key>statusFlags</key>
      <integer>580</integer>
      <key>pi</key>
      <string>2A:1B:57:36:38:D4</string>
      <key>name</key>
      <string>Samsung 7 Series (43)</string>
      <key>build</key>
      <string>17.24.6</string>
      <key>model</key>
      <string>UNU7400</string>
      <key>txtAirPlay</key>
      <string>BWFjbD0wGmRldmljZWlkPTcwOjJBOkQ1OjI0OkIyOjkzG2ZlYXR1cmVzPTB4N0Y4QUQwLDB4MzhCQ0I0Ngdyc2Y9MHgzCmZ2PXAyMC4wLjELZmxhZ3M9MHgyNDQNbW9kZWw9VU5VNzQwMBRtYW51ZmFjdHVyZXI9U2Ftc3VuZxxzZXJpYWxOdW1iZXI9MEJQWDNTSUs5MDQ5MjBODXByb3RvdmVycz0xLjETc3JjdmVycz0zNzcuMTcuMjQuNhRwaT0yQToxQjo1NzozNjozODpENChwc2k9MDAwMDAwMDAtMDAwMC0wMDAwLTAwMDAtMkExQjU3MzYzOEQ0KGdpZD0wMDAwMDAwMC0wMDAwLTAwMDAtMDAwMC0yQTFCNTczNjM4RDQGZ2NnbD0wQ3BrPWIyYmI2YzAyOGM4MjgxOTczMDU2YzYyYzNmMzk4NmFhODVjNjhhOWJhZjgzYzBiYjViMzA1NzA4NWI2MzdiZjc=</string>
    <key>PTPInfo</key>
    <string>OpenAVNU ArtAndLogic-aPTP-changes Commit: 17f0335 on Sep 22, 2018</string>
    <key>protocolVersion</key>
    <string>1.1</string>
    <key>audioLatencies</key>
    <array>
      <dict>
        <key>inputLatencyMicros</key>
        <integer>0</integer>
        <key>type</key>
        <integer>100</integer>
        <key>outputLatencyMicros</key>
        <integer>0</integer>
      </dict>
      <dict>
        <key>inputLatencyMicros</key>
        <integer>0</integer>
        <key>audioType</key>
        <string>default</string>
        <key>type</key>
        <integer>100</integer>
        <key>outputLatencyMicros</key>
        <integer>0</integer>
      </dict>
      <dict>
        <key>inputLatencyMicros</key>
        <integer>0</integer>
        <key>audioType</key>
        <string>media</string>
        <key>type</key>
        <integer>100</integer>
        <key>outputLatencyMicros</key>
        <integer>0</integer>
      </dict>
      <dict>
        <key>inputLatencyMicros</key>
        <integer>0</integer>
        <key>audioType</key>
        <string>telephony</string>
        <key>type</key>
        <integer>100</integer>
        <key>outputLatencyMicros</key>
        <integer>0</integer>
      </dict>
      <dict>
        <key>inputLatencyMicros</key>
        <integer>0</integer>
        <key>audioType</key>
        <string>speechRecognition</string>
        <key>type</key>
        <integer>100</integer>
        <key>outputLatencyMicros</key>
        <integer>0</integer>
      </dict>
      <dict>
        <key>inputLatencyMicros</key>
        <integer>0</integer>
        <key>audioType</key>
        <string>alerts</string>
        <key>type</key>
        <integer>100</integer>
        <key>outputLatencyMicros</key>
        <integer>0</integer>
      </dict>
    </array>
    <key>pk</key>
    <data>
      sHcn1vbNbgi1jt5SXsPN6qJSrZ9oP+shLviiBSRlVOc=
    </data>
    <key>features</key>
    <integer>255521305393072848</integer>
    <key>displays</key>
    <array>
      <dict>
        <key>height</key>
        <integer>1080</integer>
        <key>width</key>
        <integer>1920</integer>
        <key>rotation</key>
        <false/>
        <key>widthPhysical</key>
        <false/>
        <key>heightPhysical</key>
        <false/>
        <key>widthPixels</key>
        <integer>1920</integer>
        <key>heightPixels</key>
        <integer>1080</integer>
        <key>refreshRate</key>
        <integer>60</integer>
        <key>features</key>
        <integer>14</integer>
        <key>maxFPS</key>
        <integer>30</integer>
        <key>overscanned</key>
        <false/>
        <key>uuid</key>
        <string>e0ff8a27-6738-3d56-8a16-cc53aacee925</string>
      </dict>
    </array>
  </dict>
</plist>

WARN: This response was captured with Wireshark from my Smart TV (Samsung 7 Series 43), not from an AppleTV.

------ REQUEST POST /pair-setup ------
  
POST /pair-setup RTSP/1.0
Content-Length: 32
Content-Type: application/octet-stream
CSeq: 1
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
  
...............d.......?..1...Rt
  
------ RESPONSE POST /pair-setup ------
  
RTSP/1.0 200 OK
Content-Type: application/octet-stream
Content-Length: 32
Server: AirTunes/220.68
CSeq: 1
  
....M.r..Ek!S.......b...r...s.P3

The client (iOS device) sends this request to ask for our Ed25519 public key.
It sends a 32-byte body and we must return 32 bytes.
You can ignore the request body and return the key.

Before the AppleTV returns the 32 bytes to the client, the function 'FdkDecodeAudioFun8(rawData, 32, jg, out_size, 1, sessionId);' is called, where:

  • rawData: request body
  • 32: body length
  • jg: ???
  • out_size: response length
  • sessionId: id used to know current context (AppleTV supporting up to 16 sessions)

Someone found that function in the 'libhpplayaudio.so' library.
After some analysis and a lot of assembly code, they came up with this diagram:

[Image: diagram of the pair-setup processing]

-- hkeyxif

------ REQUEST POST /pair-verify [CSeq: 2] ------
  
POST /pair-verify RTSP/1.0
X-Apple-PD: 1
X-Apple-AbsoluteTime: 566789538
Content-Length: 68
Content-Type: application/octet-stream
CSeq: 2
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
   
.....L?..Fl/...j.Z3...d.....J..s.i37...............a.......|..0...Rt
  
------ RESPONSE POST /pair-verify [CSeq: 2] ------
  
RTSP/1.0 200 OK
Content-Type: application/octet-stream
Content-Length: 96
Server: AirTunes/220.68
CSeq: 2
  
..a..?..abme......3|
|k............r...s2d8..J
..l.dd..a.....?....F..(..+ ..7f.~.x~..|.........
  
------ REQUEST POST /pair-verify [CSeq: 3] ------
  
POST /pair-verify RTSP/1.0
X-Apple-PD: 1
X-Apple-AbsoluteTime: 566789538
Content-Length: 68
Content-Type: application/octet-stream
CSeq: 3
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
  
.............|.<....-..s.w...w....r...K.Lp...}.L
..Q....r_o...T.k2."
  
------ RESPONSE POST /pair-verify [CSeq: 3] ------
  
RTSP/1.0 200 OK
Content-Type: application/octet-stream
Content-Length: 0
Server: AirTunes/220.68
CSeq: 3

The client (iOS device) sends a 68-byte request.
The first 4 bytes are a flag field (for example 01 00 00 00); the first byte selects the type of verification.

If the flag is 01, the remaining 64 bytes are divided as follows [CSeq: 2]:

  • 32 bytes ecdh_theirs
  • 32 bytes ed_theirs

Here we must create ecdh_shared (from our Curve25519 private key and ecdh_theirs), which is used to initialize the AES CTR 128 cipher.
This ecdh_shared is also used in the next request [CSeq: 3] to verify the client's signature.

The response is 96 bytes:

  • The first 32 bytes are ecdh_ours (generated with Curve25519)
  • The remaining 64 bytes are the Ed25519 signature of (ecdh_ours + ecdh_theirs), encrypted with AES CTR 128.

If the flag is 00, the remaining 64 bytes are divided as follows [CSeq: 3]:

  • 64 bytes signature

Here we need to check the signature sent by the client to make sure everything went well.
We must initialize the AES CTR 128 cipher with the ecdh_shared key and verify the signature with the Ed25519 algorithm.
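
The byte layout of the two pair-verify requests described above can be sketched as a small parser. It only splits the 68-byte body into its parts and does not perform the Curve25519, Ed25519 or AES steps. The function name and the returned dictionary keys are hypothetical.

```python
def parse_pair_verify(body: bytes) -> dict:
    """Split a 68-byte /pair-verify request body into its fields."""
    if len(body) != 68:
        raise ValueError(f"expected 68 bytes, got {len(body)}")
    flag = body[0]        # first byte of the 4-byte flag field
    payload = body[4:]    # remaining 64 bytes
    if flag == 1:
        # First request: the client's two 32-byte public keys.
        return {
            "step": 1,
            "ecdh_theirs": payload[:32],
            "ed_theirs": payload[32:64],
        }
    if flag == 0:
        # Second request: the 64-byte (encrypted) signature.
        return {"step": 2, "signature": payload}
    raise ValueError(f"unexpected flag byte: {flag:#04x}")


# Hand-built example body (hypothetical, not a real capture).
example = bytes([1, 0, 0, 0]) + bytes(range(64))
print(parse_pair_verify(example)["ecdh_theirs"].hex())
```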

Before the AppleTV returns the 96 bytes to the client, the function 'FdkDecodeAudioFun9(rawData, 68, jg, out_size, 1, sessionId);' is called, where:

  • rawData: request body
  • 68: body length
  • jg: ???
  • out_size: response length
  • sessionId: id used to know current context (AppleTV supporting up to 16 sessions)

After some analysis and a lot of assembly code, they came up with the diagram below.
For more specific details, see the source article.

[Image: diagram of the pair-verify processing]

-- hkeyxif

------ REQUEST POST /fp-setup [CSeq: 4] ------
  
POST /fp-setup RTSP/1.0
X-Apple-ET: 32
Content-Length: 16
Content-Type: application/octet-stream
CSeq: 4
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
  
FPLY...............
      
------ RESPONSE POST /fp-setup [CSeq: 4] ------
      
RTSP/1.0 200 OK
Content-Length: 142
Server: AirTunes/220.68
Content-Type: application/octet-stream
      
FPLY..............D.....K.L/...........a....?....vd.J...Z....g....q...f....h..A>
SK.[.r..t..E.......O.uY............U.B.....V.@...=.u....
  
------ REQUEST POST /fp-setup [CSeq: 5] ------
      
POST /fp-setup RTSP/1.0
X-Apple-ET: 32
Content-Length: 164
Content-Type: application/octet-stream
CSeq: 5
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
  
FPLY..................as90d..K./......A.vv.CC.??....^l8asd([~/. ....Z......O.up.....6q>2....L...+.?.??....^l8..E.................L#....
  
------ RESPONSE POST /fp-setup [CSeq: 5] ------
      
RTSP/1.0 200 OK
Content-Length: 32
Server: AirTunes/220.68
Content-Type: application/octet-stream
      
FPLY..........A.l........B.....

[CSeq: 4] The client (iOS device) sends a 16-byte request.
The 5th byte must be 0x03.
The 15th byte selects which 'mode' to use.

Based on the 15th byte, the response will be:

byte return value
0x00 0x46,0x50,0x4c,0x59,0x03,0x01,0x02,0x00,0x00,0x00,0x00,0x82,0x02,0x00,0x0f,0x9f,0x3f,0x9e,0x0a,0x25,0x21,0xdb,0xdf,0x31,0x2a,0xb2,0xbf,0xb2,0x9e,0x8d,0x23,0x2b,0x63,0x76,0xa8,0xc8,0x18,0x70,0x1d,0x22,0xae,0x93,0xd8,0x27,0x37,0xfe,0xaf,0x9d,0xb4,0xfd,0xf4,0x1c,0x2d,0xba,0x9d,0x1f,0x49,0xca,0xaa,0xbf,0x65,0x91,0xac,0x1f,0x7b,0xc6,0xf7,0xe0,0x66,0x3d,0x21,0xaf,0xe0,0x15,0x65,0x95,0x3e,0xab,0x81,0xf4,0x18,0xce,0xed,0x09,0x5a,0xdb,0x7c,0x3d,0x0e,0x25,0x49,0x09,0xa7,0x98,0x31,0xd4,0x9c,0x39,0x82,0x97,0x34,0x34,0xfa,0xcb,0x42,0xc6,0x3a,0x1c,0xd9,0x11,0xa6,0xfe,0x94,0x1a,0x8a,0x6d,0x4a,0x74,0x3b,0x46,0xc3,0xa7,0x64,0x9e,0x44,0xc7,0x89,0x55,0xe4,0x9d,0x81,0x55,0x00,0x95,0x49,0xc4,0xe2,0xf7,0xa3,0xf6,0xd5,0xba
0x01 0x46,0x50,0x4c,0x59,0x03,0x01,0x02,0x00,0x00,0x00,0x00,0x82,0x02,0x01,0xcf,0x32,0xa2,0x57,0x14,0xb2,0x52,0x4f,0x8a,0xa0,0xad,0x7a,0xf1,0x64,0xe3,0x7b,0xcf,0x44,0x24,0xe2,0x00,0x04,0x7e,0xfc,0x0a,0xd6,0x7a,0xfc,0xd9,0x5d,0xed,0x1c,0x27,0x30,0xbb,0x59,0x1b,0x96,0x2e,0xd6,0x3a,0x9c,0x4d,0xed,0x88,0xba,0x8f,0xc7,0x8d,0xe6,0x4d,0x91,0xcc,0xfd,0x5c,0x7b,0x56,0xda,0x88,0xe3,0x1f,0x5c,0xce,0xaf,0xc7,0x43,0x19,0x95,0xa0,0x16,0x65,0xa5,0x4e,0x19,0x39,0xd2,0x5b,0x94,0xdb,0x64,0xb9,0xe4,0x5d,0x8d,0x06,0x3e,0x1e,0x6a,0xf0,0x7e,0x96,0x56,0x16,0x2b,0x0e,0xfa,0x40,0x42,0x75,0xea,0x5a,0x44,0xd9,0x59,0x1c,0x72,0x56,0xb9,0xfb,0xe6,0x51,0x38,0x98,0xb8,0x02,0x27,0x72,0x19,0x88,0x57,0x16,0x50,0x94,0x2a,0xd9,0x46,0x68,0x8a
0x02 0x46,0x50,0x4c,0x59,0x03,0x01,0x02,0x00,0x00,0x00,0x00,0x82,0x02,0x02,0xc1,0x69,0xa3,0x52,0xee,0xed,0x35,0xb1,0x8c,0xdd,0x9c,0x58,0xd6,0x4f,0x16,0xc1,0x51,0x9a,0x89,0xeb,0x53,0x17,0xbd,0x0d,0x43,0x36,0xcd,0x68,0xf6,0x38,0xff,0x9d,0x01,0x6a,0x5b,0x52,0xb7,0xfa,0x92,0x16,0xb2,0xb6,0x54,0x82,0xc7,0x84,0x44,0x11,0x81,0x21,0xa2,0xc7,0xfe,0xd8,0x3d,0xb7,0x11,0x9e,0x91,0x82,0xaa,0xd7,0xd1,0x8c,0x70,0x63,0xe2,0xa4,0x57,0x55,0x59,0x10,0xaf,0x9e,0x0e,0xfc,0x76,0x34,0x7d,0x16,0x40,0x43,0x80,0x7f,0x58,0x1e,0xe4,0xfb,0xe4,0x2c,0xa9,0xde,0xdc,0x1b,0x5e,0xb2,0xa3,0xaa,0x3d,0x2e,0xcd,0x59,0xe7,0xee,0xe7,0x0b,0x36,0x29,0xf2,0x2a,0xfd,0x16,0x1d,0x87,0x73,0x53,0xdd,0xb9,0x9a,0xdc,0x8e,0x07,0x00,0x6e,0x56,0xf8,0x50,0xce
0x03 0x46,0x50,0x4c,0x59,0x03,0x01,0x02,0x00,0x00,0x00,0x00,0x82,0x02,0x03,0x90,0x01,0xe1,0x72,0x7e,0x0f,0x57,0xf9,0xf5,0x88,0x0d,0xb1,0x04,0xa6,0x25,0x7a,0x23,0xf5,0xcf,0xff,0x1a,0xbb,0xe1,0xe9,0x30,0x45,0x25,0x1a,0xfb,0x97,0xeb,0x9f,0xc0,0x01,0x1e,0xbe,0x0f,0x3a,0x81,0xdf,0x5b,0x69,0x1d,0x76,0xac,0xb2,0xf7,0xa5,0xc7,0x08,0xe3,0xd3,0x28,0xf5,0x6b,0xb3,0x9d,0xbd,0xe5,0xf2,0x9c,0x8a,0x17,0xf4,0x81,0x48,0x7e,0x3a,0xe8,0x63,0xc6,0x78,0x32,0x54,0x22,0xe6,0xf7,0x8e,0x16,0x6d,0x18,0xaa,0x7f,0xd6,0x36,0x25,0x8b,0xce,0x28,0x72,0x6f,0x66,0x1f,0x73,0x88,0x93,0xce,0x44,0x31,0x1e,0x4b,0xe6,0xc0,0x53,0x51,0x93,0xe5,0xef,0x72,0xe8,0x68,0x62,0x33,0x72,0x9c,0x22,0x7d,0x82,0x0c,0x99,0x94,0x45,0xd8,0x92,0x46,0xc8,0xc3,0x59

[CSeq: 5] The client (iOS device) sends a 164-byte request.
The 5th byte must be 0x03.
You must save these 164 bytes because they are the KeyMessage.
A later step explains when and how to use this KeyMessage.

You must return 32 bytes to the client.
The first 12 bytes are the FairPlay header (0x46, 0x50, 0x4c, 0x59, 0x03, 0x01, 0x04, 0x00, 0x00, 0x00, 0x00, 0x14).
The remaining 20 bytes are the last 20 bytes of the request.
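
The two fp-setup rules described above (mode selection in the first request, response construction for the second) can be sketched as follows. The function names are hypothetical; the header bytes are the ones listed above, and the 142-byte replies for the first request are not reproduced here.

```python
# 12-byte FairPlay header for the second fp-setup response, as listed above.
FPLY_HEADER = bytes([0x46, 0x50, 0x4C, 0x59, 0x03, 0x01,
                     0x04, 0x00, 0x00, 0x00, 0x00, 0x14])


def fp_setup_mode(request: bytes) -> int:
    """Return the 'mode' (15th byte) of the 16-byte first fp-setup request."""
    if len(request) != 16 or request[4] != 0x03:
        raise ValueError("unexpected fp-setup phase 1 request")
    return request[14]  # 15th byte, zero-based index 14


def fp_setup_phase2_response(request: bytes) -> bytes:
    """Build the 32-byte reply to the 164-byte second fp-setup request."""
    if len(request) != 164 or request[4] != 0x03:
        raise ValueError("unexpected fp-setup phase 2 request")
    # The caller should also keep 'request' around: it is the KeyMessage.
    return FPLY_HEADER + request[-20:]


# Hand-built dummy request (hypothetical, not a real capture).
dummy = b"FPLY" + bytes([0x03]) + bytes(range(159))
print(fp_setup_phase2_response(dummy).hex())
```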

AirTunes - Mirroring Data Setup

After the handshake, there will be two SETUP requests used to initialize screen mirroring.

------ REQUEST SETUP rtsp:// [CSeq: 6] ------
  
SETUP rtsp://192.168.1.24/2893748923472384328 RTSP/1.0
Content-Length: 535
Content-Type: application/x-apple-binary-plist
CSeq: 6
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
  
bplist00........... 
..
.................RetSeiv^timingProtocol[sessionUUIDVosName^osBuildVersion]sourceVersionZtimingPort_..isScreenMirroringSessionYosVersionTekeyXdeviceIDUmodelTnameZmacAddress. O....mD..9o.YRR.0./SNTP_.$43C10532-7CBC-419E-9BB3-528F7D6F9AE0YiPhone OSV16A404W371.4.7..    V12.0.1O.HFPLY.......<.....nT=......9..X......w.Jw9.t.v..iK.c....Tj.u..G..KL.....X_..DC:A3:F1:B2:A6:DAYiPhone9,1jT..2v.. .i.P.h.o.n.e_..DC:0C:5C:B7:D6:D8...).,.0.?.K.R.a.o.z.......................
....... .'.r......................................
  
------ RESPONSE SETUP rtsp:// [CSeq: 6] ------
  
RTSP/1.0 200 OK
Content-Length: 0
Server: AirTunes/220.68
CSeq: 6
  
------ REQUEST SETUP rtsp:// [CSeq: 10] ------
  
SETUP rtsp://192.168.1.24/2893748923472384328 RTSP/1.0
Content-Length: 188
Content-Type: application/x-apple-binary-plist
CSeq: 10
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
  
bplist00...Wstreams.........Ttype]timestampInfo_..streamConnectionID.n. .....
.TnameUSubSu.
  
UBePxT.
.UAfPxT.
.UBefEn.
.UEmEnc.D...6QD......!/DFLOTZ]cfloux~................................
  
------ RESPONSE SETUP rtsp:// [CSeq: 10] ------
  
RTSP/1.0 200 OK
Content-Length: 120
Content-Type: application/x-apple-binary-plist
Server: AirTunes/220.68
  
bplist00..l.n.....YeventPort...ZtimingPortWstreamsXdataPort....cTtype...
. .E*;
2.@....=...............................L

The first request is 'x-apple-binary-plist' data, after decoding:

// REQUEST [CSeq: 6] DECODED
<plist version="1.0">
  <dict>
    <key>et</key>
    <integer>32</integer>
    <key>eiv</key>
    <data>
      Ct34RID9D/KJALLCJzDWi==
    </data>
    <key>timingProtocol</key>
    <string>NTP</string>
    <key>sessionUUID</key>
    <string>532E49A1-E89A-75D3-A355-426614181992</string>
    <key>osName</key>
    <string>iPhone OS</string>
    <key>osBuildVersion</key>
    <string>18A8395</string>
    <key>sourceVersion</key>
    <string>420.4.7</string>
    <key>timingPort</key>
    <integer>60373</integer>
    <key>isScreenMirroringSession</key>
    <true/>
    <key>osVersion</key>
    <string>14.4.2</string>
    <key>ekey</key>
    <data>
      RlCFAVGFAQHFAAA7ASDDAALLvUD1C2vqRjLK9wtJY6v9AS9d5dLLzn2JSJ2ysNpS4VasdkFHlOkAusDqUeXzEoAiDLdF/5Y
    </data>
    <key>deviceID</key>
    <string>2A:1B:57:36:38:D4</string>
    <key>model</key>
    <string>iPhone13,3</string>
    <key>name</key>
    <string>SteeBono</string>
    <key>macAddress</key>
    <string>DC:2A:4C:A7:B2:E4</string>
  </dict>
</plist>

As you can see, the first request gives us useful information such as:

  • ekey: AES Key (we have to save this data because we will need it later)
  • eiv: AES IV (we have to save this data because we will need it later)
  • timingPort: port used for the heartbeat (you can change it in the response)
  • timingProtocol: protocol used to send timing data
  • isScreenMirroringSession: boolean used to indicate the type of streaming (video or audio only)

The response to the first request must be 200 OK with an 'x-apple-binary-plist' body like this:

<plist version="1.0">
  <dict>
    <key>eventPort</key>
    <integer>52244</integer>
    <key>timingPort</key>
    <integer>7011</integer>
  </dict>
</plist>

Here you can return the same port used by AirTunes and handle timing and event requests directly from the AirTunes service.

The receiver response must contain the following:

  • eventPort: port used by the client to send events to the receiver

The receiver response can contain the following:

  • timingPort: port used by the client to send the heartbeat to the receiver (only if you want to change the port sent by the client)

The second request is 'x-apple-binary-plist' data, after decoding:

<plist version="1.0">
<dict>
  <key>streams</key>
    <array>
      <dict>
        <key>type</key>
        <integer>110</integer>
        <key>timestampInfo</key>
        <array>
          <dict>
            <key>name</key>
            <string>SubSu</string>
          </dict>
          <dict>
            <key>name</key>
            <string>BePxT</string>
          </dict>
          <dict>
            <key>name</key>
            <string>AfPxT</string>
          </dict>
          <dict>
            <key>name</key>
            <string>BefEn</string>
          </dict>
          <dict>
            <key>name</key>
            <string>EmEnc</string>
          </dict>
        </array>
        <key>streamConnectionID</key>
        <integer>298347298472738472</integer>
      </dict>
    </array>
  </dict>
</plist>

As you can see, the second request gives us two useful pieces of information:

  • type: type of streaming
    • 96: Real time audio
    • 103: Buffered audio
    • 110: Screen Mirroring
    • 120: Playback
    • 130: Remote control
  • streamConnectionID: id of current connection (we have to save this data because we will need it later)

Here you must start the mirroring service on the port you will return as dataPort (7020 in this example) to handle H264 data.

The response to the second request must be 200 OK with an 'x-apple-binary-plist' body like this:

<plist version="1.0">
  <dict>
    <key>streams</key>
    <array>
      <dict>
        <key>dataPort</key>
        <integer>7020</integer>
        <key>type</key>
        <integer>110</integer>
      </dict>
    </array>
  </dict>
</plist>

The receiver response must contain the following:

  • type: type of streaming
    • 96: Real time audio
    • 103: Buffered audio
    • 110: Screen Mirroring
    • 120: Playback
    • 130: Remote control
  • dataPort: port used from the client to send video streaming data to the receiver
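
A possible way to produce the response described above is sketched below with Python's plistlib. It reads the stream type from the request and echoes it back together with the data port. The function name is hypothetical, and only the first stream of the request is handled.

```python
import plistlib


def build_stream_setup_response(request_body: bytes, data_port: int) -> bytes:
    """Build the binary plist body answering a 'streams' SETUP request."""
    request = plistlib.loads(request_body)
    stream = request["streams"][0]            # only the first stream is handled
    response = {
        "streams": [
            {"type": stream["type"], "dataPort": data_port},
        ]
    }
    return plistlib.dumps(response, fmt=plistlib.FMT_BINARY)


# Request equivalent to the decoded second SETUP request (timestampInfo omitted).
request_body = plistlib.dumps(
    {"streams": [{"type": 110, "streamConnectionID": 298347298472738472}]},
    fmt=plistlib.FMT_BINARY,
)
print(plistlib.loads(build_stream_setup_response(request_body, 7020)))
```

A real receiver would also store streamConnectionID from the request at this point, since it is needed later.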

AirTunes - Audio Data Setup

After the handshake (if you are streaming audio only) or after the mirroring SETUP, there will be another SETUP request used to initialize audio streaming.

------ REQUEST SETUP rtsp:// [CSeq: 17] ------

SETUP rtsp://192.168.1.24/2893748923472384328 RTSP/1.0
Content-Length: 199
Content-Type: application/x-apple-binary-plist
CSeq: 17
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3

bplist00...Wstreams........ 
..
.
......ZlatencyMax^redundantAudioZlatencyMinRctSspf[controlPort[usingScreen[audioFormatTtype.............  ......`....(3BMPT`lx}.......................................

------ RESPONSE SETUP rtsp:// [CSeq: 17] ------

RTSP/1.0 200 OK
Content-Length: 118
Content-Type: application/x-apple-binary-plist
Server: AirTunes/220.68

bplist00..D........ ..
..ZtimingPortWstreams..hXdataPort.`Ttype[controlPort.$.
/.?,:8................................K

The request is 'x-apple-binary-plist' data, after decoding:

<plist version="1.0">
  <dict>
    <key>streams</key>
    <array>
      <dict>
        <key>latencyMax</key>
        <integer>3750</integer>
        <key>redundantAudio</key>
        <integer>2</integer>
        <key>latencyMin</key>
        <integer>3750</integer>
        <key>ct</key>
        <integer>8</integer>
        <key>spf</key>
        <integer>480</integer>
        <key>controlPort</key>
        <integer>63658</integer>
        <key>usingScreen</key>
        <true/>
        <key>audioFormat</key>
        <integer>16777216</integer>
        <key>type</key>
        <integer>96</integer>
      </dict>
    </array>
  </dict>
</plist>

As you can see, this request gives us useful information such as:

  • latencyMax: maximum audio latency
  • redundantAudio: redundancy when transmitting audio frames across a lossy network transport
  • latencyMin: minimum audio latency
  • ct: compression type
  • spf: frames per packet
  • controlPort: port used to request retransmission of lost packets
  • usingScreen: boolean used to indicate the type of streaming (video + audio or audio only)
  • audioFormat: the audio format
    • 0x0: PCM
    • 0x40000: ALAC (96 AppleLossless, 96 352 0 16 40 10 14 2 255 0 0 44100)
    • 0x400000: AAC (96 mpeg4-generic/44100/2, 96 mode=AAC-main; constantDuration=1024)
    • 0x1000000: AAC_ELD (96 mpeg4-generic/44100/2, 96 mode=AAC-eld; constantDuration=480)
  • type: type of streaming
    • 96: Real time audio
    • 103: Buffered audio
    • 110: Screen Mirroring
    • 120: Playback
    • 130: Remote control
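
The audioFormat values listed above can be turned into a simple lookup. This sketch covers only the four values from the list; the table and function names are hypothetical.

```python
# audioFormat values from the list above.
AUDIO_FORMATS = {
    0x0: "PCM",
    0x40000: "ALAC",
    0x400000: "AAC",
    0x1000000: "AAC_ELD",
}


def audio_format_name(value: int) -> str:
    """Map an audioFormat integer from the SETUP request to a codec name."""
    return AUDIO_FORMATS.get(value, f"unknown (0x{value:X})")


# The decoded request above carries audioFormat 16777216, which is 0x1000000.
print(audio_format_name(16777216))  # prints "AAC_ELD"
```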

The response to this request must be 200 OK with an 'x-apple-binary-plist' body like this:

<plist version="1.0">
    <dict>
    <key>streams</key>
        <array>
            <dict>
                <key>dataPort</key>
                <integer>34505</integer>
                <key>controlPort</key>
                <integer>40945</integer>
                <key>type</key>
                <integer>96</integer>
            </dict>
        </array>
    </dict>
</plist>

Here you must return the dataPort and controlPort on your server that you want the client to use for audio, as well as the type 96.
This is similar to the earlier type 110 response, except for the additional control port. Note that the data port used for type 96 audio must be different from the one used for type 110 video.

AirPlay - Decrypt AES Key

Before we can decrypt the video stream, we need to decrypt the AES key (ekey) received during the first SETUP request.
To decrypt the AES key, we need the KeyMessage that we saved when we received the CSeq 5 request.

I'd like to explain how the AES key is decrypted, but I have no idea how it's done.
Below is a small piece of code that I wrote, based on the original written in C (OmgHax):

OmgHax OmgHax

AirPlay - Screen Mirroring

After the SETUP, the client will start sending the encrypted H264 video stream.

Filter used in Wireshark: (ip.src==RECEIVER_IP || ip.src==SENDER_IP) && (ip.dst==RECEIVER_IP || ip.dst==SENDER_IP) && ( udp || (tcp.srcport != SENDER_AIRTUNES_PORT && tcp.dstport != RECEIVER_AIRTUNES_PORT))
In Wireshark you can see the parsed data by right-clicking a UDP packet -> Decode As -> NTP.

After decrypting the AES key, we can initialize the AES CTR decrypter.
To initialize the CTR decrypter we need to:

  • Compute a combined hash of the AES key and ecdh_shared (result: eaesHash)
  • Compute a combined hash of the concatenation "AirPlayStreamKey" + streamConnectionID and the first 16 bytes of eaesHash (result: keyHash)
  • Compute a combined hash of the concatenation "AirPlayStreamIv" + streamConnectionID and the first 16 bytes of eaesHash (result: ivHash)
  • Take the first 16 bytes of keyHash and ivHash as the decrypted AES key and decrypted AES IV

Something like this:
[Image: method_hash_ctr_init]
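
The four derivation steps listed above can be sketched as follows. This is not the code from the image. It rests on two assumptions that this page does not state: that "combined hash" means SHA-512 over the concatenation of the two inputs, and that streamConnectionID is appended to the label as a decimal string. The label spelling is copied from the list above and should be checked against a working implementation. All function names are hypothetical.

```python
import hashlib


def combined_hash(first: bytes, second: bytes) -> bytes:
    # ASSUMPTION: "combined hash" = SHA-512 over first || second.
    return hashlib.sha512(first + second).digest()


def derive_mirroring_key_iv(aes_key: bytes, ecdh_shared: bytes,
                            stream_connection_id: int):
    """Derive the AES CTR key and IV for the mirroring stream (sketch)."""
    eaes_hash = combined_hash(aes_key, ecdh_shared)
    # ASSUMPTION: the ID is appended as decimal text; spelling as in the list.
    key_label = f"AirPlayStreamKey{stream_connection_id}".encode("ascii")
    iv_label = f"AirPlayStreamIv{stream_connection_id}".encode("ascii")
    key_hash = combined_hash(key_label, eaes_hash[:16])
    iv_hash = combined_hash(iv_label, eaes_hash[:16])
    return key_hash[:16], iv_hash[:16]


# Dummy inputs (all zero), only to show the call shape.
key, iv = derive_mirroring_key_iv(bytes(16), bytes(32), 298347298472738472)
print(key.hex(), iv.hex())
```

The resulting 16-byte key and IV would then be passed to an AES CTR implementation from a cryptography library of your choice.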

The packet we receive from the client consists of:

  • payloadsize: size of the encrypted data
  • payloadtype: type of data (0 - decrypt video data, 1 - process sps/pps)
  • payloadoption: ????
  • pts: data used to instantiate H264 Codec
  • other: ????
  • data: mirroring data (ENCRYPTED, we can DECRYPT with CTR Decrypter initialized in the previous step)

After some analysis and a lot of assembly code, they came up with the diagram below.
For more specific details, see the source article.

[Image: Mirroring schema]

-- hkeyxif

AirPlay - Audio Streaming

After the SETUP, the client will start sending the encrypted audio stream.

After decrypting the AES key, we can initialize the AES CBC Decrypter.
To initialize the CBC Decrypter we need to:

  • Perform a combined hash between AES Key and Ecdh Shared (result: eaesHash)
  • Initialize AES CBC Decrypter with eaesKey and aesIV received during the first SETUP request

Something like this:
method_hash_cbc_init

The packet we receive from the client consists of:

  • flag: ????
  • type: type of data (0x56 - decrypt audio data, 0x54 - process RTP headers with NTP time)
  • seq number: sequence number of the packet
  • timestamp: timestamp of the packet
  • ssrc: ????
  • data: audio data (ENCRYPTED, we can DECRYPT with CBC Decrypter initialized in the previous step)
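
The fields above match the shape of a standard 12-byte RTP header, so a parser could look like the sketch below. It assumes big-endian fields (1-byte flag, 1-byte type, 2-byte seq number, 4-byte timestamp, 4-byte ssrc) and that the high bit of the type byte is the RTP marker bit, which is masked off before comparing with 0x56 or 0x54. `RtpHeader`, `parse_rtp_header` and `classify` are hypothetical names.

```python
import struct
from typing import NamedTuple

RTP_HEADER_SIZE = 12


class RtpHeader(NamedTuple):
    flag: int
    type: int       # marker bit already masked off
    seq: int
    timestamp: int
    ssrc: int


def parse_rtp_header(packet: bytes) -> RtpHeader:
    if len(packet) < RTP_HEADER_SIZE:
        raise ValueError("packet shorter than an RTP header")
    flag, ptype, seq, timestamp, ssrc = struct.unpack_from(">BBHII", packet, 0)
    # ASSUMPTION: the top bit is the RTP marker, so 0xD6 on the wire means type 0x56.
    return RtpHeader(flag, ptype & 0x7F, seq, timestamp, ssrc)


def classify(header: RtpHeader) -> str:
    """Map the type values named in the text to a short label."""
    return {0x56: "audio", 0x54: "sync"}.get(header.type, "unknown")
```

Everything after the first 12 bytes is the encrypted data field.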

After some analysis and a lot of assembly code they came up with this diagram:
For more specific details see the source article.

AudioData

-- hkeyxif

Timing Port Data

The timing port is used to exchange NTP-style request/response pairs.

Receiver -> Client
  
80 d2 00 07 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 83 aa 7e 80 00 00 00 f3
Client -> Receiver
  
80 d3 00 07 00 00 00 00 83 aa 7e 80 00 00 00 f3 83 b7 bc e9 3b d6 ea c8 83 b7 bc e9 3b e1 ae 70

The receiver sends 32 bytes.
The first 24 bytes are fixed; the last 8 bytes are the NTP Transmit Timestamp.

The client replies with 32 bytes. The first 8 bytes are fixed; the last 24 bytes are the Origin Timestamp, the Receive Timestamp and the Transmit Timestamp (8 bytes each).
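
To make the layout concrete, here is a small sketch that parses the two packets shown above. It assumes each timestamp uses the usual NTP 64-bit format (32-bit seconds since 1900 plus a 32-bit fraction, big-endian); the function names are hypothetical.

```python
import struct
from typing import Dict, Tuple

NTP_UNIX_OFFSET = 2208988800  # seconds between 1900-01-01 and 1970-01-01

# The two 32-byte packets from the capture above.
RECEIVER_TO_CLIENT = bytes.fromhex(
    "80d20007 00000000 00000000 00000000 00000000 00000000 83aa7e80 000000f3")
CLIENT_TO_RECEIVER = bytes.fromhex(
    "80d30007 00000000 83aa7e80 000000f3 83b7bce9 3bd6eac8 83b7bce9 3be1ae70")


def parse_timing_packet(packet: bytes) -> Dict[str, Tuple[int, int]]:
    """Return the three timestamps as (seconds, fraction) pairs."""
    if len(packet) != 32:
        raise ValueError("timing packets are 32 bytes long")
    names = ("origin", "receive", "transmit")
    # The first 8 bytes are the fixed header; the timestamps start at offset 8.
    return {name: struct.unpack_from(">II", packet, 8 + 8 * i)
            for i, name in enumerate(names)}


def ntp_to_unix(timestamp: Tuple[int, int]) -> float:
    seconds, fraction = timestamp
    return seconds - NTP_UNIX_OFFSET + fraction / 2**32
```

In this capture the Origin Timestamp of the client's reply equals the Transmit Timestamp the receiver sent, which is how the receiver can match a reply to its request.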

Control Port Data

The control port is used to receive information about RTP and to receive retransmitted audio data.

If the type is 0x56, the packet contains retransmitted audio data.
If the type is 0x54, the packet contains information about RTP.

Extras

Here you will find other generic requests handled by the AirTunes service.

GET PARAMETER

------ REQUEST GET_PARAMETER rtsp:// [CSeq: x] ------
 
GET_PARAMETER rtsp://192.168.1.24/2893748923472384328 RTSP/1.0
Content-Length: 8
Content-Type: text/parameters
CSeq: x
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
  
volume
  
------ RESPONSE GET_PARAMETER rtsp:// [CSeq: x] ------
  
RTSP/1.0 200 OK
Content-Type: text/parameters
Content-Length: 13
Server: AirTunes/220.68
CSeq: x
  
volume: 0.0

The client makes this call when it wants to know the receiver's volume level.
The body of the request has the content type 'text/parameters'; the parameter here is 'volume'.

The response to this request must be 200 OK with a 'text/parameters' body like this:

volume: 10.0\r\n

Where 'volume' is the parameter and '10.0' is the value.
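
A receiver could build this response as in the sketch below. The header set mirrors the captured response above; the function name and the one-decimal formatting of the value are assumptions made for illustration.

```python
def build_volume_response(cseq: str, volume: float) -> bytes:
    """Build a 200 OK reply to GET_PARAMETER carrying the current volume."""
    # The body is a single "name: value" line terminated by CRLF.
    body = f"volume: {volume:.1f}\r\n".encode("ascii")
    headers = [
        "RTSP/1.0 200 OK",
        "Content-Type: text/parameters",
        f"Content-Length: {len(body)}",  # must count the trailing CRLF
        "Server: AirTunes/220.68",
        f"CSeq: {cseq}",                 # echo the CSeq of the request
    ]
    return ("\r\n".join(headers) + "\r\n\r\n").encode("ascii") + body
```

With a volume of 0.0 the body is 13 bytes long, which matches the Content-Length in the capture.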

Note that the 'SET_PARAMETER' request, described next only for the volume, can also be used to change other information, such as the cover of the album being played or the title of a song.

SET PARAMETER

------ REQUEST SET_PARAMETER rtsp:// [CSeq: x] ------
  
SET_PARAMETER rtsp://192.168.1.24/2893748923472384328 RTSP/1.0
Content-Length: 20
Content-Type: text/parameters
CSeq: x
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
  
volume: -12.000000
  
------ RESPONSE SET_PARAMETER rtsp:// [CSeq: x] ------
  
RTSP/1.0 200 OK
Server: AirTunes/220.68
CSeq: x

The client makes this call when it wants to change the receiver's volume level.
The body of the request has the content type 'text/parameters'; the parameter here is 'volume' and the value is '-12.000000'.

The response to this request must be 200 OK with no body.
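
Reading the 'text/parameters' body can be sketched as follows. The helper name is hypothetical, and the body is assumed to contain one "name: value" pair per line.

```python
from typing import Dict


def parse_text_parameters(body: bytes) -> Dict[str, str]:
    """Split a text/parameters body into a name -> value mapping."""
    params = {}
    for line in body.decode("utf-8").splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines that carry no "name: value" pair
            params[name.strip()] = value.strip()
    return params


# Body of the captured request above (20 bytes, as in its Content-Length).
params = parse_text_parameters(b"volume: -12.000000\r\n")
volume = float(params["volume"])
```

After that the receiver only has to apply the value and answer with the empty 200 OK.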

FEEDBACK

------ REQUEST POST /feedback [CSeq: x] ------
  
POST /feedback RTSP/1.0
CSeq: x
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
  
------ RESPONSE POST /feedback [CSeq: x] ------
  
RTSP/1.0 200 OK
Server: AirTunes/220.68
CSeq: x

The client makes this call to ensure the receiver is alive.
The classic 'heartbeat'.

The response to this request must be 200 OK with no body.

TEARDOWN

------ REQUEST TEARDOWN rtsp:// [CSeq: x] ------
  
TEARDOWN rtsp://192.168.1.24/2893748923472384328 RTSP/1.0
Content-Length: 69
Content-Type: application/x-apple-binary-plist
CSeq: x
DACP-ID: 6DF49EFF3D005B18
Active-Remote: 2578169230
User-Agent: AirPlay/415.3
  
bplist00...Wstreams.....Ttype.`......................................
  
------ RESPONSE TEARDOWN rtsp:// [CSeq: x] ------
  
RTSP/1.0 200 OK
Connection: close
Server: AirTunes/220.68
CSeq: x

The client makes this call when it wants to stop screen mirroring and audio streaming.

The request body is 'x-apple-binary-plist' data. After decoding:

<plist version="1.0">
  <dict>
    <key>streams</key>
    <array>
      <dict>
        <key>type</key>
        <integer>96</integer>
      </dict>
    </array>
  </dict>
</plist>

As you can see, the client sends the type of service to be destroyed.

Type:

  • 96: receiver can destroy audio service
  • 110: receiver can destroy mirroring service
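
Decoding the TEARDOWN body and mapping the type values can be sketched with Python's standard plistlib module, which reads binary plists directly. The function name and the labels in `STREAM_TYPES` are hypothetical.

```python
import plistlib
from typing import List

# Type values from the list above.
STREAM_TYPES = {96: "audio", 110: "mirroring"}


def streams_to_destroy(body: bytes) -> List[str]:
    """Return the services named in a TEARDOWN request body."""
    plist = plistlib.loads(body)  # detects the binary plist format automatically
    return [STREAM_TYPES.get(stream.get("type"), "unknown")
            for stream in plist.get("streams", [])]


# Rebuild a body equivalent to the decoded example above.
sample_body = plistlib.dumps({"streams": [{"type": 96}]}, fmt=plistlib.FMT_BINARY)
services = streams_to_destroy(sample_body)
```

Here `services` contains only the audio service, so a receiver would keep screen mirroring running.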

Conclusions

Having this information will make implementing the protocol easier than expected.
It is also possible to create the client following the same logic in reverse.

Thanks to this project, I can say that I have improved my skills.

Thanks for reading,
S.