While we were investigating a series of spikes in a particular error code in one of the video streaming iOS and tvOS apps I work on at my day job, our CDN server team asked us to inject and inspect the information sent to and from the server during HLS streaming. To do this, we needed to add extra HTTP headers to the requests made by AVPlayer and read the headers in the server’s responses. The idea was that the server would see the extra headers in a request and return additional diagnostic information as headers in its response.
This proved to be an interesting challenge, as it turns out that Apple does not provide any APIs for doing this natively.
Background
To give some context, here is a brief introduction to HLS (HTTP Live Streaming) and why it is so useful on iOS and tvOS. Apps and websites use many different streaming protocols to deliver video content (e.g. a movie or TV show episode) over HTTP(S). A long time ago, it was common to simply store content in a single file on a server, which was then downloaded and played. This was incredibly inefficient, however, as mobile devices and web browsers would typically need to download the entire file before being able to play it. A user on a poor network would face abysmal waiting times and frequent failures due to network timeouts. Eventually, the idea of media streaming was introduced. This divides the content into smaller downloadable “chunks”, usually between 2 and 10 seconds long, which the device then plays in sequence, seamlessly and without the user noticing. Multiple versions of each chunk are also provided at different quality/resolution levels, to be used depending on the network quality. HLS is one of the most popular formats and is developed by Apple for use on all of its operating systems.
Although implementations differ between streaming protocols, the basic concepts are the same:
– Video content is organized by a manifest (also known as the “master playlist”), which can describe the metadata and URL location of “variant playlists”, audio or subtitle language tracks, or the content itself.
– A manifest with multiple variant playlists contains the metadata for, and links to, playlists that list the content chunks at various quality/resolution levels.
– Each variant playlist contains the metadata and location of audio/subtitle language tracks and the URL location of the video chunks for the specific bandwidth that the playlist represents.
– Each chunk is a .ts file based on the MPEG-2 Transport Stream standard and is assembled to fit together perfectly with the other chunks, so that chunks are aligned across all bandwidths and synced with both the audio and CC/subtitle tracks.
– As bandwidth changes during playback, AVPlayer selects the playlist that best fits the current bandwidth and uses the next chunk from that playlist as it buffers, resulting (ideally) in a smooth playback experience where resolution increases and decreases with the quality of the network.
A master playlist can look something like this:
#EXTM3U
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="bipbop_audio",LANGUAGE="eng",NAME="BipBopAudio 1",AUTOSELECT=YES,DEFAULT=YES
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="bipbop_audio",LANGUAGE="eng",NAME="BipBopAudio 2",AUTOSELECT=NO,DEFAULT=NO,URI="alternate_audio_aac_sinewave/prog_index.m3u8"
#EXT-X-MEDIA:TYPE=SUBTITLES,GROUP-ID="subs",NAME="English",DEFAULT=YES,AUTOSELECT=YES,FORCED=NO,LANGUAGE="en",URI="subtitles/eng/prog_index.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=1924009,CODECS="mp4a.40.2, avc1.4d401f",RESOLUTION=1920x1080,AUDIO="bipbop_audio",SUBTITLES="subs"
gear5/prog_index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=41457,CODECS="mp4a.40.2",AUDIO="bipbop_audio",SUBTITLES="subs"
gear0/prog_index.m3u8
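To make the bandwidth switching described above a little more concrete, here is a minimal sketch that reads the #EXT-X-STREAM-INF entries of a master playlist and picks a variant for a measured bandwidth. AVPlayer’s real selection logic is internal to the OS, so the function names and the simple “highest variant that still fits” rule are assumptions made purely for illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: reads the BANDWIDTH attribute of an
// #EXT-X-STREAM-INF line, or returns -1 if it is missing.
static NSInteger BandwidthFromStreamInf(NSString *line) {
    // Require ':' or ',' before the name so AVERAGE-BANDWIDTH is not matched.
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:@"[:,]BANDWIDTH=(\\d+)"
                                                  options:0
                                                    error:NULL];
    NSTextCheckingResult *match = [regex firstMatchInString:line
                                                    options:0
                                                      range:NSMakeRange(0, line.length)];
    if (match == nil) { return -1; }
    return [[line substringWithRange:[match rangeAtIndex:1]] integerValue];
}

// Hypothetical helper: returns the URI of the highest-bandwidth variant that
// fits within availableBitsPerSecond, or the lowest variant if none fits.
static NSString *SelectVariantURI(NSString *masterPlaylist, NSInteger availableBitsPerSecond) {
    // Drop blank lines so each URI is the entry right after its tag.
    NSMutableArray<NSString *> *lines = [NSMutableArray array];
    for (NSString *raw in [masterPlaylist componentsSeparatedByCharactersInSet:
                              [NSCharacterSet newlineCharacterSet]]) {
        NSString *line = [raw stringByTrimmingCharactersInSet:
                             [NSCharacterSet whitespaceCharacterSet]];
        if (line.length > 0) { [lines addObject:line]; }
    }

    NSString *bestURI = nil, *lowestURI = nil;
    NSInteger bestBandwidth = -1, lowestBandwidth = NSIntegerMax;
    for (NSUInteger i = 0; i + 1 < lines.count; i++) {
        if (![lines[i] hasPrefix:@"#EXT-X-STREAM-INF:"]) { continue; }
        NSInteger bandwidth = BandwidthFromStreamInf(lines[i]);
        if (bandwidth < 0) { continue; }
        NSString *uri = lines[i + 1];
        if (bandwidth < lowestBandwidth) { lowestBandwidth = bandwidth; lowestURI = uri; }
        if (bandwidth <= availableBitsPerSecond && bandwidth > bestBandwidth) {
            bestBandwidth = bandwidth;
            bestURI = uri;
        }
    }
    return bestURI ?: lowestURI;
}

int main(void) {
    @autoreleasepool {
        // Trimmed-down version of the master playlist shown above.
        NSString *master =
            @"#EXTM3U\n"
            @"#EXT-X-STREAM-INF:BANDWIDTH=1924009,RESOLUTION=1920x1080\n"
            @"gear5/prog_index.m3u8\n"
            @"#EXT-X-STREAM-INF:BANDWIDTH=41457,CODECS=\"mp4a.40.2\"\n"
            @"gear0/prog_index.m3u8\n";
        NSLog(@"%@", SelectVariantURI(master, 2500000)); // gear5/prog_index.m3u8
        NSLog(@"%@", SelectVariantURI(master, 500000));  // gear0/prog_index.m3u8
    }
    return 0;
}
```

The regular expression only looks at the BANDWIDTH attribute, so quoted values that contain commas (such as CODECS) do not need to be parsed for this purpose.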
A variant playlist can look like this:
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-ALLOW-CACHE:YES
#EXT-X-TARGETDURATION:13
#EXTINF:12.012011,
out000.ts
#EXTINF:9.009011,
out001.ts
#EXTINF:9.009011,
out002.ts
…
#EXT-X-ENDLIST
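Every chunk listed in a variant playlist becomes a separate HTTP request from the player. As a rough sketch (SegmentURIs is a hypothetical helper, not something AVPlayer exposes), the chunk URIs and the total duration can be pulled out of a playlist like the one above as follows:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the chunk URIs of a variant playlist in
// playback order and adds up the #EXTINF durations into *totalSeconds.
static NSArray<NSString *> *SegmentURIs(NSString *playlist, double *totalSeconds) {
    NSMutableArray<NSString *> *uris = [NSMutableArray array];
    double total = 0;
    for (NSString *raw in [playlist componentsSeparatedByCharactersInSet:
                              [NSCharacterSet newlineCharacterSet]]) {
        NSString *line = [raw stringByTrimmingCharactersInSet:
                             [NSCharacterSet whitespaceCharacterSet]];
        if (line.length == 0) { continue; }
        if ([line hasPrefix:@"#EXTINF:"]) {
            // "#EXTINF:<duration>,<optional title>": doubleValue stops at the comma.
            total += [[line substringFromIndex:8] doubleValue];
        } else if (![line hasPrefix:@"#"]) {
            // Any line that is not a tag or comment is a chunk URI.
            [uris addObject:line];
        }
    }
    if (totalSeconds != NULL) { *totalSeconds = total; }
    return uris;
}

int main(void) {
    @autoreleasepool {
        NSString *variant =
            @"#EXTM3U\n"
            @"#EXT-X-TARGETDURATION:13\n"
            @"#EXTINF:12.012011,\n"
            @"out000.ts\n"
            @"#EXTINF:9.009011,\n"
            @"out001.ts\n"
            @"#EXT-X-ENDLIST\n";
        double seconds = 0;
        NSArray<NSString *> *chunks = SegmentURIs(variant, &seconds);
        NSLog(@"%@ (%.3f seconds in total)", chunks, seconds);
    }
    return 0;
}
```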
The Problem
The great thing about AVPlayer on iOS and tvOS is that all the logic above is handled by the operating system for us. All that is required is that the app pass AVPlayer the URL of the master playlist; AVPlayer then uses this to select the appropriate variant playlist and play the associated chunks as it sees fit. In almost all normal scenarios, this is perfectly fine, as it takes the responsibility away from the app developer and allows them to focus on the behaviour of the app itself during playback.
But for our diagnostic purposes, this seriously hampered us. After numerous hours of scouring forums, threads, and Stack Overflow, I found no viable solutions. There are currently no APIs available in iOS to get and set this header information for requests made by the AVPlayer. And the official word on the Apple Developer forums was that this simply was not possible.
Of course, for normal requests we can simply use NSMutableURLRequest and NSHTTPURLResponse, but since the AVPlayer does all this for us, we have no way to touch these requests/responses.
There is a way to add additional headers by passing the undocumented options key AVURLAssetHTTPHeaderFieldsKey when creating an AVURLAsset object, as shown below:
AVURLAsset *urlAsset = [AVURLAsset URLAssetWithURL:[NSURL URLWithString:customUrl] options:@{@"AVURLAssetHTTPHeaderFieldsKey" : httpHeaders}];
However, there are strong warnings online that Apple may reject applications that use this key, and I could find no official word on the Apple Developer forums on whether or not it would be accepted. This solution also still does not allow us to intercept and retrieve the headers from the response.
We had looked into method swizzling, but DRM (digital rights management) libraries such as InsideSecure will not allow you to swizzle networking methods while trying to play secured, encrypted content since it represents a potential (but glaring) security breach. There is a super-secret way to disable this check when the app is started, but that again means that your video content is rendered less secure.
The Solution
The only valid alternative is to run a reverse proxy server on the device and have AVPlayer send its requests through it. The proxy intercepts each request, modifies it, sends it on to its original destination, and examines the response when it is returned. It is a heavy-handed approach, but it is the only acceptable way to achieve header augmentation without security breaches or possible rejection by Apple.
The solution that I came up with was built on top of GCDWebServer. Essentially, any web server that gives you access to the requests and responses can be used. I originally tried to use Mongoose, as had been done here, since it is a little more lightweight, but I ran into a number of problems with the C code and switched to an Objective-C server to make the coding easier on my part.
When a new playback session is started, the client instantiates and starts a local HTTP server on the device, running at http://localhost:8080. The proxy intercepts any request sent to this address, adds the additional HTTP headers, completes the request against the reverse proxy host on port 80, and finally returns the response to the player as the response to its original request. Any HTTP headers received in the response are passed along to the client for reporting/diagnostic purposes.
The flow is as follows:
- Client receives request to play content at http://someurl.com/some_manifest.m3u8
- Client starts the local HTTP server at http://localhost:8080 and passes it “someurl.com” as the reverse proxy host name
- Client creates the AVPlayer, passing it http://localhost:8080/some_manifest.m3u8
- AVPlayer tries to start playback by making a network request to http://localhost:8080/some_manifest.m3u8, which is intercepted by the proxy
- Proxy reconstructs what should have been the original request using the reverse proxy host it was passed
- Proxy makes the external request to the server at http://someurl.com/some_manifest.m3u8 with the additional headers
- Proxy extracts the headers from the response and sends them to its listeners
- Proxy returns the data and headers from the request to http://someurl.com/some_manifest.m3u8 as the response to http://localhost:8080/some_manifest.m3u8
- AVPlayer uses this data to start playback
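The steps above can be sketched roughly as follows with GCDWebServer and NSURLSession. This is a simplified illustration under stated assumptions rather than the code from the sample project: StartHeaderProxy, its parameters, and the onHeaders callback are hypothetical names, only GET requests are handled, and only the status code and content type of the origin response are passed back to the player.

```objc
#import <Foundation/Foundation.h>
#import "GCDWebServer.h"
#import "GCDWebServerDataResponse.h"

// Hypothetical helper: starts a local proxy on `port` that forwards every GET
// to http://<originHost><path>, adds `extraHeaders` to the outgoing request,
// and reports the origin's response headers through `onHeaders`.
static GCDWebServer *StartHeaderProxy(NSString *originHost,
                                      NSUInteger port,
                                      NSDictionary<NSString *, NSString *> *extraHeaders,
                                      void (^onHeaders)(NSURL *url, NSDictionary *headers)) {
    GCDWebServer *server = [[GCDWebServer alloc] init];
    [server addDefaultHandlerForMethod:@"GET"
                          requestClass:[GCDWebServerRequest class]
                     asyncProcessBlock:^(GCDWebServerRequest *request,
                                         GCDWebServerCompletionBlock completionBlock) {
        // Reconstruct the original URL: same path and query, real host, port 80.
        NSURLComponents *components = [NSURLComponents componentsWithURL:request.URL
                                                 resolvingAgainstBaseURL:NO];
        components.scheme = @"http";
        components.host = originHost;
        components.port = nil;

        // Add the diagnostic headers to the outgoing request.
        NSMutableURLRequest *outgoing = [NSMutableURLRequest requestWithURL:components.URL];
        [extraHeaders enumerateKeysAndObjectsUsingBlock:^(NSString *name, NSString *value, BOOL *stop) {
            [outgoing setValue:value forHTTPHeaderField:name];
        }];

        NSURLSessionDataTask *task =
            [[NSURLSession sharedSession] dataTaskWithRequest:outgoing
                                            completionHandler:^(NSData *data,
                                                                NSURLResponse *response,
                                                                NSError *error) {
            if (error != nil || ![response isKindOfClass:[NSHTTPURLResponse class]]) {
                // Tell the player that the upstream request failed.
                completionBlock([GCDWebServerResponse responseWithStatusCode:502]);
                return;
            }
            NSHTTPURLResponse *http = (NSHTTPURLResponse *)response;

            // Hand the response headers to whoever is listening.
            if (onHeaders != nil) { onHeaders(outgoing.URL, http.allHeaderFields); }

            // Return the body to the player as the response to its localhost request.
            NSString *contentType = http.MIMEType ?: @"application/octet-stream";
            GCDWebServerDataResponse *reply =
                [GCDWebServerDataResponse responseWithData:(data ?: [NSData data])
                                               contentType:contentType];
            reply.statusCode = http.statusCode;
            completionBlock(reply);
        }];
        [task resume];
    }];
    [server startWithPort:port bonjourName:nil];
    return server;
}
```

A call such as StartHeaderProxy(@"someurl.com", 8080, @{@"X-Debug" : @"1"}, ^(NSURL *url, NSDictionary *headers) { NSLog(@"%@ %@", url, headers); }) would then match the flow above, where X-Debug is only a placeholder header name. The caller needs to keep a strong reference to the returned server for as long as playback lasts.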
This process is repeated for any variant playlists and chunk URLs that are returned in the manifest. AVPlayer makes calls to http://localhost:8080/chunk_01.ts, and the proxy gets the actual data for it from http://someurl.com/chunk_01.ts, modifying and extracting the headers as it does so.
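The URL mapping in both directions can be sketched with NSURLComponents. URLByReplacingHost is a hypothetical helper name. The sketch also shows why chunk requests end up at the proxy: a relative URI in a playlist is resolved against the URL the playlist was loaded from, which is the localhost address. The flip side, as far as I can tell, is that a playlist containing absolute URIs would bypass the proxy unless the proxy rewrote them.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: swaps the host and port of `url`, keeping the path and
// query untouched. A nil port means the default port for http (80).
static NSURL *URLByReplacingHost(NSURL *url, NSString *host, NSNumber *port) {
    NSURLComponents *components = [NSURLComponents componentsWithURL:url
                                             resolvingAgainstBaseURL:YES];
    components.scheme = @"http";
    components.host = host;
    components.port = port;
    return components.URL;
}

int main(void) {
    @autoreleasepool {
        // What the client is asked to play, and what it hands to AVPlayer instead.
        NSURL *origin = [NSURL URLWithString:@"http://someurl.com/some_manifest.m3u8"];
        NSURL *local = URLByReplacingHost(origin, @"localhost", @8080);
        NSLog(@"%@", local);  // http://localhost:8080/some_manifest.m3u8

        // A relative chunk URI resolves against the localhost playlist URL.
        NSURL *chunk = [NSURL URLWithString:@"chunk_01.ts" relativeToURL:local].absoluteURL;
        NSLog(@"%@", chunk);  // http://localhost:8080/chunk_01.ts

        // What the proxy requests from the real server for that chunk.
        NSLog(@"%@", URLByReplacingHost(chunk, @"someurl.com", nil));  // http://someurl.com/chunk_01.ts
    }
    return 0;
}
```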
I’ve created a sample Xcode project that demonstrates the solution in action. You can check it out on GitHub here.