
QuickTime Generic RTP Payload Format

Dispatch 26

This dispatch is a complete description of the payload format that QuickTime uses to stream media data when a custom payload profile is undefined. The dispatch is presented in the form of an IETF RFC, although it is not, in fact, an IETF document. The format presented here is used by QuickTime 4.

RTP Payload Format for QuickTime Media Streams

Abstract

This document specifies the payload format for encapsulating QuickTime media streams in the Realtime Transport Protocol (RTP). This specification is intended for QuickTime media/codec types that are not already handled by other RTP payload specifications. Each QuickTime media track within a movie is sent over a separate RTP session and synchronized using standard RTP techniques. A dynamic payload type should be used. A QuickTime header within the RTP payload is defined to carry the media type and other media specific information. A packetization scheme is defined for the media data. This specification is intended for streaming stored QuickTime movies as well as live QuickTime content.

1. Introduction

This document specifies the payload format for encapsulating QuickTime media streams in the Realtime Transport Protocol (RTP) [1]. RTP is a generic protocol designed to carry realtime media data along with synchronization information over a datagram protocol (mostly UDP over IP). The protocol itself does not address the encapsulation of specific media types, but instead leaves it to various profile specifications. An accompanying RTP profile document [2] contains various payload specifications to carry audio and video over RTP for conferencing applications and specifies the static payload types for various audio/video compression schemes. Other documents specify the encapsulation format used to carry specific compression schemes such as JPEG, MPEG and H.261 [3,4,5].

The QuickTime file format and architecture support an extensible set of media types and compression schemes. Many of these are not covered by the profile specifications available today. Hence, it is desirable to have an RTP encapsulation scheme that will handle all QuickTime media/codec types that are not covered by specific RTP payload types.

This specification proposes a scheme to carry QuickTime media/codec types over RTP. The scheme specified here handles all loss-tolerant media and a few loss-intolerant media such as text. Support for other loss-intolerant media such as MIDI and 3D will be added in the future. This specification is intended for streaming stored QuickTime movies as well as live QuickTime content.

2. QuickTime Overview

QuickTime consists of a software architecture for multimedia authoring/playback and a movie file format to store multimedia presentations. These two aspects of QuickTime are independent of each other but are often combined when referring to QuickTime. It is possible to playback/author movies in other file formats such as AVI, AIFF, etc. using QuickTime software. Similarly it is possible to use QuickTime files independent of the software, for example, streaming movies over the Internet. The QuickTime movie file format is specified in [6]. More information on the QuickTime software architecture can be obtained from [7,8,9].

For the purpose of this document we will mostly be concerned with streaming QuickTime content using RTP. "QuickTime content" refers to content as specified in the QuickTime movie file format specification [6]. This does not preclude live QuickTime content. We merely use the file format specification as a way to specify the format of the content.

QuickTime movie files contain the media data and synchronization information for the movie. A movie consists of multiple tracks, each of which contains a specific media type such as video, sound, MIDI, text, etc. Not all media types are loss-tolerant. The loss-tolerant media can be carried over RTP/UDP in classic RTP style. This will not, however, work for loss-intolerant data. RTP over TCP or the Realtime Streaming Protocol (RTSP) [10] are some of the options for loss-intolerant media data. Another option is to achieve semi-reliability through redundant transmission. This specification uses this latter option to handle QuickTime "text" media over RTP.

2.1 QuickTime Timescales

QuickTime has a concept of timescales. A timescale defines the number of units of time that pass in every second of real time. Any time value has to be specified with respect to a timescale. A QuickTime movie has a timescale associated with it. Each track (media) also has a timescale associated with it. All of these timescales could be different. The RTP timestamp will be based on the timescale of the track associated with the RTP session.
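
The relationship between a time value and its timescale can be sketched as follows. This is a minimal illustration only; the function names to_seconds and rescale are hypothetical and are not part of QuickTime or RTP.

```python
from fractions import Fraction


def to_seconds(value: int, timescale: int) -> Fraction:
    """Convert a time value expressed in `timescale` units per second to seconds."""
    if timescale <= 0:
        raise ValueError("timescale must be positive")
    return Fraction(value, timescale)


def rescale(value: int, from_timescale: int, to_timescale: int) -> int:
    """Re-express a time value in another timescale, rounded to the nearest unit."""
    return round(to_seconds(value, from_timescale) * to_timescale)
```

For instance, a time value of 1500 in a timescale of 600 is 2.5 seconds, which rescale would express as 225000 in a timescale of 90000.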

2.2 QuickTime Sample Descriptions

Every QuickTime media type has a sample description format associated with it. The sample description specifies how the sample is interpreted. For example, the video media sample description specifies the compression scheme, quality, bit depth and other such information. The sample description may change during the life of a track.

2.3 QuickTime Track Parameters

Every QuickTime track has a number of parameters associated with it such as height, width, transformation matrix, etc. In many cases, these are as important to the presentation as the sample description.

3. RTP Encapsulation Format

The encapsulation scheme described here requires that each QuickTime media track within a single movie be sent over a separate RTP session and be synchronized using standard RTP techniques.

The QuickTime information is carried as payload data within the RTP protocol. There is a variable length QuickTime header immediately following the RTP header. The media data is packetized and placed in the RTP packet following the QuickTime header.

The RTP packet is formatted as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                                                               .
.                          RTP Header                           .
.                                                               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                      QuickTime Header...                      .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                    QuickTime Media Data...                    .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


3.1 RTP Header

The format and general usage of the RTP header fields are described in [1].

The following fields of the RTP header will be used as specified below:

- The payload type should be one of the dynamic payload types, and should be agreed upon through some non-RTP means. If using SDP to negotiate the dynamic payload type, the dynamic payload name should be x-quicktime or x-qt. E.g.

    m=video 1234 99
    a=rtpmap:99 x-qt

- The RTP timestamp is based on the timescale specified in the QuickTime header. The timestamp encodes the sampling instant of the first media sample contained in the RTP data packet. Multiple samples may be contained in one RTP packet or a single sample may require multiple RTP packets. The packetization rules are specified in a subsequent section. If a media sample occupies more than one packet, the timestamp will be the same on all of those packets. Packets containing different samples must have different timestamps so that samples may be distinguished by the timestamp. The initial value of the timestamp is random (unpredictable) to make known-plaintext attacks on encryption more difficult, see RTP [1].

- The marker bit (M-bit) of the RTP header is set to one in the last packet of a sample and must otherwise be zero. If one or more samples are fully contained within an RTP packet, the M-bit must be set to one. Thus, it is possible to easily detect that a complete sample has been received and can be decoded and presented.

3.2 QuickTime Header

The QuickTime Header is defined as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  VER  |PCK|S|Q|L|     RES     |D|    QuickTime Payload ID     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.               QuickTime Payload Description ...               .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                Sample Specific Information ...                .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The fields in the QuickTime Header have the following meanings:

VER: 4 bits
A version field that must be set to zero by transmitters implementing this specification.

PCK: 2 bits
The packetization scheme field is 1, 2 or 3. The value of 0 is reserved. The packetization scheme field is set to the media packetization scheme described in section 3.4. The packetization scheme may be changed from packet to packet without changing the QuickTime Payload ID.

S bit: 1 bit
The S-bit is set to one if this packet contains data from a sync sample, i.e. key sample. Otherwise it is set to zero. When the packet contains more than one sample the S-bit is set to one if any samples are sync samples.

Q bit: 1 bit
The Q-bit is set to one if there is a payload description as part of this QuickTime header. Otherwise it is set to zero.

L bit: 1 bit
The L-bit is set to one if there is sample specific information as part of this QuickTime header. Otherwise it is set to zero.

RES: 7 bits
Reserved for future use. Transmitter must set these bits to zero. Receivers must ignore these bits.

D bit: 1 bit
The D-bit is set to one if the information associated with the QuickTime payload ID should not be cached. Payload IDs are explained further in a subsequent section.

QuickTime Payload ID: 15 bits
A payload identifier that identifies the format of the QuickTime media data carried in this RTP session. The payload ID associates the QuickTime payload description (that is transmitted periodically) with the QuickTime media data. This identifier is changed every time the payload format changes. Payload IDs are explained further in a subsequent section.

QuickTime Payload Description: variable length
This field is present only if the Q-bit is set to one. It contains the QuickTime payload format description such as media type, timescale, sample size, compression information, etc. The header must be padded to a 32-bit boundary. The format of the QuickTime Payload Description is described in following sections.

Sample Specific Information: variable length
This field is present only if the L-bit is set to one. It contains information specific to the packet or packets that make up a sample, such as duration and play offset. It must be padded to a 32-bit boundary. The format of the Sample Specific Information is described in following sections.
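
As an illustration of the field layout above, the fixed first 32 bits of the QuickTime header could be decoded as in the following sketch. It assumes bit 0 in the diagram is the most significant bit of the first byte, as is conventional for diagrams of this kind; the names QTHeader and parse_qt_header are hypothetical.

```python
import struct
from typing import NamedTuple


class QTHeader(NamedTuple):
    version: int            # VER, 4 bits
    pck: int                # PCK, 2 bits (packetization scheme)
    sync: bool              # S-bit
    has_payload_desc: bool  # Q-bit
    has_sample_info: bool   # L-bit
    dont_cache: bool        # D-bit
    payload_id: int         # QuickTime Payload ID, 15 bits


def parse_qt_header(buf: bytes) -> QTHeader:
    """Decode the fixed 4-byte QuickTime header that follows the RTP header."""
    if len(buf) < 4:
        raise ValueError("QuickTime header needs at least 4 bytes")
    (word,) = struct.unpack("!I", buf[:4])  # network byte order
    return QTHeader(
        version=(word >> 28) & 0xF,
        pck=(word >> 26) & 0x3,
        sync=bool((word >> 25) & 1),
        has_payload_desc=bool((word >> 24) & 1),
        has_sample_info=bool((word >> 23) & 1),
        # Bits 9 to 15 (RES) are reserved and deliberately ignored.
        dont_cache=bool((word >> 15) & 1),
        payload_id=word & 0x7FFF,
    )
```

A receiver written this way would go on to read the payload description only when has_payload_desc is true, and the sample specific information only when has_sample_info is true.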

3.2.1 QuickTime Payload Description

The QuickTime Payload Description is defined as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|K|F|A|Z|          RES          | QuickTime Payload Desc Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                QuickTime Payload Desc Data ...                .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The fields in the QuickTime Payload Description have the following meanings:

K bit: 1 bit
The K-bit is set to one if all samples are sync (key) samples for this payload description. Otherwise it is set to zero.

F bit: 1 bit
The F-bit is set to one if the samples are sparse. Otherwise it is set to zero. This is a hint to the receiver that samples might not be sent very often.

A bit: 1 bit
The A-bit is set to one if this packet contains the start of the payload description. Otherwise it is set to zero.

Z bit: 1 bit
The Z-bit is set to one if this packet includes the end of the payload description. Otherwise it is set to zero. The combination of the A and Z bits allows the QuickTime payload description to be split into multiple packets. The split can be at arbitrary points in the description and does not have to fall on a TLV boundary. The entire description must be received for a payload ID before it is considered valid.

RES: 12 bits
Reserved for future use. Transmitter must set these bits to zero. Receivers must ignore these bits.

QuickTime Payload Description Length: 16 bits
Number of bytes in the QuickTime payload description included in this packet (not including padding of 0 to 3 bytes). The QuickTime Media Data starts at the RTP data offset plus the QuickTime fixed header of 4 bytes plus the payload description length (plus padding of 0 to 3 bytes) plus the sample specific information (plus padding of 0 to 3 bytes).

QuickTime Payload Description Data: varies
The format of the QuickTime Payload Description Data is described below. If the data can fit into one packet, both the A and Z bits of the payload description are set. If the data is too big to fit into one packet, it may be split across multiple packets. The packets must be contiguous. The first packet must have the A-bit set, and the last packet must have the Z-bit set. The split can be at an arbitrary point in the data, and does not have to be at a natural boundary (e.g. it does not have to be at a TLV boundary). The entire payload description data must be received for a QuickTime payload ID before it is considered valid.
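
The A-bit and Z-bit handling described above can be sketched as a small reassembler. This is an illustration under stated assumptions, not a definitive implementation: it assumes the length field counts from the start of the payload description, including its own 4-byte flag and length word (one reading of the media data offset formula above), and it leaves out the RTP sequence number check needed to confirm that the fragments are contiguous. The names split_desc and DescriptionAssembler are hypothetical.

```python
import struct
from typing import Dict, Optional, Tuple


def split_desc(buf: bytes) -> Tuple[bool, bool, bytes, int]:
    """Return (a_bit, z_bit, fragment, bytes_consumed) for a payload description.

    Assumption: the 16-bit length includes the 4-byte flag/length word.
    """
    if len(buf) < 4:
        raise ValueError("payload description header needs 4 bytes")
    flags, length = struct.unpack("!HH", buf[:4])
    if length < 4 or length > len(buf):
        raise ValueError("bad payload description length")
    a_bit = bool(flags & 0x2000)     # third bit of the word
    z_bit = bool(flags & 0x1000)     # fourth bit of the word
    consumed = (length + 3) & ~3     # skip padding to a 32-bit boundary
    return a_bit, z_bit, buf[4:length], consumed


class DescriptionAssembler:
    """Collects description fragments per payload ID until the Z-bit arrives."""

    def __init__(self) -> None:
        self._partial: Dict[int, bytearray] = {}

    def add(self, payload_id: int, a_bit: bool, z_bit: bool,
            fragment: bytes) -> Optional[bytes]:
        if a_bit:
            self._partial[payload_id] = bytearray()   # start over
        elif payload_id not in self._partial:
            return None                               # start was never seen
        self._partial[payload_id] += fragment
        if z_bit:
            return bytes(self._partial.pop(payload_id))
        return None                                   # still incomplete
```

The assembler returns the complete description data only once the fragment carrying the Z-bit has arrived, which matches the rule that a partial description is not valid.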

3.2.2 QuickTime Payload Description Data

The QuickTime Payload Description Data is defined as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     QuickTime Media Type                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timescale                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                      QuickTime TLVs ...                       .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The fields in the QuickTime Payload Description Data have the following meanings:

QuickTime Media Type: 32 bits
A 4 character media type that identifies the QuickTime media [6], example: 'vide' for video, 'soun' for sound, etc.

Timescale: 32 bits
The number of units of time that pass in 1 second of real time for this QuickTime payload ID. This is the timescale used by the RTP timestamp for this payload ID.

QuickTime TLVs: variable length
One or more QuickTime parameters that describe this payload. The parameters are expressed as Type-Length-Value triplets. The TLVs are not padded and can begin at any byte boundary. The format of the TLVs is described in Section 3.3. Any unknown TLV should be skipped.
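
A receiver might separate the two fixed fields from the TLVs as in the sketch below. It operates on fully reassembled payload description data; the name parse_desc_data is hypothetical, and the media type is kept as raw bytes rather than decoded to text.

```python
import struct
from typing import Tuple


def parse_desc_data(data: bytes) -> Tuple[bytes, int, bytes]:
    """Split payload description data into (media_type, timescale, tlv_bytes)."""
    if len(data) < 8:
        raise ValueError("payload description data needs at least 8 bytes")
    media_type = data[:4]                         # e.g. b"vide" or b"soun"
    (timescale,) = struct.unpack("!I", data[4:8]) # units per second
    return media_type, timescale, data[8:]
```

The returned tlv_bytes would then be handed to a TLV parser for the format given in Section 3.3.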

3.2.3 Sample Specific Information

The sample specific information is defined as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              RES              |  Sample-Specific Info Length  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                      QuickTime TLVs ...                       .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Reserved: 16 bits
Reserved for future use. Transmitter must set these bits to zero. Receivers must ignore these bits.

Sample Specific Information Length: 16 bits
Number of bytes in the sample specific information (not including padding of 0 to 3 bytes). The QuickTime Media Data starts at the RTP data offset plus the QuickTime fixed header of 4 bytes plus the payload description length (plus padding of 0 to 3 bytes) plus the sample specific information (plus padding of 0 to 3 bytes). If the sample spans more than one packet, the sample specific information needs to be included in at least one of the packets, but does not need to be included in all packets of the sample. If more than one sample is included in a packet, the sample specific information applies to the group of samples in the packet, e.g. a group of audio samples.

QuickTime TLVs: variable length
One or more QuickTime parameters that describe this packet. The parameters are expressed as Type-Length-Value triplets. The TLVs are not padded and can begin at any byte boundary. Section 3.3 below specifies the format of a QuickTime TLV. Any unknown TLV should be skipped.
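
The rule for locating the QuickTime Media Data can be written out as a short calculation. The sketch below simply transcribes the sum given in the length field descriptions, passing 0 for a part that is absent (Q-bit or L-bit clear); the names pad4 and media_data_offset are hypothetical.

```python
def pad4(n: int) -> int:
    """Round a byte count up to the next 32-bit boundary (adds 0 to 3 bytes)."""
    return (n + 3) & ~3


def media_data_offset(rtp_data_offset: int, desc_len: int = 0,
                      info_len: int = 0) -> int:
    """Offset of the QuickTime Media Data within the RTP packet.

    desc_len and info_len are the values of the payload description length
    and sample specific information length fields, or 0 if that part is absent.
    """
    fixed_header = 4
    return rtp_data_offset + fixed_header + pad4(desc_len) + pad4(info_len)
```

With a 12-byte RTP header and neither optional part present, the media data starts at offset 16. With length fields of 21 and 10, it starts at 12 + 4 + 24 + 12 = 52.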

3.3 QuickTime TLVs

A QuickTime TLV is formatted as follows:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     QuickTime TLV Length      |      QuickTime TLV Type       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                    QuickTime TLV Value ...                    .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The fields in a QuickTime TLV have the following meanings:

QuickTime TLV Length: 16 bits
Number of bytes in the data portion of the QuickTime TLV. This does not include the length and type fields. The next QuickTime TLV starts at the offset of the current TLV plus the current TLV length.

QuickTime TLV Type: 16 bits
A 2 character TLV type that identifies the QuickTime parameter.

QuickTime TLV Value: variable length
The value of the TLV as specified by the type. Values must be sent in network byte-order (i.e. big-endian format).

Note: Some TLVs are mandatory and must be present if the QuickTime Payload Description is being sent. Other TLVs will assume their default values if they are not sent. Any TLV not recognized by a receiver must be ignored and skipped over.
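
Walking a sequence of TLVs could look like the following sketch. It assumes, per the field definition above, that the length counts only the value bytes, so that each TLV occupies 4 bytes of length and type plus the value; that reading is an assumption where the text leaves room for doubt. The name iter_tlvs is hypothetical.

```python
import struct
from typing import Iterator, Tuple


def iter_tlvs(buf: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (type, value) pairs from a run of unpadded QuickTime TLVs.

    Assumption: the 16-bit length excludes the 4-byte length/type prefix.
    Fewer than 4 trailing bytes are treated as padding and ignored.
    """
    pos = 0
    while pos + 4 <= len(buf):
        (length,) = struct.unpack("!H", buf[pos:pos + 2])
        tlv_type = buf[pos + 2:pos + 4]   # two characters, e.g. b"sd"
        start = pos + 4
        end = start + length
        if end > len(buf):
            raise ValueError("TLV value runs past the end of the data")
        yield tlv_type, buf[start:end]
        pos = end                         # TLVs are not padded
```

A receiver can then dispatch on the types it knows and drop the rest, which satisfies the requirement that unrecognized TLVs be skipped.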

The currently defined TLVs are described below:

Sample Description (mandatory)
Type: 'sd'
Length: variable length
Default: none
Media-specific QuickTime sample description. The format for this TLV for each of the 
currently defined media types can be found in [6] (starting pg. 59).


QuickTime Atom
Type: 'qt'
Length: variable
Default: not applicable
This TLV is used to transparently send a QuickTime Atom as defined in [6] (pg. 3). For 
example, this can be used to send User Data Atoms, Track Reference Atoms, Track Input 
Map Atoms, etc. The QuickTime atoms sent depends on the media type associated with 
the QuickTime payload description.

Track ID
Type: 'ti'
Length: 8
Default: 0
Track ID as defined in [6] (pg. 18).

Layer
Type: 'ly'
Length: 6
Default: 0
Layer as defined in [6] (pg. 18).

Volume
Type: 'vo'
Length: 6
Default: 255
Volume as defined in [6] (pg. 18).

Matrix
Type: 'mx'
Length: 40
Default: identity matrix
Matrix as defined in [6] (pg. 18 and 77).

Translation Matrix
Type: 'tr'
Length: 8
Default: identity matrix
v, h -- two 16-bit signed numbers indicating translation values (in pixels). This TLV is sent 
instead of the Matrix TLV when only translation is required. Note that the order is v, 
then h.

Track Width
Type: 'tw'
Length: 8
Default: 0
Track Width as defined in [6] (pg. 19).

Track Height
Type: 'th'
Length: 8
Default: 0
Track Height as defined in [6] (pg. 19).

Language
Type: 'la'
Length: 6
Default: 0
Language as defined in [6] (pg. 32 and 75).

Rate
Type: 'rt'
Length: 8 (Fixed)
Default: 1.0
Rate of the media.

Graphics Mode
Type: 'gm'
Length: 4 
Default: 0x0040 (copy mode)
The graphics mode of the stream. [must add where these are defined]

Op Color
Type: 'oc'
Length: 12 (RGBColor)
Default: 0x8000 for red, green, and blue
The op color to be used in conjunction with the graphics mode.

Clip Region
Type: 'cr'
Length: variable (RegionHandle)
Default: no clip region
The clip region to be applied to the visual media.

Duration (sample specific only)
Type: 'du'
Length: 4
Default: unknown duration, or the natural duration of the data
Specifies the play duration of the sample(s). For certain media types, e.g. video, this specifies 
the length of time the sample is to be displayed or rendered. For other media types, e.g. 
MIDI, this could specify an edit into the sample. See the discussion under Play Offset.

Play Offset (sample specific only)
Type: 'po'
Length: 4
Default: 0
Specifies the play offset of the sample(s), in the RTP timescale. This, combined with the 
duration, specifies which portion of the data should be played. For example, suppose 
MIDI is being streamed with a timescale of 1000. If this particular sample has a 
timestamp of 5000 and contains 6 seconds of data, then normally the MIDI data in that 
sample will be played from time 5 seconds to time 11 seconds. If this packet contains a 
Duration TLV of 3 seconds, and no Play Offset TLV, then the data is played from time 5 
seconds to time 8 seconds and the last 3 seconds of data in the sample are not played. If 
this packet contains a Duration TLV of 3 seconds and a Play Offset of 2 seconds, then at 
time 5 seconds the third second of data in the sample will start playing. The Play 
Offset indicates that the first two seconds of data are not played. At time 8 seconds, the 
sample will stop playing and the last one second of data in the sample is never played.
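
The worked example above can be reproduced with a short calculation. The sketch assumes that the Duration value is expressed in the same timescale as the Play Offset, and that when no Duration TLV is present the data plays from the offset to its natural end; both are assumptions. The name play_window is hypothetical.

```python
from fractions import Fraction
from typing import Optional, Tuple


def play_window(timestamp: int, natural_duration: int, timescale: int,
                duration: Optional[int] = None,
                play_offset: int = 0) -> Tuple[Fraction, Fraction, Fraction, Fraction]:
    """Return (start, stop, data_from, data_to), all in seconds.

    start/stop are presentation times; data_from/data_to delimit the
    portion of the sample's own data that is played.
    """
    if duration is None:
        # Assumption: with no Duration TLV, play the rest of the data.
        duration = natural_duration - play_offset
    start = Fraction(timestamp, timescale)
    stop = Fraction(timestamp + duration, timescale)
    data_from = Fraction(play_offset, timescale)
    data_to = Fraction(play_offset + duration, timescale)
    return start, stop, data_from, data_to
```

For the MIDI sample in the example (timestamp 5000, 6000 units of data, timescale 1000), a Duration of 3000 with a Play Offset of 2000 gives a presentation from 5 to 8 seconds covering seconds 2 to 5 of the data.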


3.4 Media Data Packetization

The RTP packetization for QuickTime is designed to take into account the needs of a varied set of media types and compression schemes. Hence, 3 different packetization schemes are defined.

The following pieces of information are required at the transmission end to make packetization decisions:

- Maximum QuickTime Media Data size (MQD) that can be accommodated in a single RTP packet.

- Whether all samples for this media type are of constant size (CQS).

- Whether all samples for this media type are of constant duration (CQD).

- The sample size shared by all samples, when it is constant (CSS).

- Sample size of a specific sample (SS).

Based on the above pieces of information, one of the following packetization schemes is adopted:

Scheme 1: (CQS=true) AND (CQD=true) AND (CSS <= 0.5*MQD)

Multiple samples are packed into one RTP packet. The RTP header M-bit is set to one on all packets. The QuickTime header PCK field is set to 1.

Scheme 2: ( (CQS=false) OR (CQD=false) ) AND (SS <= 0.5*MQD)

Multiple samples are packed into the QuickTime Media Data portion of an RTP packet. The RTP header M-bit is set to one in this packet. The QuickTime header PCK field is set to 2.

The samples are packed using the format illustrated below:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|S|          Reserved           |         Sample Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Sample Timestamp                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                        Sample Data ...                        .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|S|          Reserved           |         Sample Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Sample Timestamp                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                        Sample Data ...                        .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
.                            ......                             .
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The fields in the QuickTime Media Data have the following meanings:

S bit: 1 bit
The S-bit is set to one if this sample is a sync sample, i.e. key sample. Otherwise it is set to zero.

Reserved: 15 bits
Reserved for future use. Transmitter must set these bits to zero. Receivers must ignore these bits.

Sample Length: 16 bits
Number of bytes in the sample data only (not including the S-bit, reserved, length, and timestamp fields, or the padding).

Sample Timestamp: 32 bits
This field contains the relative timestamp for this sample in the timescale associated with this QuickTime payload ID. The timestamp is relative to the timestamp in the RTP header. E.g. a sample that has a relative timestamp of 0 has the same timestamp as the timestamp in the RTP header.

Sample Data: variable length
This field contains the sample data. The data must be padded to a 32-bit boundary.

All receivers are required to handle this scheme. A transmitter may choose to not implement this scheme in which case it will default to scheme 3.

Note: This scheme leads to more efficient packing than scheme 3 for certain media/codec types. However, there is a trade-off between efficiency and losing multiple samples when a packet is lost.
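
The scheme 2 layout can be illustrated with a pair of packing and unpacking routines. This is a sketch only: it treats the Sample Timestamp as an unsigned 32-bit value, which is an assumption, and the names pack_scheme2 and unpack_scheme2 are hypothetical.

```python
import struct
from typing import List, Tuple

Sample = Tuple[bool, int, bytes]   # (is_sync, relative_timestamp, data)


def pack_scheme2(samples: List[Sample]) -> bytes:
    """Build the QuickTime Media Data for a scheme 2 packet."""
    out = bytearray()
    for is_sync, rel_ts, data in samples:
        if len(data) > 0xFFFF:
            raise ValueError("sample too long for a 16-bit length")
        first = 0x8000 if is_sync else 0          # S-bit, reserved bits zero
        out += struct.pack("!HHI", first, len(data), rel_ts)
        out += data
        out += b"\x00" * (-len(data) % 4)         # pad to a 32-bit boundary
    return bytes(out)


def unpack_scheme2(buf: bytes) -> List[Sample]:
    """Recover the samples from scheme 2 QuickTime Media Data."""
    samples: List[Sample] = []
    pos = 0
    while pos + 8 <= len(buf):
        first, length, rel_ts = struct.unpack("!HHI", buf[pos:pos + 8])
        start = pos + 8
        end = start + length
        if end > len(buf):
            raise ValueError("sample data runs past the end of the packet")
        samples.append((bool(first & 0x8000), rel_ts, buf[start:end]))
        pos = start + ((length + 3) & ~3)         # skip the padding
    return samples
```

Each sample costs 8 bytes of per-sample header plus up to 3 bytes of padding, which is the overhead to weigh against the packing efficiency mentioned in the note above.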

Scheme 3: Cases not covered by schemes 1 and 2

A single sample is placed in one or more RTP packets. The RTP header M-bit is set to one in the last packet and is otherwise set to zero. The QuickTime header PCK field is set to 3. The packetization boundaries may be chosen intelligently to respect the compression/decompression algorithm requirements. However, this is not a requirement. When intelligent boundaries are not chosen, a single packet loss will lead to the entire sample being lost in the case of multi-packet samples.
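
The choice among the three schemes, and the fragmentation used by scheme 3, can be sketched as below. The function names are hypothetical, and the scheme2_supported flag models a transmitter that has chosen not to implement scheme 2. Fragmenting at fixed MQD-sized boundaries is the simple, non-intelligent choice; a real sender may prefer codec-aware boundaries.

```python
from typing import List, Tuple


def choose_scheme(mqd: int, sample_size: int, constant_size: bool,
                  constant_duration: bool, scheme2_supported: bool = True) -> int:
    """Pick packetization scheme 1, 2 or 3 from the conditions in the text.

    mqd is the maximum QuickTime Media Data size per RTP packet; sample_size
    is CSS when sizes are constant, otherwise the size of this sample (SS).
    """
    small = 2 * sample_size <= mqd          # size <= 0.5 * MQD
    if small and constant_size and constant_duration:
        return 1
    if small and scheme2_supported:
        return 2
    return 3


def fragment_sample(data: bytes, mqd: int) -> List[Tuple[bytes, bool]]:
    """Split one sample into (chunk, marker_bit) pairs for scheme 3."""
    if mqd <= 0:
        raise ValueError("mqd must be positive")
    chunks = [data[i:i + mqd] for i in range(0, len(data), mqd)] or [b""]
    last = len(chunks) - 1
    # The M-bit is one only on the final packet of the sample.
    return [(chunk, i == last) for i, chunk in enumerate(chunks)]
```

For example, with an MQD of 1400 bytes a 3000-byte sample falls under scheme 3 and is sent as three packets of 1400, 1400 and 200 bytes, with the M-bit set only on the last.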

3.5 Payload Information

3.5.1 Payload ID

The QuickTime payload ID identifies the format of the QuickTime media data carried in an RTP session. It associates the QuickTime payload description (that is transmitted periodically) with the QuickTime media data. This identifier is an arbitrary 15-bit number that is changed every time the payload format changes. When streaming QuickTime movie tracks, the payload format changes usually when the sample description changes during the life of the track.

The following restrictions apply when picking payload IDs:

- The payload ID must be unique among all QuickTime RTP sessions originating from a given source canonical name. This is to ensure efficient mapping of payload IDs to payload descriptions using a single receiver-side table per canonical name.

- A payload ID must not be reused for a different payload description during the lifetime of the session. This allows receivers to cache the payload descriptions for the duration of the session.

An exception to the above restrictions is made when the D-bit is set to 1 in the QuickTime header. This indicates that payload IDs might in fact be reused at some time in the future, and allows live broadcasts of arbitrarily changing QuickTime data for an indefinite amount of time. Senders must be careful to reuse an ID only when they are reasonably sure that the receiver has received a different ID since it was last used. When the D-bit is set, receivers must not cache the data associated with a QuickTime payload ID once they receive a packet with a different QuickTime payload ID.

The basic algorithm for senders is:
- If the session is to continue indefinitely and will use indeterminate numbers of QuickTime payload IDs, set the D-bit and cycle through the 32K of QuickTime payload IDs. If senders use the full range of QuickTime payload IDs, then they can be reasonably sure receivers will see the reused payload ID as a new one. (i.e. Any receiver will be very likely to have received a different payload ID since the last time a particular payload ID was used.)
- If the session has known duration, or has known or limited QuickTime payload IDs, then don't set the D-bit.

The basic algorithm for receivers is:
- If the receiver doesn't cache data about payload ID once it gets a different payload ID, then ignore the D-bit.
- If the receiver does cache data about payload ID (A) once it receives a different payload ID (B), then look at the D-bit. If the D-bit is not set, the receiver can cache the data as usual. If the D-bit is set, then the receiver must discard any information about payload ID A. If some time in the future the receiver gets payload ID A again, it cannot use the old information about payload ID A.
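
The receiver algorithm above can be sketched as a small cache. This is one possible arrangement, not a prescribed one: it assumes one cache instance per source canonical name, and the class name PayloadDescriptionCache and its method are hypothetical.

```python
from typing import Dict, Optional, Set


class PayloadDescriptionCache:
    """Receiver-side table of payload descriptions for one canonical name."""

    def __init__(self) -> None:
        self._descs: Dict[int, bytes] = {}
        self._no_cache: Set[int] = set()     # IDs last seen with the D-bit set
        self._current: Optional[int] = None  # payload ID of the previous packet

    def on_packet(self, payload_id: int, d_bit: bool,
                  description: Optional[bytes] = None) -> Optional[bytes]:
        """Record a packet; return the description to use, or None if unknown."""
        previous = self._current
        if previous is not None and previous != payload_id:
            # A different ID has arrived: forget the old one if it was
            # marked as not cacheable.
            if previous in self._no_cache:
                self._descs.pop(previous, None)
                self._no_cache.discard(previous)
        self._current = payload_id
        if d_bit:
            self._no_cache.add(payload_id)
        else:
            self._no_cache.discard(payload_id)
        if description is not None:
            self._descs[payload_id] = description
        return self._descs.get(payload_id)
```

When on_packet returns None, the receiver has no valid description for that payload ID and has to wait for the next periodic transmission before it can interpret the media data.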

3.5.2 Payload Description

The QuickTime payload descriptions are transmitted as part of the QuickTime header. The payload descriptions specify the format of the QuickTime media data. The information for the specific fields in a payload description can be found in [6]. These fields do not include all of the information associated with a QuickTime track. For example, information on transformation matrices, layers, etc. is not included. This information needs to be communicated through non-RTP means.

3.5.3 Payload Description Transmission

The payload description must be transmitted in the first RTP packet which contains media samples that require the payload description. After the first packet, the payload description must be retransmitted at a periodic interval until the format of the media samples changes. The maximum retransmission interval should be 1 second, unless packets are being transmitted at less than 1 packet/second in which case the payload description must be transmitted with each packet.

The retransmission interval may be negotiated to an arbitrary value through non-RTP means. Note: This includes the case in which the payload descriptions are never sent over RTP, i.e. a retransmission interval of infinity. In this case the payload descriptions are communicated through some non-RTP means.

A transmitter may send an RTP packet that contains only a payload description and no QuickTime media data. This payload description must be cached by the receiver and used to interpret data that may arrive in the future.
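
A sender's decision about when to attach the payload description might be sketched as follows. The function name and its parameters are hypothetical; times are in seconds, and a negotiated interval of infinity is modelled with float("inf").

```python
import math
from typing import Optional


def should_send_description(now: float, last_sent: Optional[float],
                            packet_interval: float,
                            max_interval: float = 1.0) -> bool:
    """Decide whether this packet should carry the payload description.

    last_sent is None if the description has not yet been sent for the
    current format; packet_interval is the time between packets.
    """
    if math.isinf(max_interval):
        return False      # descriptions are delivered by non-RTP means
    if last_sent is None:
        return True       # first packet that needs this description
    if packet_interval > max_interval:
        return True       # slower than one packet per interval: send always
    return now - last_sent >= max_interval
```

The caller is expected to reset last_sent to None whenever the format of the media samples changes, so that the new description goes out with the first packet that needs it.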

3.6 Loss-intolerant Media Types

Loss-intolerant media types cannot be easily handled within the standard RTP framework. Hence, we may need to use some non-RTP techniques to transmit these media types. However, some of the media types, notably Text and Tween media, can be sent over RTP by the use of redundant transmissions. (Tween media is used to alter the characteristics of other media streams. For example, Tween samples may contain a series of values that change the volume of an audio stream.) The use of this technique is experimental.

Redundant Transmissions

The redundant transmission technique is one in which the RTP packet is retransmitted multiple times within the duration of the sample. The RTP packet is resent as a whole with the same RTP sequence number, timestamp and other information, i.e. it is an identical packet when seen on the wire. This technique is not bandwidth friendly when used with high bandwidth media types. Hence it will be used only with the low bandwidth media types such as "text" and "tween" media.

The rationale for using the same RTP sequence numbers in the retransmitted packets is as follows: If the sequence numbers were incremented for each of the retransmitted packets we would require an additional field to identify the duplicate samples. In the proposed scheme, the receiver can discard duplicates by simply keeping track of the sequence numbers of the packets received.

The interval between retransmissions depends on the media type and the current congestion situation in the network. This interval can be a simple fixed interval, say 4 retransmissions equally spaced within the duration of the sample, or it could be more complex, say exponentially increasing intervals within the duration of the sample. This specification does not currently recommend a preferred scheme to use for determining the retransmission interval.
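
Both kinds of schedule mentioned above, and the receiver's duplicate check, can be sketched as follows. Everything here is illustrative: the specification recommends no particular schedule, the doubling rule is just one way to obtain exponentially increasing intervals, and the names fixed_schedule, exponential_schedule and DuplicateFilter are hypothetical. The filter remembers only a bounded window of recent sequence numbers, an assumption made so that wrapped 16-bit sequence numbers are eventually accepted again.

```python
import collections
from typing import Deque, List


def fixed_schedule(start: float, duration: float, count: int = 4) -> List[float]:
    """Send times equally spaced within the duration of the sample."""
    step = duration / count
    return [start + i * step for i in range(count)]


def exponential_schedule(start: float, duration: float,
                         count: int = 4) -> List[float]:
    """Send times whose gaps double each time, all within the duration."""
    total = 2 ** count - 1
    return [start + duration * (2 ** i - 1) / total for i in range(count)]


class DuplicateFilter:
    """Drops packets whose RTP sequence number was seen recently."""

    def __init__(self, window: int = 64) -> None:
        self._recent: Deque[int] = collections.deque(maxlen=window)

    def accept(self, seq: int) -> bool:
        if seq in self._recent:
            return False          # identical retransmission, discard
        self._recent.append(seq)
        return True
```

With a sample lasting 2 seconds, fixed_schedule yields send times of 0, 0.5, 1.0 and 1.5 seconds after the start, and a receiver using DuplicateFilter keeps only the first copy that arrives.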

4. Open Issues

The following open issues need to be resolved:

- How to handle loss-intolerant media with "key" and "update" samples? Loss-intolerant media samples can be retransmitted multiple times with fixed or variable intervals between transmissions. The samples can be classified as key samples and update samples and handled appropriately. Update samples need not be periodically retransmitted. For example, in sprite media, key samples will contain the sprite image and update samples will contain the motion vectors. In text media, by contrast, all samples will be key samples.

- What is the appropriate interval between redundant transmissions for "text" and "tween" media samples?

Acknowledgments

The authors would like to thank Joe Pallas and all the members of the QuickTime Streaming team, Jay Geagan, Andy Grignon, Sylvain Rouze and Kevin Gong for their valuable input in writing this proposal.

References

[1] H. Schulzrinne, et al., "RTP: A Transport Protocol for Real-Time Applications", IETF RFC 1889, January 1996.

[2] H. Schulzrinne, et al., "RTP Profile for Audio and Video Conferences with Minimal Control", IETF RFC 1890, January 1996.

[3] L. Berc, et al., "RTP Payload Format for JPEG-compressed Video", IETF RFC 2035, October 1996.

[4] D. Hoffman, et al., "RTP Payload Format for MPEG1/MPEG2 Video", IETF RFC 2038, October 1996.

[5] T. Turletti, C. Huitema, "RTP Payload Format for H.261 Video Streams", IETF RFC 2032, October 1996.

[6] Apple Computer, Inc., "QuickTime File Format Specification", May 1996.

[7] Apple Computer, Inc., "Inside Macintosh: QuickTime", Addison Wesley Press.

[8] Apple Computer, Inc., "Inside Macintosh: QuickTime Components", Addison Wesley Press.

[9] Apple Computer, Inc., "QuickTime 2.5 Developer Guide", Developer Press.

[10] H. Schulzrinne, et al., "Real Time Streaming Protocol", IETF Draft ietf-mmusic-rtsp-02.txt, March 24 1994, Expires: August 20 1997.



Change History

2/23/00 - aj - First published

Copyright © 2004 Apple Computer, Inc.
All rights reserved.