Question 1. Is RTP A Transport Protocol Or A Kind Of Application Protocol?
RTP has important properties of a transport protocol: it runs on end systems and it provides demultiplexing. It differs from transport protocols like TCP in that it (currently) does not offer any form of reliability or a protocol-defined flow/congestion control. However, it provides the necessary hooks for adding reliability, where appropriate, and flow/congestion control. Some like to refer to this property as application-level framing (see D. Clark and D. Tennenhouse, “Architectural considerations for a new generation of protocols”, SIGCOMM’90, Philadelphia). RTP so far has been mostly implemented within applications, but that has no bearing on its role. TCP is still a transport protocol even if it is implemented as part of an application rather than the operating system kernel.
Question 2. RTP Does Not Ensure Real-time Delivery. So How Come It Is Called A Real-time Protocol?
No end-to-end protocol, including RTP, can ensure in-time delivery. This always requires the support of lower layers that actually have control over resources in switches and routers. RTP provides functionality suited for carrying real-time content, e.g., a timestamp and control mechanisms for synchronizing different streams with timing properties.
Question 3. Is RTP An Unreliable Protocol? Are There Any Mechanisms Provided For Error Recovery In RTP?
As currently defined, RTP does not define any mechanisms for recovering from packet loss. Such mechanisms are likely to be highly dependent on the packet content. For example, for audio, it has been suggested to add low-bit-rate redundancy, offset in time. For other applications, retransmission of lost packets may be appropriate. (The H.261 RTP payload definition offers such a mechanism.) This requires no additions to RTP. RTP probably has the necessary header information (like sequence numbers) for some forms of error recovery by retransmission.
Question 4. Can RTP Run Over IPv6? ATM?
Yes. RTP contains no specific assumptions about the capabilities of the lower layers, except that they provide framing. It contains no network-layer addresses, so that RTP is not affected by addressing changes. Any additional lower-layer capabilities such as security or quality-of-service guarantees can obviously be used by applications employing RTP. There are several implementations of video tools that run RTP directly over AAL5 (T. Braun) and recent efforts to define the carriage of RTP over AAL2 and AAL5. It should be noted that the RTCP CNAME field is currently based on the assumption that hosts have Internet-style domain names.
Question 5. Can RTP Be Used In Asymmetric Networks?
In asymmetric networks, the bandwidth in one direction, typically from the user to the Internet, is significantly lower than in the other. These networks include ADSL, cable modems and satellite distribution. RTP can be used readily, but it may be necessary to have only data senders send RTCP messages. These RTCP messages are useful to allow inter-media synchronization and identify the content of the media stream.
Question 6. Why Doesn’t RTP Have A Length Field?
RTP does not contain a length field, that is, it assumes that framing is performed by the underlying protocol and that only one RTP packet is to be carried in one PDU of the underlying protocol. This is the typical application with UDP (or AAL5) as the underlying protocol. Since most applications currently envisioned do not need framing, it would be a waste of processing and bandwidth to add one. This is covered in detail in the section RTP over Network and Transport Protocols of the spec.
If RTP is used with a protocol that is not message-based (e.g., TCP) or if it is desirable to carry several RTP packets in one lower-layer PDU (e.g., for aggregation of streams), it is trivial to define a profile that prefixes the RTP header by a 16 or 32-bit length field, depending on the desired tradeoff between overhead and maintaining word alignment.
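As a rough illustration of the length-prefix idea, the sketch below frames RTP packets for a byte-stream transport with a 16-bit big-endian length field and splits them apart again on the receiving side. The function names `frame_packet` and `split_stream` are made up for this example, and the choice of 16 bits rather than 32 is an assumption; a real profile would fix these details.

```python
import struct


def frame_packet(rtp_packet: bytes) -> bytes:
    """Prefix one RTP packet with a 16-bit big-endian length field."""
    if len(rtp_packet) > 0xFFFF:
        raise ValueError("packet too large for a 16-bit length field")
    return struct.pack("!H", len(rtp_packet)) + rtp_packet


def split_stream(buffer: bytes):
    """Split a received byte stream into complete packets.

    Returns (packets, remainder), where remainder holds the bytes of a
    frame that has not fully arrived yet.
    """
    packets = []
    offset = 0
    while len(buffer) - offset >= 2:
        (length,) = struct.unpack_from("!H", buffer, offset)
        if len(buffer) - offset - 2 < length:
            break  # incomplete frame: wait for more bytes
        start = offset + 2
        packets.append(buffer[start:start + length])
        offset = start + length
    return packets, buffer[offset:]
```

A receiver reading from a TCP socket would append each chunk it reads to the remainder returned by the previous call and invoke `split_stream` again.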
Question 7. Does RTP Have A Fixed Packetization Interval?
Some implementations assume that packet audio is sent with a particular packetization interval, e.g., 20 ms. This is wrong. While RFC 1890 recommends certain values and SDP allows a preference to be expressed, implementations need to be able to handle all reasonable values. There is no constraint that G.711 or other sample-based formats be conveyed in multiples of a certain unit. Thus, an RTP packet with 123 samples of G.711 is perfectly legitimate and needs to be handled appropriately.
Question 8. How Does Padding Work?
Since the underlying transport unit defines the end of the packet, the application can always locate the last byte of the (say, UDP) packet and look there for the number of padding bytes.
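A minimal sketch of that lookup follows. It relies on the RTP header layout, in which the padding (P) flag is the third-highest bit of the first octet and the last octet of a padded packet holds the padding count, including itself. The function name `padding_length` is hypothetical, and the validity check only compares against the 12-byte fixed header, ignoring any CSRC list or header extension.

```python
FIXED_HEADER_BYTES = 12  # RTP fixed header, without CSRC list or extension


def padding_length(packet: bytes) -> int:
    """Return the number of padding bytes at the end of an RTP packet."""
    if len(packet) < FIXED_HEADER_BYTES:
        raise ValueError("shorter than the fixed RTP header")
    padding_flag = (packet[0] >> 5) & 0x1
    if not padding_flag:
        return 0
    # The transport (e.g., UDP) tells us where the packet ends, so the
    # last byte is directly addressable.
    count = packet[-1]
    if count == 0 or count > len(packet) - FIXED_HEADER_BYTES:
        raise ValueError("invalid padding count")
    return count
```

The payload then ends `padding_length(packet)` bytes before the end of the datagram.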
Question 9. Practically Speaking, How Is The Timestamp Computed?
For audio, the timestamp is incremented by the packetization interval times the sampling rate. For example, for audio packets containing 20 ms of audio sampled at 8,000 Hz, the timestamp for each block of audio increases by 160, even if the block is not sent due to silence suppression. Also, note that the actual sampling rate will differ slightly from this nominal rate, but the sender typically has no reliable way to measure this divergence.
For video, the timestamp clock rate is fixed at 90 kHz. The timestamps generated depend on whether the application can determine the frame number or not. If it can or it can be sure that it is transmitting every frame with a fixed frame rate, the timestamp is governed by the nominal frame rate. Thus, for a 30 f/s video, timestamps would increase by 3,000 for each frame, for a 25 f/s video by 3,600 for each frame. If a frame is transmitted as several RTP packets, these packets would all bear the same timestamp. If the frame number cannot be determined or if frames are sampled aperiodically, as is typically the case for software codecs, the timestamp has to be computed from the system clock (e.g., gettimeofday()).
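The arithmetic above can be sketched as follows. The helper names are invented for this example, and `timestamp_from_clock` assumes the caller supplies elapsed seconds from some local clock plus the randomly chosen initial timestamp; it is one plausible way to derive timestamps when frames are sampled aperiodically, not a prescribed method.

```python
VIDEO_CLOCK_HZ = 90_000  # fixed RTP clock rate for video


def audio_timestamp_step(packet_ms: int, sample_rate_hz: int) -> int:
    """Timestamp increment per audio packet: interval times sampling rate."""
    return packet_ms * sample_rate_hz // 1000


def video_timestamp_step(frames_per_second: int) -> int:
    """Timestamp increment per frame at a fixed nominal frame rate."""
    return VIDEO_CLOCK_HZ // frames_per_second


def timestamp_from_clock(elapsed_seconds: float, initial_timestamp: int) -> int:
    """Timestamp derived from a system clock, wrapped to 32 bits."""
    ticks = round(elapsed_seconds * VIDEO_CLOCK_HZ)
    return (initial_timestamp + ticks) % 2**32
```

With these helpers, 20 ms of 8,000 Hz audio yields a step of 160, and 30 f/s and 25 f/s video yield steps of 3,000 and 3,600, matching the figures in the text.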
Question 10. In A Multimedia Conference, Are The Initial Timestamp Values Related?
No, initial timestamp values are picked randomly and independently for each RTP stream. (This is more or less unavoidable if different media types are generated by independent applications, whether these applications reside on the same host or not.) Synchronization (such as lip sync) between different media is performed by receivers through the NTP timestamps in the RTCP sender reports. This timestamp provides a common time reference that associates a media-specific RTP timestamp with the common “wallclock” time shared across media. RTP does not prescribe how end systems synchronize different media. However, a workable approach is to periodically exchange messages between applications to indicate what delay each application would impose on the stream (including any media decoding delays) if it were not to synchronize, and then have all applications choose the maximum of these delays.
Question 11. What Are The Roles Of The RTP Timestamp And Sequence Numbers?
The timestamp is used to place the incoming audio and video packets in the correct timing order (playout delay compensation). The sequence number is mainly used to detect losses. Sequence numbers increase by one for each RTP packet transmitted, timestamps increase by the time “covered” by a packet. For video formats where a video frame is split across several RTP packets, several packets may have the same timestamp. In some cases such as carrying DTMF (touch tone) data (RFC 2833), RTP timestamps may not be monotonic.
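To illustrate loss detection from sequence numbers, here is a small sketch that counts how many packets are missing between two consecutively received packets, taking the 16-bit wraparound of the sequence number into account. The function name and the policy of treating reordered or duplicate packets as "no new loss" are assumptions made for this example.

```python
def sequence_gap(previous_seq: int, current_seq: int) -> int:
    """Number of packets missing between two received sequence numbers."""
    delta = (current_seq - previous_seq) & 0xFFFF  # 16-bit wraparound
    if delta >= 0x8000:
        return 0  # reordered packet: treated here as no new loss
    # delta == 0 is a duplicate, delta == 1 is the expected next packet.
    return max(delta - 1, 0)
```

A receiver would typically keep the highest sequence number seen so far and feed each new packet through a check of this kind.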
Question 12. What Are The Different Clocks And How Are They Synchronized?
RFC 3550 specifies one media-timestamp in the RTP data header and a mapping between such timestamp and a globally synchronized clock, carried as RTCP timestamp mappings.
The NTP timestamps in the SR are assumed to be synchronized between all media senders within a single session. If the media sources come from the same network source, this is obviously not an issue. Receiver(s) synchronize to the sender, the only solution feasible for multicast.
Experience has shown that all other cross-media, cross-host schemes end up doing clock synchronization, usually inferior to NTP and application-specific.
Question 13. What’s The Marker Bit Good For?
For voice packets, the marker bit indicates the beginning of a talkspurt. The beginnings of talkspurts are good opportunities to adjust the playout delay at the receiver to compensate for differences between the sender and receiver clock rates as well as changes in the network delay jitter. Packets during a talkspurt need to be played out continuously, while listeners generally are not sensitive to slight variations in the duration of a pause.
The marker bit is a hint; the beginning of a talkspurt can also be computed by comparing the difference in timestamps and sequence numbers between two packets, assuming the timestamp clock rate is known.
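One way to infer a talkspurt boundary without the marker bit is sketched below. It assumes, purely for illustration, that every packet carries the same known number of samples and that packets are examined in order; the function name and parameters are hypothetical. The idea is that a silence gap makes the timestamp advance by more than the sequence number alone would explain.

```python
def starts_talkspurt(prev_seq: int, prev_ts: int,
                     seq: int, ts: int,
                     samples_per_packet: int) -> bool:
    """Guess whether a packet begins a new talkspurt.

    Assumes in-order packets that each cover samples_per_packet
    timestamp units. Lost packets advance both counters together,
    whereas suppressed silence advances only the timestamp.
    """
    seq_delta = (seq - prev_seq) & 0xFFFF        # 16-bit wraparound
    ts_delta = (ts - prev_ts) & 0xFFFFFFFF       # 32-bit wraparound
    return ts_delta > seq_delta * samples_per_packet
```

Because packet sizes may vary in practice (see the question on packetization intervals), a real receiver would derive the expected timestamp advance from the payload length of the previous packet rather than from a constant.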
Question 14. What Is The Sender Packet Count And Byte Count Used For?
They are not needed for loss computation; the sequence number fields are used for that to avoid round-off errors. They may be used to compute the sender packet and byte rate.
Question 15. What Is The RTP Timestamp In The RTCP Sender Report Used For?
The RTP timestamp and NTP timestamp form a pair that identifies the absolute time of a particular sample in the stream. For example, if the RTCP sender report contains an RTP timestamp of 1234 and an NTP timestamp indicating February 3, 10:14:15, it means that sample 1234 in the media stream occurred exactly on February 3, 10:14:15.
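Given one such pair, a receiver can place any other RTP timestamp of the same stream on the wallclock timeline. The sketch below assumes the NTP timestamp has already been converted to seconds as a float and that the stream's clock rate is known; the function name and parameters are illustrative only.

```python
def rtp_to_wallclock(rtp_ts: int, sr_rtp_ts: int,
                     sr_wallclock_s: float, clock_rate_hz: int) -> float:
    """Map an RTP timestamp to wallclock seconds using a sender report pair."""
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF  # 32-bit wraparound
    if delta >= 0x80000000:
        delta -= 0x100000000  # sample precedes the sender report
    return sr_wallclock_s + delta / clock_rate_hz
```

Applying this to the audio and video streams of one sender, each with its own sender report pair, gives timestamps on a common timeline that can be compared for lip sync.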
Question 16. How Is The Jitter Computed?
If several packets, say, within a video frame, bear the same timestamp, it is advisable to only use the first packet in a frame to compute the jitter. (This issue may be addressed in a future version of the specification.)
Jitter is computed in timestamp units. For example, for an audio stream sampled at 8,000 Hz, the arrival time measured with the local clock is converted by multiplying the seconds by 8,000.
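The following sketch shows one way to maintain a running jitter estimate in timestamp units. It follows the interarrival jitter estimator described in RFC 3550 (a smoothed average of the change in transit time with gain 1/16), but the class and method names are made up for this example, and timestamp wraparound is deliberately ignored to keep it short.

```python
class JitterEstimator:
    """Running interarrival jitter estimate, in RTP timestamp units."""

    def __init__(self, clock_rate_hz: int):
        self.clock_rate_hz = clock_rate_hz
        self.prev_transit = None
        self.jitter = 0.0

    def update(self, arrival_s: float, rtp_ts: int) -> float:
        # Convert the local arrival time into timestamp units.
        arrival_units = arrival_s * self.clock_rate_hz
        transit = arrival_units - rtp_ts
        if self.prev_transit is not None:
            d = abs(transit - self.prev_transit)
            # Exponential smoothing with gain 1/16.
            self.jitter += (d - self.jitter) / 16.0
        self.prev_transit = transit
        return self.jitter
```

Only differences in transit time matter, so the sender and receiver clocks do not need to be synchronized for this estimate.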
Steve Casner wrote:
For encodings such as MPEG that transmit data in a different order than it was sampled, this adds noise into the jitter calculation. I have heard handwavy arguments that this factor can be calculated out given that you know the shape of the noise, but my math isn’t strong enough for that.
In many of the cases that we care about, the jitter introduced by MPEG will be small enough that when the network jitter is of the same order we don’t have a problem anyway.
There is another problem for video in that all of the packets of a frame have the same timestamp because the whole frame is sampled at once. However, the dispersion in time of those packets really is all part of the network transfer process that the receiver must accommodate with its buffer.
It has been suggested that jitter be calculated only on the first packet of a video frame, or only on “I” frames for MPEG. However, that may color the results also because those packets may see transit delays different than the following packets see.
The main point to remember is that the primary function of the RTP timestamp is to represent the inherent notion of real time associated with the media. It also turns out to be useful for the jitter measure, but that is a secondary function.
The jitter value is not expected to be useful as an absolute value. It is more useful as a means of comparing the reception quality at two receivers or comparing the reception quality 5 minutes ago to now.
Question 17. What Is The Session Bandwidth?
First, it is most certainly not the link bandwidth. This would not scale, as then a large number of sessions could saturate the link with RTCP traffic, even if each used just 5% of the link bandwidth for RTCP. Secondly, the concept of link bandwidth is ill-defined in a heterogeneous network.
The session bandwidth is the nominal data bandwidth plus the IP, UDP and RTP headers (40 bytes). For example, for 64 kb/s PCM audio packetized in 20 ms increments, the session bandwidth would be (160 + 40) / 0.02 bytes/second or 80 kb/s. If there are multiple senders, the sum of their individual bandwidths is used.
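The calculation can be written out as a short sketch. The function names are hypothetical; the 40-byte header figure and the 5% RTCP share are taken from the text, and the sender count is assumed to be a simple multiplier for identical senders.

```python
HEADER_BYTES = 40  # IP + UDP + RTP headers, as stated in the text


def session_bandwidth_bps(payload_bytes: int, packet_interval_ms: int,
                          senders: int = 1) -> float:
    """Session bandwidth in bits per second for identical senders."""
    bits_per_packet = (payload_bytes + HEADER_BYTES) * 8
    packets_per_second = 1000 / packet_interval_ms
    return senders * bits_per_packet * packets_per_second


def rtcp_bandwidth_bps(session_bps: float, fraction: float = 0.05) -> float:
    """Share of the session bandwidth set aside for RTCP."""
    return session_bps * fraction
```

For 64 kb/s PCM in 20 ms packets (160 payload bytes), `session_bandwidth_bps(160, 20)` reproduces the 80 kb/s figure from the example above.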
The session bandwidth is typically defined out-of-band, e.g., in a session announcement protocol, based on reasonable estimates of the number of concurrent senders and their average bandwidth. Distributed and consistent on-line estimation of the session bandwidth may be hard as the number of senders and their bandwidth changes. The absolute value is less important than that all participants agree on a common value. (After all, there is nothing special about choosing the RTCP bandwidth to be 5% of the session bandwidth, it just has to be agreed upon by all participants to avoid timing out members prematurely.)
Question 18. What Is The Use Of RTCP For Two-party Calls?
Since the cost of sending RTCP is minimal (about one packet every 5 seconds), it makes sense to send RTCP even for point-to-point connections:
- With RTCP, both sides know how well the other side is receiving audio and video; this is useful, since degraded quality can have any number of reasons beyond network loss, delay and jitter. A particular use is when calling technical support: the tech support person can observe the network performance at the remote end.
- RTCP is necessary for synchronizing audio and video streams.
- For audio with silence suppression, RTCP is useful as a liveness indication.
- The SDES information is useful for user interfaces.
- Many applications will (want to) support both unicast and multicast, so that the additional implementation complexity is zero.
Question 19. How Do I Register An RTP Payload Type?
See the description, drawn from RFC 1890 (with some practical comments).
Question 20. What Are Dynamic Payload Types?
Dynamic payload types are described in the RTP A/V Profile. Unlike static payload types, dynamic payload types are not assigned in the RTP A/V Profile or by IANA. They map an RTP payload type to an audio and video encoding for the duration of a session. Different members of a session could, but typically do not, use different mappings. Dynamic payload types use the range 96 to 127. They are assigned by means outside of the RTP profile or protocol specification, including
- session descriptions like SDP (using the a=rtpmap attribute), used in announcements and invitations (e.g., SIP), for example:
  m=audio 12345 RTP/AVP 121
- other signaling protocols (however, H.245 does not appear to have a mechanism for doing this, at least not for non-ITU protocols).
Note that a number of encodings are described in the RTP A/V profile which do not have a static (permanent) payload type. The RTP A/V Profile defines names for encodings which may be used by SDP or other mechanisms to specify the mapping. Encodings may also be identified by object identifiers or other names.
Since the space for payload types is limited, only very common encodings should be assigned static types. These are typically audio and video encodings “blessed” by international standardization bodies, such as the G. series of ITU-T audio encodings. The RTP A/V Profile defines a set of criteria for making static assignments.
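To make the SDP-based mapping concrete, the sketch below extracts a payload type table from the `a=rtpmap` attributes of a session description. The function name is hypothetical, only the encoding name and clock rate are kept, and malformed lines are not handled; the sample description in the usage note is invented for illustration.

```python
def parse_rtpmap(sdp_text: str) -> dict:
    """Map payload type numbers to (encoding name, clock rate)."""
    mapping = {}
    prefix = "a=rtpmap:"
    for line in sdp_text.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue
        # Attribute value looks like "<pt> <name>/<clock rate>[/<params>]".
        pt_text, _, encoding = line[len(prefix):].partition(" ")
        name, _, rest = encoding.partition("/")
        clock_text = rest.split("/")[0]
        mapping[int(pt_text)] = (name, int(clock_text))
    return mapping
```

For a hypothetical description containing `m=audio 12345 RTP/AVP 96 97`, `a=rtpmap:96 L16/16000/2` and `a=rtpmap:97 telephone-event/8000`, the result maps 96 to `("L16", 16000)` and 97 to `("telephone-event", 8000)`. The mapping is valid only for the session that the description belongs to.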
Question 21. If I’m Using H.323 Or Other Set-up Protocol, Can I Ignore The RTP Payload Type (PT) Field?
An application must never just play a packet without inspecting its payload type, even if a single payload type has been negotiated via H.245 or similar protocols. New mechanisms, including transmission of DTMF digits (RFC 2833), comfort noise indication, forward error correction using redundant data, and switching of encodings to take into account network conditions, may conveniently use the PT to indicate special packets, which an end application can ignore, if desired, ensuring backward compatibility. But this assumption is violated if an application blindly plays back all packets regardless of PT. Also, in multicast environments, it is unlikely that every sender will use the same payload type.
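A receiver that respects this rule might dispatch on the payload type along the lines of the sketch below. The function name and the idea of a dictionary of decoder callables are assumptions for illustration; the only protocol fact used is that the PT occupies the low seven bits of the second header octet, with the marker bit above it.

```python
def handle_packet(packet: bytes, decoders: dict):
    """Pass a packet to the decoder registered for its payload type.

    Packets with an unknown payload type are silently ignored rather
    than played back.
    """
    if len(packet) < 12:
        return None  # too short to hold the fixed RTP header
    payload_type = packet[1] & 0x7F  # mask off the marker bit
    decoder = decoders.get(payload_type)
    if decoder is None:
        return None
    return decoder(packet)
```

New packet kinds, such as DTMF events on their own payload type, can then be introduced without disturbing receivers that do not register a decoder for them.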
Question 22. Should The RTP Payload Type (PT) Field Be Used For Multiplexing Different Streams?
It has been suggested that in some environments (such as RTP over AAL5) that lack lower-layer muxing abilities, the RTP payload type (PT) field be used to differentiate streams originating from different sources. This is a fundamentally bad idea and violates the letter and intent of the specification. It makes use of multiple PTs in a single stream difficult (see previous question). It is also unnecessary, as the SSRC was designed for distinguishing several sources.
Question 23. Should The RTP SSRC Be Used For Demultiplexing Different Streams For The Same RTP Session?
The RTP SSRC is meant to label streams from different sources, that is, each sender in a conference has its own SSRC. It has been suggested to have a single source, using the same RTP session (identified by source and destination addresses and ports), send different media, such as an audio and video stream, using different SSRCs.
This is generally a bad idea for the following reasons:
An RTP mixer normally combines all the SSRCs it receives on an RTP session according to the composition method that is appropriate for that session (e.g., mixing for audio). If multiple media are sent on one session, then the SSRCs must be segregated per medium based on external information. That gets complicated with sources coming from multiple places. It is similarly more complicated for an end-node receiver to handle streams coming from multiple sources to the same RTP session if those sources don’t all get fed to the same compositor (mixer, selector, whatever).
Carrying multiple media in one RTP session precludes the use of different network paths or network resource allocations if appropriate. For the typical synchronized audio/video stream one may not want different paths, but it is not hard to imagine situations where one medium should go via a low-bandwidth, low-delay terrestrial path while another can tolerate the longer delay of a satellite path in order to get higher bandwidth.
Carrying multiple media in one RTP session precludes reception of a subset of the media if desired, for example just audio if video would exceed the available bandwidth. This is not an issue for unicast since that choice of media would be controlled by the exchange with the sender, but it is valuable for multicast with heterogeneous receivers.
Carrying multiple media in one RTP session precludes receiver implementations that use separate processes for the different media, whereas using separate RTP sessions permits either single- or multiple-process implementations. Consider the development of “desk area networks” at MIT, ISI and other places in which the display and the speaker may have different IP addresses. This is an instance of the general philosophy of demultiplexing at the lowest level possible.
Also, making the SSRC fixed is a problem in the multicast case because collision resolution might require changing the SSRC identifier. (Contributed by Steve Casner.)
Question 24. Do Receivers Need Their Own SSRC Identifiers?
Yes, all participants in an RTP session have SSRC values, since they are needed in receiver reports.
Question 25. Why Can’t We Just Use TCP For Audio And Video?
For delivering audio and video for playback, TCP may be appropriate. Also, with sufficiently long buffering and adequate average throughput, near-real-time delivery using TCP can be successful, as practiced by the Netscape WWW browser. TCP may often run over highly lossy networks (e.g., the German X.25 network) with acceptable throughput, even though the uncompensated losses would make audio or video communication impossible.
However, for real-time delivery of audio and video, TCP and other reliable transport protocols such as XTP are inappropriate. The three main reasons are:
Reliable transmission is inappropriate for delay-sensitive data such as real-time audio and video. By the time the sender has discovered the missing packet and retransmitted it, at least one round-trip time, likely more, has elapsed. The receiver either has to wait for the retransmission, increasing delay and incurring an audible gap in playout, or discard the retransmitted packet, defeating the TCP mechanism. Standard TCP implementations force the receiver application to wait, so that packet losses would always yield increased delay. Note that a single packet lost repeatedly could drastically increase delay, which would persist at least until the end of talkspurt.
TCP cannot support multicast.
The TCP congestion control mechanism decreases the congestion window when packet losses are detected (“slow start”). Audio and video, on the other hand, have “natural” rates that cannot be suddenly decreased without starving the receiver. For example, standard PCM audio requires 64 kb/s, plus any header overhead, and cannot be delivered in less than that. Video could be more easily throttled simply by slowing the acquisition of frames at the sender when the transmitter’s send buffer is full, with the corresponding delay. The correct congestion response for these media is to change the audio/video encoding, video frame rate, or video image size at the transmitter, based, for example, on feedback received through RTCP receiver report packets.
An additional small disadvantage is that the TCP and XTP headers are larger than a UDP header (40 bytes for TCP and XTP 3.6, 32 bytes for XTP 4.0, compared to 8 bytes). Also, these reliable transport protocols do not contain the necessary timestamp and encoding information needed by the receiving application, so that they cannot replace RTP. (They would not need the sequence number as these protocols assure that no losses or reordering takes place.)
While LANs often have sufficient bandwidth and low enough losses not to trigger these problems, TCP does not offer any advantages in that scenario either, except for the recovery from rare packet losses. Even in a LAN with no losses, the TCP slow start mechanism would limit the initial rate of the source for the first few round-trip times.
Question 26. Can’t We Just Use XTP?
Many of the arguments parallel those in the previous section. The question of the relationship of RTP and XTP appears to arise frequently. (This may simply be due to the word ‘transport’ in both protocol names.) However, XTP and RTP are not replacements for each other. XTP is designed as a general, configurable network and transport protocol for both reliable and unreliable data communications. RTP has no reliability mechanisms (although these could be added if desired for specific applications) and no flow control like the rate control in XTP. RTP is not intended for regular, reliable data transfer (where TCP or XTP might be used instead). For real-time data, where retransmission is usually not possible due to timing constraints, XTP would have to disable retransmission. Flow/congestion control for real-time data is most likely inappropriate as the rate of such sources is inherently given and not modifiable on the time-scale of transport-protocol flow control, as explained in the previous section. It should be noted that RTP supports mechanisms that allow a form of congestion control on longer time scales, e.g., by modifying the source encoder if network congestion is detected.
RTP has no protocol state by itself and can thus be used over either connection-less networks, such as IP/UDP, or connection-oriented networks, such as XTP, ST-II or ATM (AAL3/4 or AAL5). Many real-time multimedia applications use multicast with a large fan-out, e.g., several hundred to thousands for a lecture or concert. Connection-oriented protocols like XTP have difficulty scaling to such a large number of receivers.
XTP does not offer timing or content type (media) information, and thus would need these services, as offered by RTP. XTP provides no RTP-like direct feedback of the received quality-of-service, and thus, again, would have to “import” these from another protocol. Looking at existing applications using XTP for real-time services confirms that they need to add a layer similar in content to the RTP data part “between” XTP and the actual media.
Question 27. How Should RTP Sessions Be Played Back?
Since RTCP packets contain absolute time information, a recorded session cannot simply be played back by time-shifting the whole recorded session. One approach plays back the data packets with their original time stamps, with re-normalized timing. SDES information other than NOTE items can be gathered for each source and regenerated as in a “live” session. NOTE SDES items need to be inserted at the appropriate instant in the playback as they are allowed to change.
Question 28. What Are Some Of The Differences Between The VAT Protocol And RTP?
The VAT protocol was originally implemented in the VAT audio tool and subsequently also in other audio tools such as NeVoT. The VAT protocol is now obsolete and should not be used or implemented.
The VAT header format is only described in header files. (See the VAT and NeVoT sources for details.) Many aspects of RTP and the VAT protocol are similar, but RTP improves upon the VAT protocol in a number of ways:
The VAT protocol was designed for audio only, while RTP is specified for audio and video and may be suitable for other real-time applications.
RTP is designed to be protocol-independent and can be used with non-IP protocols (ATM AAL5, for example) as well as, say, IPv6.
RTP source identification simplifies the use of mixers and translators.
RTP has a number of features that simplify use of application-level encryption (padding, etc.).
The RTP header is extensible, should the need arise in the future.
The RTP header has a sequence number which simplifies accurate loss detection and measurement and the handling of images transmitted in several packets.
The RTCP SDES packets contain additional information that simplifies tracing of misbehaving sources, e.g., their email address or telephone number.
The RTCP SDES CNAME items simplify the construction of multimedia applications from independent media agents.
RTCP sender and receiver reports allow the implementation of adaptive applications, that is, applications where senders scale their bandwidth consumption based on network load.
RTCP sender and receiver reports allow monitoring of the quality of service within, say, a multimedia conference.
Question 29. What Are The Differences Between RTP Version 1 And 2?
Version 1 is of historical interest only. Applications should not be written for it. RTP version 2 is not backwards compatible with version 1. If you care, you can find a definition of version 1 in an old Internet draft.
Question 30. What About Firewalls?
Protocol  Transport  Port
H.323     TCP        1720
H.235     TCP        ephemeral, > 1024
Question 31. What Is The Quality Of Audio Codec X?
See separate summary with audio samples.
Question 32. Are All Audio Codecs Patented?
Most older, higher-bitrate codecs are not subject to patent protection. However, G.723, G.729.1 and GSM are covered by various patents. For example, U.S. patent 4,752,956, “Digital speech coder with baseband residual coding”, applies to GSM and is assigned to Philips. It describes a coder that modifies coding using short-term fine-structure speech data produced by analysers within encoder-multiplexers.
Question 33. Are There Other Efforts In Using The Internet For Real-time Audio And Video?
Too many, some may say. vat versions 3.4 and earlier, one of the early (recent) Internet audio applications, uses mostly the same audio encodings as specified in the RTP profile, but a different protocol. There are also a number of Internet telephony applications that usually only operate on PCs and in unicast mode. There are initial efforts to interconnect the public switched telephone network and the Internet.
CuSeeMe (for Windows PC and the Macintosh) is a combined audio and video tool using reflectors rather than IP-level multicast.
The Internet Telephony Consortium maintains a listing of standards and related efforts.
Question 34. Is There An RTP Library Or Kernel Implementation?
RTP (in particular, the data part) is tightly coupled to the application, so that a kernel implementation makes little sense. A number of people have developed libraries that implement RTP and RTCP (see listing). The sources to NeVoT, rtpdump, vat, rat and vic also contain RTP and RTCP processing modules which should be usable in other applications with minor modifications. Note also that the specification itself contains numerous code fragments. (Most of the other applications are using older versions of RTP and thus should not be relied upon for developments.)
The Java Media Framework (JMF), a Java API, also supports RTP and RTCP.
There is no standard API for RTP.