
DTMF over IP – SIP INFO, Inband & RTP Events

DTMF (Dual Tone Multi-Frequency), aka touch tone, was initially designed to be a faster method of dialling, since make-and-break dial pulses were slow and a more efficient method for user input was required as switching was becoming digital.

By using two tones, switching equipment could easily identify the input without complex circuitry, and because each digit is a pair of tones, the chances of someone accidentally generating the two-tone pair were slim. MF had been used for tandem / trunk signalling inside the network with great success, so DTMF was a standout choice.

SIP was never explicitly designed as a telephony protocol, and as such, its support for DTMF wasn’t baked in from the start.

Over time organisations started using DTMF so users could interact with IVRs and Auto Attendants, enter PIN codes and interact with services using their telephone, ideas that went beyond the call setup function originally imagined for DTMF.

Your standard subscriber loop POTS line doesn’t have any out of band signalling for the DTMF, but the carrier switch passes through the audio end to end, and the DTMF tones are carried in that audio, so it’s not a problem.

So when SIP rolled along as the de facto standard for Voice calls over IP, it didn’t have a method for signalling that a DTMF digit had been pressed.

Never fear, neither does a POTS line, so everything will be fine and the tones will just be carried in the media stream like they are on a POTS line.

This was called in-band DTMF. In-band because the DTMF tones are carried in the audio stream, just like they would be if you were to play those tones back from a tape recorder or whistle them in harmony.
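To make the two-tone idea concrete, here is a minimal Python sketch that generates the in-band audio for a key and then recovers the key from the samples. The frequency table is the standard DTMF keypad; the function names, the 8 kHz sample rate, the 50 ms tone length and the simple "strongest tone wins" detection are my own assumptions for illustration, and a real detector would also check levels, twist and timing.

```python
import math

# Standard DTMF keypad: each key is one low (row) tone plus one high (column) tone, in Hz.
ROW_FREQS = (697, 770, 852, 941)
COL_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_samples(key, duration_s=0.05, sample_rate=8000):
    """Return float samples (-1.0..1.0) of the two summed tones for one key."""
    if len(key) != 1:
        raise ValueError(f"not a DTMF key: {key!r}")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            low, high = ROW_FREQS[row], COL_FREQS[col]
            break
    else:
        raise ValueError(f"not a DTMF key: {key!r}")
    n = int(duration_s * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * low * i / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * i / sample_rate)
        for i in range(n)
    ]


def goertzel_power(samples, freq, sample_rate=8000):
    """Relative signal power at a single frequency (Goertzel algorithm)."""
    coeff = 2 * math.cos(2 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def detect_key(samples, sample_rate=8000):
    """Pick the strongest row tone and the strongest column tone."""
    row = max(range(4), key=lambda r: goertzel_power(samples, ROW_FREQS[r], sample_rate))
    col = max(range(4), key=lambda c: goertzel_power(samples, COL_FREQS[c], sample_rate))
    return KEYPAD[row][col]
```

Calling detect_key(dtmf_samples("8")) on clean, uncompressed samples gives back "8". Anything in the audio path that distorts those two tones makes this kind of detection less reliable.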

However, along came G.729 and other compressed codecs, and suddenly these two tones were lost in compression, so the VoIP world needed a new way to transport DTMF information.

RFC2833 came along to fix this problem in 2000, introducing a special RTP packet called an “RTP Event” that denotes a DTMF key-press. It was later replaced by RFC4733, which still carries the DTMF as an RTP event.

Here’s a post I did on RFC2833 DTMF.

For some reason this method of DTMF signalling is still referred to as RFC2833, despite the fact that most implementations are of RFC4733.

But the next problem facing SIP implementers was SIP Proxies had no awareness of the DTMF events, because by definition, a SIP proxy only works with the SIP (signalling) part of the call, not the RTP (media).

So for a device to know when a DTMF keypress happened, it’d have to be listening in to the RTP media stream to pick up the RTP events.

The solution that’s considered best practice today is just as old as RTP Events. RFC2976, published in 2000 like RFC2833, describes using SIP INFO messages to carry payloads. (Link to post on the topic)

With SIP INFO the DTMF digit is put into the message payload, so INFO is now often used to carry DTMF info, as well as ISUP messaging.

Seems like a backwards step, but Proxies can be aware of DTMF messaging and interoperability is in theory enhanced.
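RFC2976 itself doesn’t define what a DTMF payload looks like. A body you’ll commonly see in the wild uses the content type application/dtmf-relay with a Signal line and a Duration line. The sketch below builds and parses that body; treat the format as a widely used convention rather than something from the RFC. The function names are hypothetical, and reading the duration as milliseconds is an assumption.

```python
VALID_DIGITS = "0123456789*#ABCD"


def build_dtmf_relay_body(digit, duration_ms=160):
    """Build the body of a SIP INFO carrying one digit (application/dtmf-relay style)."""
    if len(digit) != 1 or digit not in VALID_DIGITS:
        raise ValueError(f"not a DTMF digit: {digit!r}")
    return f"Signal={digit}\r\nDuration={duration_ms}\r\n"


def parse_dtmf_relay_body(body):
    """Return (digit, duration) from a dtmf-relay style body; duration is None if absent."""
    fields = {}
    for line in body.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            fields[name.strip().lower()] = value.strip()
    if "signal" not in fields:
        raise ValueError("no Signal line in body")
    duration = int(fields["duration"]) if "duration" in fields else None
    return fields["signal"], duration
```

Because the digit travels in the SIP message itself, anything on the signalling path can read it without touching the media.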

The disadvantage is there are now 3 possible implementations: DTMF Inband, DTMF in RTP Events, and DTMF in SIP INFO.

Some endpoints use more than one method, some even use all 3. The idea being that it’ll “just work” and won’t need configuring. So when a user presses a digit it plays the tone (in-band), sends an RTP event (RFC4733/2833) and sends a SIP INFO message containing the pressed digit (RFC2976) all at once.

This can cause huge headaches: if the switch it’s talking to can recognise more than one type of DTMF signalling, it gets multiple inputs for a single key-press, causing callers to jump through IVRs and menus.
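One way a receiver could protect itself is to lock on to whichever DTMF method reports a digit first on a call and ignore the others. The class below is a purely hypothetical sketch of that idea, not how any particular switch does it; the class name and the method labels are made up for illustration.

```python
class SingleMethodDtmfFilter:
    """Accept digits from only one DTMF method per call (the first one seen)."""

    def __init__(self):
        self.method = None  # becomes "inband", "rtp-event" or "sip-info" once locked
        self.digits = []

    def report(self, digit, method):
        """Record a detected digit; return True if it was accepted."""
        if self.method is None:
            self.method = method  # lock on to the first method that reports
        if method != self.method:
            return False  # same key-press arriving again by another method
        self.digits.append(digit)
        return True


# An endpoint that sends every key-press three ways at once:
call = SingleMethodDtmfFilter()
for digit in "11":
    for method in ("rtp-event", "inband", "sip-info"):
        call.report(digit, method)
```

After the loop call.digits holds two digits rather than six, so pressing 1 twice still reads as 1, 1.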

If only we had one universal standard…

See also:

RFC2976 / RFC6086 – SIP INFO

RFC2833 – RTP Events

RTP – More than you wanted to Know

RFC2833 – RTP Events

RFC2833 was designed to carry DTMF signalling, other tone signals and telephony events in RTP packets.

This was later superseded by RFC4733, but everyone still refers to this protocol as RFC2833, so I will too.

RFC2833 defines a special RTP payload designed to carry DTMF signalling information. It operates on the same source / destination ports as the RTP media, so you’ll see it mixed in there when viewing packet captures.

It uses RTP’s Synchronisation Source Identifier to identify the stream, and carries on with the same RTP sequence numbering as the audio, so it relies on RTP to sort pretty much everything.
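As a rough illustration, the sketch below packs the fixed 12-byte RTP header for an audio packet and for the event packet that follows it. The only differences are the payload type, the next sequence number and the marker bit. Payload type 101 for events is an assumption (it’s a dynamic type negotiated in SDP, though 101 is a common choice), and the SSRC, sequence and timestamp values are made up.

```python
import struct


def rtp_header(seq, timestamp, ssrc, payload_type, marker=False):
    """Pack a basic 12-byte RTP header (version 2, no padding, extension or CSRCs)."""
    first = 0x80  # version 2 in the top two bits, everything else zero
    second = (0x80 if marker else 0) | (payload_type & 0x7F)
    return struct.pack(
        "!BBHII", first, second, seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF
    )


SSRC = 0x1234ABCD  # made-up stream identifier
AUDIO_PT = 0       # PCMU (G.711 u-law), a static payload type
EVENT_PT = 101     # assumption: dynamic payload type agreed in SDP

audio = rtp_header(seq=1000, timestamp=160000, ssrc=SSRC, payload_type=AUDIO_PT)
# The event packet reuses the SSRC and simply takes the next sequence number.
# The marker bit is set here on the assumption that this is the first packet of the event.
event = rtp_header(seq=1001, timestamp=160160, ssrc=SSRC, payload_type=EVENT_PT, marker=True)
```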

The RTP Event itself contains an Event ID field (called “event” in the spec), End of Event flag, Reserved flag, Volume field and Event Duration field.

Event ID (event)

The event field contains the code of the event being conveyed. For DTMF the digits map directly, so DTMF Eight is sent as event 8.

DTMF named events
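The DTMF event codes are 0 to 9 for the digits, 10 for *, 11 for # and 12 to 15 for A to D. A small lookup sketch, with hypothetical function names:

```python
# Index in this string == RTP event code for that DTMF key.
DTMF_EVENTS = "0123456789*#ABCD"


def event_to_digit(event_id):
    """Return the DTMF key for an event code, or None if it isn't a DTMF event."""
    if 0 <= event_id < len(DTMF_EVENTS):
        return DTMF_EVENTS[event_id]
    return None


def digit_to_event(digit):
    """Return the event code for a DTMF key."""
    if len(digit) != 1 or digit not in DTMF_EVENTS:
        raise ValueError(f"not a DTMF digit: {digit!r}")
    return DTMF_EVENTS.index(digit)
```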

End of Event

The End of Event flag (referred to as E in the RFC) is set to 1 if the transmitted packet is the end of an RTP event.

This allows for a key press to span over multiple packets, with the end of the key-press (key release) denoted by this flag.

Reserved Flag

The reserved flag (R) is reserved for future use, and will just be set to 0.


Volume

This is only used for DTMF digits and denotes the power level of the tone, expressed in dBm0 with the sign dropped. Valid DTMF levels range from 0 to -36 dBm0.

Event Duration

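The event duration field says how long the event has lasted so far, measured in RTP timestamp units. When a DTMF keypress is split over multiple RTP Event packets, the first will start at 0 and each following packet counts up by the number of timestamp units that have passed since the key was pressed.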
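Put together, the fields above fit in 4 bytes: 8 bits of event, the E and R bits, 6 bits of volume and 16 bits of duration. Here’s a minimal parsing sketch. The names are my own, and the conversion to milliseconds assumes the usual 8000 Hz clock for telephone-event, which you’d normally confirm from the SDP.

```python
import struct
from typing import NamedTuple


class RtpEvent(NamedTuple):
    event: int        # event code, e.g. 8 for DTMF Eight
    end: bool         # E flag
    reserved: bool    # R flag, expected to be 0
    volume_dbm0: int  # power level with the sign restored, e.g. -10
    duration: int     # in RTP timestamp units


def parse_rtp_event(payload):
    """Decode the first 4 bytes of an RTP Event payload."""
    if len(payload) < 4:
        raise ValueError("RTP Event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    return RtpEvent(
        event=event,
        end=bool(flags & 0x80),
        reserved=bool(flags & 0x40),
        volume_dbm0=-(flags & 0x3F),
        duration=duration,
    )


def duration_ms(duration, clock_rate=8000):
    """Convert a duration in timestamp units to milliseconds (clock rate is an assumption)."""
    return duration * 1000 / clock_rate
```

For example, the payload bytes 08 8a 03 20 decode as event 8 with the end flag set, a volume of -10 dBm0 and a duration of 800 units, which is 100 ms at 8000 Hz.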

Analysing in Wireshark

By using the display filter “rtpevent” you can see all the RTP events for your call.

Each DTMF event will contain multiple packets, with the total number depending on how long the keypress is and packetization timers.

When the key is pressed by the user, an RTP event with a duration of 0 and the Event ID of the DTMF digit is sent.

For as long as the digit is held, subsequent packets with the running total of the event duration will keep being sent.

Finally, when the key is released, an RTP Event with the “End of Event” flag set to True will be sent to mark the end of the RTP Event.
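If you want to turn that stream of packets back into key-presses, you need to collapse all the packets belonging to one event into a single entry. The sketch below does that for a list of already-decoded packets. It assumes every packet of one key-press carries the same RTP timestamp (which is how RFC4733 describes it) and that packets arrive as simple tuples; both the tuple layout and the function name are hypothetical.

```python
def collapse_events(packets):
    """Group (rtp_timestamp, event_id, end_flag, duration) tuples into key-presses.

    Packets sharing a timestamp and event ID are treated as one key-press, so
    repeated copies of the final "end" packet do not count as extra digits.
    """
    presses = {}  # dicts keep insertion order, so presses stay in arrival order
    for timestamp, event_id, end, duration in packets:
        press = presses.setdefault(
            (timestamp, event_id),
            {"event": event_id, "duration": 0, "ended": False},
        )
        press["duration"] = max(press["duration"], duration)
        press["ended"] = press["ended"] or end
    return list(presses.values())


# Digit 8 held for a few packets (end packet sent three times), then pressed again.
capture = [
    (1000, 8, False, 0),
    (1000, 8, False, 160),
    (1000, 8, False, 320),
    (1000, 8, True, 480),
    (1000, 8, True, 480),
    (1000, 8, True, 480),
    (9000, 8, False, 0),
    (9000, 8, True, 160),
]
key_presses = collapse_events(capture)
```

Here key_presses ends up with two entries for digit 8, the first with a duration of 480 and the second with 160, rather than eight separate digits.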

See also:

DTMF over IP – SIP INFO, Inband & RTP Events

RFC2976 / RFC6086 – SIP INFO