We’ve talked a bit in the past few posts about keys, K and all it’s derivatives, such as Kenc, Kint, etc.
Each of these is derived from our single secret key K, known only to the HSS and the USIM.
To minimise the load on the HSS, the HSS transfers some of the key management roles to the MME, without ever actually revealing what the secret key K actually is to the MME.
This means the HSS is only consulted by the MME when a UE/Terminal attaches to the network, and not each time it attaches to different cell etc.
When the UE/Terminal first attaches to the network, as outlined in my previous post, the HSS also generates an additional key it sends to the MME, called K-ASME.
K-ASME is the K key derived value generated by the HSS and sent to the MME. It sands for “Access Security Management Entity” key.
When the MME has the K-ASME it’s then able to generate the other keys for use within the network, for example the Kenb key, used by the eNodeB to generate the keys required for communications.
The USIM generates the K-ASME itself, and as it’s got the same input parameters, the K-ASME generated by the USIM is the same as that generated by the HSS.
The USIM can then give the terminal the K-ASME key, so it can generate the same Kenb key required to generate keys for complete communications.
We’ve already touched on how subscribers are authenticated to the network, how the network is authenticated to subscribers.
Those functions are done “in the clear” meaning anyone listening can get a copy of the data transmitted, and responses could be spoofed or faked.
To prevent this, we want to ensure the data is ciphered (encrypted) and the integrity of the data is ensured (no one has messed with our packets in transmission or is sending fake packets).
Ciphering of Messages
Before being transmitted over the Air interface (Uu) each packet is encrypted to prevent eavesdropping.
This is done by taking the plain text data and a ciphering sequence for that data of the same length as the packet and XORing two.
The terminal and the eNodeB both generate the same ciphering sequence for that data.
This means to get the ciphered version of the packet you simply XOR the Ciphering Sequence and the Plain text data.
To get the plain text from the ciphered packet you simply XOR the ciphered packet and ciphering sequence.
The Ciphering Sequence is made up of parts known only to the Terminal and the Network (eNB), meaning anyone listening can’t deduce the same ciphering sequence.
The Ciphering Sequence is derived from the following input parameters:
Key Kenc
Packet Number
Bearer Number
Direction (UL/DL)
Packet Size
Is is then ciphered using a ciphering algorithm, 3GPP define two options – AES or SNOW 3G. There is an option to not generate a ciphering sequence at all, but it’s not designed for use in production environments for obvious reasons.
Ciphering Sequences are never reused, the packet number increments with each packet sent, and therefore a new Cipher Sequence is generated for each.
Someone listening to the air interface (Uu) may be able to deduce packet size, direction and even bearer, but without the packet number and secret key Kenc, the data won’t be readable.
Data Integrity
By using the same ciphering sequence & XOR process outlined above, we also ensure that data has not been manipulated or changed in transmission, or that it’s not a fake message spoofing the terminal or the eNB.
Each frame contains the packet and also a “Message Authentication Code” or “MAC” (Not to be confused with media access control), a 32 bit long cryptographic hash of the contents of the packet.
The sender generates the MAC for each packet and appends it in the frame,
The receiver looks at the contents of the packet and generates it’s own MAC using the same input parameters, if the two MACs (Generated and received) do not match, the packet is discarded.
This allows the receiver to detect corrupted packets, but does not prevent a malicious person from sending their own fake packets,
To prevent this the MAC hash function requires other input parameter as well as the packet itself, such as the secret key Kint, packet number, direction and bearer.
By adding this we ensure that the packet was sourced from a sender with access to all this data – either the terminal or the eNB.
In my last post we discussed how the network authenticated a subscriber, now we’ll look at how a subscriber authenticates to a network. There’s a glaring issue there in that the MME could look at the RES and the XRES and just say “Yup, OK” even if the results differed.
To combat this LTE networks have mutual authentication, meaning the network authenticates the subscribers as we’ve discussed, and the subscribers authenticate the network.
To do this our HSS will take the same random key (RAND) we used to authenticate the subscriber, and using a different cryptographic function (called g) take the RAND, the K value and a sequence number called SQN, and using these 3 inputs, generate a new result we’ll call AUTN.
The HSS sends the RAND (same as RAND used to authenticate the subscriber) and the output of AUTN to the MME which forwards it to the eNB to the UE which passes the RAND and AUTH values to the USIM.
The USIM takes the RAND and the K value from the HSS, and it’s expected sequence number. With these 3 values it applies the cryptographic function g generates it’s own AUTN result.
If it matches the AUTN result generated by the HSS, the USIM has authenticated the network.
The USIM and the HSS contain the subscriber’s K key. The K key is a 128 bit long key that is stored on the subscriber’s USIM and in the HSS along with the IMSI.
The terminal cannot read the K key, neither can the network, it is never transmitted / exposed.
When the Terminal starts the attach procedure, it includes it’s IMSI, which is sent to the MME.
The MME then sends the the HSS a copy of the IMSI.
The HSS looks up the K key for that IMSI, and generates a random key called RAND.
The HSS and runs a cryptographic function (called f) using the input of RAND and K key for that IMSI, the result is called XRES (Expected result).
The HSS sends the output of this cryptographic function (XRES), and the random value (RAND) back to the MME.
The MME forwards the RAND value to the USIM (via eNB / Terminal), and stores a copy of the expected output of the cryptographic function.
The USIM take the RAND and the K key and performs the same cryptographic function the HSS did on it with the input of the K key and RAND value to generate it’s own result (RES).
The result of this same function (RES) is then sent from the USIM to the terminal which forwards it to the MME.
The MME and comparing the result the HSS generated (XRES) with the result the USIM generated. (RES)
If the two match it means both the USIM knows the K key, and is therefore the subscriber they’re claiming to be.
If the two do not match the UE is refused access to the network.
I’ve been working on private LTE recently, and one of the first barriers you’ll hit will be authentication.
LTE doesn’t allow you to just use any SIM to authenticate to the network, but instead relies on mutual authentication of the UE and the network, so the Network knows it’s talking to the right UE and the UE knows it’s talking to the right network.
So because of this, you have to have full control over the SIM and the network. So let’s take a bit of a dive into USIMs.
So it’s a SIM card right?
As a bit of background; the ever shrinking card we all know as a SIM is a “Universal integrated circuit card” – a microcontroller with it’s own OS that generally has the ability to run Java applets.
One of the Java applets on the card / microcontroller will be the software stack for a SIM, used in GSM networks to authenticate the subscriber.
For UMTS and LTE networks the card would have a USIM software stack allowing it to act as a USIM, the evolved version of the SIM.
Because it’s just software a single card can run both a USIM and SIM software stack, and most do.
As I’m building an LTE network we’ll just talk about the USIM side of things.
USIM’s role in Authentication
When you fire up your mobile handset the baseband module in it communicates with the USIM application on the card.
When it comes time to authenticate to the network, and authenticate the network itself, the baseband module sends the provided challenge information from the network to the USIM which does the crypto magic to generate responses to the authentication challenges issued by the network, and the USIM issues it’s own challenges to the network.
The Baseband module provides the ingredients, but the USIM uses it’s secret recipe / ingredients combo, known only to the USIM and HSS, to perform the authentication.
Because the card challenges the network it means we’ve got mutual authentication of the network.
This prevents anyone from setting up their own radio network from going all Lionel Ritche and saying “Hello, is it me you’re looking for” and having all the UEs attach to the malicious network. (Something that could be done on GSM).
It’s worth noting too that because the USIM handles all this the baseband module, and therefore the mobile handset itself, doesn’t know any of the secret sauce used to negotiate with the network. It just gets the challenge and forwards the ingredients down to the USIM which spits back the correct response to send, without sharing the magic recipe.
This also means operators can implement their own Crypto functions for f and g, so long as the HSS and the USIM know how to generate the RES and AUTN results, it’ll work.
What’s Inside?
Let’s take a look at the information that’s stored on your USIM:
All the GSM stuff for legacy SIM application
Generally USIMs also have the ability to operate as SIMs in a GSM network, after all it’s just a different software stack. We won’t touch on GSM SIMs here.
ICCID
Because a USIM is just an application running on a Universal Integrated Circuit Card, it’s got a ICCID or Universal Integrated Circuit Card ID. Generally this is the long barcode / string of numbers printed on the card itself.
The network generally doesn’t care about this value, but operators may use it for logistics like shipping out cards.
PIN & PUK
PINs and PUKs are codes to unlock the card. If you get the PIN wrong too many times you need the longer PUK to unlock it.
These fields can be written to (when authenticated to the card) but not read directly, only challenged. (You can try a PIN, but you can’t see what it’s set too).
As we mentioned before the terminal will ask the card if that’s correct, but the terminal doesn’t know the PIN either.
IMSI
Each subscriber has an IMSI, an International Mobile Subscriber Identity.
IMSIs are hierarchical, starting with 3 digit Mobile Country Code MCC, then the Mobile Network Code (MNC) (2/3 digits) and finally a Mobile Subscription Identification Number (MSIN), a unique number allocated by the operator to the subscribers in their network.
This means although two subscribers could theoretically have the same MSIN they wouldn’t share the same MNC and MCC so the ISMI would still be unique.
The IMSI never changes, unless the subscriber changes operators when they’ll be issued a new USIM card by the new operator, with a different IMSI (differing MNC).
The MSIN isn’t the same as the phone number / MSISDN Number, but an IMSI generally has a MSISDN associated with it by the network. This allows you to port / change MSISDN numbers without changing the USIM/SIM.
K – Subscriber Key
Subscriber’s secret key known only to the Subscriber and the Authentication Center (AuC/ HSS).
All the authentication rests on the principle that this one single secret key (K) known only to the USIM and the AuC/HHS.
OP – Operator Code
Operator Code – same for all SIMs from a single operator.
Used in combination with K as an input for some authentication / authorisation crypto generation.
Because the Operator Code is common to all subscribers in the network, if this key were to be recovered it could lead to security issues, so instead OPc is generally used.
OPc – Operator Code (Derived)
Instead of giving each USIM the Operator Code a derived operator code can be precomputed when the USIM is written with the K key.
This means the OP is not stored on the USIM.
OPc=Encypt-Algo(OP,Key)
PLMN (Public Land Mobile Network)
The PLMN is the combination of MCC & MNC that identifies the operator’s radio access network (RAN) from other operators.
While there isn’t a specific PLMN field in most USIMs it’s worth understanding as several fields require a PLMN.
HPLMNwAcT (HPLMN selector with Access Technology)
Contains in order of priority, the Home-PLMN codes with the access technology specified.
This allows the USIM to work out which PLMN to attach to and which access technology (RAN), for example if the operator’s PLMN was 50599 we could have:
50599 E-UTRAN
50599 UTRAN
To try 4G and if that fails use 3G.
In situations where operators might partner to share networks in different areas, this could be set to the PLMN of the operator first, then it’s partnered operator second.
OPLMNwACT (Operator controlled PLMN selector with Access Technology)
This is a list of PLMNs the operator has a roaming agreement with in order of priority and with the access technology.
An operator may roam to Carrier X but only permit UTRAN access, not E-TRAN.
FEHPLMN (Equivalent HPLMN)
Used to define equivalent HPMNs, for example if two carriers merge and still have two PLMNs.
FPLMN (Forbidden PLMN list)
A list of PLMNs the subscriber is not permitted to roam to.
HPPLMN (Higher Priority PLMN search period)
How long in seconds to spend between each PLMN/Access Technology in HPLMNwAcT list.
ACC (Access Control Class)
The ACC allows values from 0-15, and determines the access control class of the subscriber.
In the UK the ACC values is used to restrict civilian access to cell phone networks during emergencies.
Ordinary subscribers have ACC numbers in the range 0 – 9. Higher priority users are allocated numbers 12-14.
During an emergency, some or all access classes in the range 0 – 9 are disabled.
This means service would be could be cut off to the public who have ACC value of 0-9, but those like first responders and emergency services would have a higher ACC value and the network would allow them to attach.
AD (Administrative Data)
Like the ACC field the AD field allows operators to drive test networks without valid paying subscribers attaching to the network.
The defined levels are:
’00’ normal operation.
’80’ type approval operations.
’01’ normal operation + specific facilities.
’81’ type approval operations + specific facilities.
’02’ maintenance (off line).
’04’ cell test operation.
GID 1 / 2 – Group Identifier
Two group identifier fields that allow the operator to identify a group of USIMs for a particular application.
SPN (Service Provider Name)
The SPN is an optional field containing the human-readable name of the network.
The SPN allows MVNOs to provide their own USIMs with their name as the operator on the handset.
ECC (Emergency Call Codes)
Codes up to 6 digits long the subscriber is allowed to dial from home screen / in emergency / while not authenticated etc.
MSISDN
Mobile Station International Subscriber Directory Number. The E.164 formatted phone number of the subscriber.
This is optional, as porting may overwrite this, so it doesn’t always match up.
Caller-ID spoofing has been an issue in most countries since networks went digital.
SS7 doesn’t provide any caller ID validation facilities, with the assumption that everyone you have peered with you trust the calls from. So because of this it’s up to the originating switch to verify the caller ID selected by the caller is valid and permissible, something that’s not often implemented. Some SIP providers sell the ability to present any number as your CLI as a “feature”.
There’s heaps of news articles on the topic, but I thought it’d be worth talking about RFC4474 – Designed for cryptographically identifying users that originate SIP requests. While almost never used it’s a cool solution to a problem that didn’t take off.
It does this by adding a new header field, called Identity, for conveying a signature used for validating the identity of the caller, and Identity-Info for a reference to the certificate signing authority.
The calling proxy / UA creates a hash of it’s certificate, and inserts that into the SIP message in the Identity header.
The calling proxy / UA also inserts a “Identity-Info” header containing
The called party can then independently get the certificate, create it’s own hash of it, and if they match, then the identity of the caller has been verified.
Sometimes standards are created that are superior in some scenarios, and just don’t get enough love.
To me Stream Control Transmission Protocol (SCTP) is one of those, and it’s really under-utilised in Voice.
Defined by the SIGTRAN working group in 2000 while working to transport SS7 over IP, SCTP takes all the benefits of TCP, mixes in some of the benefits of UDP (No head of line blocking) and mutihoming support, and you’ve got yourself a humdinger of a Transmission Protocol.
Advantages
Reliable Transmission
Like TCP, SCTP includes a reliable transmission mechanism that ensures packets are delivered and retries if they’re not.
Multi Homing
SCTP’s multi homing allows a single connection to be split across multiple paths. This means if you had two paths between Melbourne and Sydney, you could be sending data down both simultaneously.
This means a loss of one transmission path results in the data being sent down another available transmission path.
If you’re doing this using TCP you’d have to wait for the TCP session to expire, BGP to update and then try again. Not so with SCTP.
No Head of Line Blocking
An error / discard with a packet in a TCP stream requires a re-transmission, blocking anything else in that stream from getting through until the error/discarded packet is sorted out. This is referred to as “head of line blocking” and is generally avoided by switching to UDP but that looses the reliability.
4 Way Handshake
Compared to TCP’s 3-way handshake which is susceptible to SYN flooding.
Deployment
If you’ve got a private network, chances are it can support SCTP.
There’s built in SCTP support in almost all Linux kernels since 2002, Cisco iOS and VxWorks all have support, and there’s 3rd party drivers for OSX and Windows.
SCTP is deployed in 3GPP’s LTE / EPC protocol stack for communication over S1-AP and X2 interfaces, meaning if you’ve got a LTE enabled mobile you’re currently using it, not that you’d see the packets.
You’ll find SCTP in SIGTRAN implementations and some TDM-IP gateways, Media Gateways, protocol converters etc, but it’s not widely deployed outside of this.
SIP Proxies are simple in theory but start to get a bit more complex when implemented.
When a proxy has a response to send back to an endpoint, it can have multiple headers with routing information for how to get that response back to the endpoint that requested it.
So how to know which header to use on a new request?
Routing SIP Requests
Record-Route
If Route header is present (Like Record-Route) the proxy should use the contents of the Record-Route header to route the traffic back.
The Record-Route header is generally not the endpoint itself but another proxy, but that’s not an issue as the next proxy will know how to get to the endpoint, or use this same logic to know how to get it to the next proxy.
Contact
If no Route headers are present, the contact header is used.
The contact provides an address at which a endpoint can be contacted directly, this is used when no Record-Route header present.
From
If there is no Contact or Route headers the proxy should use the From address.
A note about Via
Via headers are only used in getting responses back to a client, and each hop removes it’s own IP on the response before forwarding it onto the next proxy.
This means the client doesn’t know all the Via headers that were on this SIP request, because by the time it gets back to the client they’ve all been removed one by one as it passed through each proxy.
A client can’t send a SIP request using Via’s as it hasn’t been through the proxies for their details to be added, so Via is only used in responding to a request, for example responding with a 404 to an INVITE, but cannot be used on a request itself (For example an INVITE).
If you, like me, spend a lot of time looking at SIP logs, sngrep is an awesome tool for debugging on remote machines. It’s kind of like if VoIP Monitor was ported back to the days of mainframes & minimal remote terminal GUIs.
Installation
It’s in the Repos for Debian and Ubuntu:
apt-get install sngrep
GUI Usage
sngrep can be used to parse packet captures and create packet captures by capturing off an interface, and view them at the same time.
We’ll start by just calling sngrep on a box with some SIP traffic, and waiting to see the dialogs appear.
Here we can see some dialogs, two REGISTERs and 4 INVITEs.
By using the up and down arrow keys we can select a dialog, hitting Enter (Return) will allow us to view that dialog in more detail:
Again we can use the up and down arrow keys to view each of the responses / messages in the dialog.
Hitting Enter again will show you that message in full screen, and hitting Escape will bring you back to the first screen.
From the home screen you can filter with F7, to find the dialog you’re interested in.
Command Line Parameters
One of the best features about sngrep is that you can capture and view at the same time.
As a long time user of TCPdump, I’d been faced with two options, capture the packets, download them, view them and look for what I’m after, or view it live with a pile of chained grep statements and hope to see what I want.
By adding -O filename.pcap to sngrep you can capture to a packet capture and view at the same time.
You can use expression matching to match only specific dialogs.
If you’ve ever phoned a big company like a government agency or an ISP to get something resolved, and been transferred between person to person, having to start again explaining the problem to each of them, then you know how frustrating this can be.
If they stored information about your call that they could bring up later during the call, it’d make your call better.
If the big company, started keeping a record of the call that could be referenced as the call progresses, they’d be storing state for that call.
Let’s build on this a bit more,
You phone Big Company again, the receptionist answers and says “Thank you for calling Big Company, how many I direct your call?”, and you ask to speak to John Smith.
The receptionist puts you through to John Smith, who’s not at his desk and has setup a forward on his phone to send all his calls to reception, so you ring back at reception.
A stateful receptionist would say “Hello again, it seems John Smith isn’t at his desk, would you like me to take a message?”.
A stateless receptionist would say “Thank you for calling Big Company, how many I direct your call?”, and you’d start all over again.
Our stateful receptionist remembered something about our call, they remembered they’d spoken to you, remembered who you were, that you were trying to get to John Smith.
While our stateless receptionist remembered nothing and treated this like a new call.
In SIP, state is simply remembering something about that particular session (series of SIP messages).
SIP State just means bits of information related to the session.
Stateless SIP Proxy
A Stateless SIP proxy doesn’t remember anything about the messages (sessions), no state information is kept. As soon as the proxy forwards the message, it forgets all about it, like our receptionist who just forwards the call and doesn’t remember anything.
Going back to our Big Company example, as you can imagine, this is much more scaleable, you can have a pool of stateless receptionists, none of whom know who you are if you speak to them again, but they’re a lot more efficient because they don’t need to remember any state information, and they can quickly do their thing without looking stuff up or memorising it.
The same is true of a Stateless SIP proxy.
Stateless proxies are commonly used for load balancing, where you want to just forward the traffic to another destination (maybe using the Dispatcher module) and don’t need to remember anything about that session.
It sounds obvious, but because a Stateless SIP proxy it stateless it doesn’t store state, but that also means it doesn’t need to lookup state information or write it back, making it much faster and generally able to handle larger call loads than a stateful equivalent.
Dialog Stateful SIP Proxy
A dialog stateful proxy keeps state information for the duration of that session (dialog).
By dialog we mean for the entire duration on the call/session (called a dialog) from beginning to end, INVITE to BYE.
While this takes more resources, it means we can do some more advanced functions.
For example if we want to charge based on the length of a call/session, we’d need to store state information, like the Call-ID, the start and end time of the call. We can only do this with a stateful proxy, as a stateless proxy wouldn’t know what time the call started.
Also if we wanted to know if a user was on a call or not, a Dialog Stateful proxy knows there’s been a 200 OK, but no Bye yet, so knows if a user is on a call or not, this is useful for presence. We could tie this in with a NOTIFY so other users could know their status.
A Dialog Stateful Proxy is the most resource intensive, as it needs to store state for the duration of the session.
Transaction Stateful SIP Proxy
A transactional proxy keeps state until a final response is received, and then forgets the state information after the final response.
A Transaction Stateful proxy stores state from the initial INVITE until a 200 OK is received. As soon as the session is setup it forgets everything. This means we won’t have any state information when the BYE is eventually received.
While this means we won’t be able to do the same features as the Dialog Stateful Proxy, but you’ll find that most of the time you can get away with just using Transaction Stateful proxies, which are less resource intensive.
For example if we want to send a call to multiple carriers and wait for a successful response before connecting it to the UA, a Transactional proxy would do the trick, with no need to go down the Dialog Stateful path, as we only need to keep state until a session is successfully setup.
For the most part, SIP is focused on setting up sessions, and so is a Transaction Stateful Proxy.
When a final response, like a 200 OK, or a 404, etc, is sent, the receiving party acknowledges that it received this with an ACK.
By provisional responses, such as 180 RINGING, are not acknowledged, this means we have no way of knowing for sure if our UAC received the provisional response.
The issues start to arise when using SIP on Media Gateways or inter-operating with SS7 / ISUP / PSTN, all of which have have guaranteed delivery of a RINGING response, but SIP doesn’t. (Folks from the TDM world will remember ALERTING messages)
The IETF saw there was in some cases, a need to confirm these provisional responses were received, and so should have an ACK.
They created the Reliability of Provisional Responses in the Session Initiation Protocol (SIP) under RFC3262 to address this.
This introduced the Provisional Acknowledgement (PRACK) and added the 100rel extension to Supported / Requires headers where implemented.
This means when 100rel extension is not used a media gateway that generates a 180 RINGING or a 183 SESSION PROGRESS response, sends it down the chain of proxies to our endpoint, but could be lost anywhere along the chain and the media gateway would never know.
When the 100rel extension is used, our media gateway generates a 18x response, and forwards it down the chain of proxies to our endpoint, and our 18x response now also includes a RSeq which is a reliable sequence number.
The endpoint receives this 18x response and sends back a Provisional Acknowledgement or PRACK, with a Rack header (Reliable Acknowledgement) header with the same value as the RSeq of the received 18x response.
The media gateway then sends back a 200 OK for the PRACK.
In the above example we see a SIP call to a media gateway,
The INVITE is sent from the caller to the Media Gateway via the Proxy. The caller has included value “100rel” in the Supported: header, showing support for RFC3262.
The Media gateway looks at the destination and knows it needs to translate this SIP message to a different a different protocol. Our media gateway is translating our SIP INVITE message into it’s Sigtran equivalent (IAM), and forward it on, which it does, sending an IAM (Initial Address Message) via Sigtran.
When the media gateways gets confirmation the remote destination is ringing via Sigtran (ACM ISUP message), it translates that to it’s SIP equivalent message which is, 180 RINGING.
The Media Gateway set a reliable sequence number on this provisional response, contained in the RSeq header.
This response is carried through the proxy back to the caller, who signals back to the media gateway it got the 180 RINGING message by sending a PRACK (Provisional ACK) with the same RSeq number.
The Dispatcher module is used to offer load balancing functionality and intelligent dispatching of SIP messages.
Let’s say you’ve added a second Media Gateway to your network, and you want to send 75% of traffic to the new gateway and 25% to the old gateway, you’d use the load balancing functionality of the Dispatcher module.
Let’s say if the new Media Gateway goes down you want to send 100% of traffic to the original Media Gateway, you’d use the intelligent dispatching to detect status of the Media Gateway and manage failures.
These are all problems the Dispatcher Module is here to help with.
Before we get started….
Your Kamailio instance will need:
Installed and running Kamailio instance
Database configured and tables created (We’ll be using MySQL but any backed is fine)
kamcmd & kamctl working (kamctlrc configured)
Basic Kamailio understanding
The Story
So we’ve got 4 players in this story:
Our User Agent (UA) (Softphone on my PC)
Our Kamailio instance
Media Gateway 1 (mg1)
Media Gateway 2 (mg2)
Our UA will make a call to Kamailio. (Send an INVITE)
Kamailio will keep track of the up/down status of each of the media gateways, and based on rules we define pick one of the Media Gateways to forward the INVITE too.
The Media Gateways will playback “Media Gateway 1” or “Media Gateway 2” depending on which one we end up talking too.
Configuration
Parameters
You’ll need to load the dispatcher module, by adding the below line with the rest of your loadmodules:
loadmodule "dispatcher.so"
Next we’ll need to set the module specific config using modparam for dispatcher:
modparam("dispatcher", "db_url", DBURL) #Use DBURL variable for database parameters
modparam("dispatcher", "ds_ping_interval", 10) #How often to ping destinations to check status
modparam("dispatcher", "ds_ping_method", "OPTIONS") #Send SIP Options ping
modparam("dispatcher", "ds_probing_threshold", 10) #How many failed pings in a row do we need before we consider it down
modparam("dispatcher", "ds_inactive_threshold", 10) #How many sucessful pings in a row do we need before considering it up
modparam("dispatcher", "ds_ping_latency_stats", 1) #Enables stats on latency
modparam("dispatcher", "ds_probing_mode", 1) #Keeps pinging gateways when state is known (to detect change in state)
Most of these are pretty self explanatory but you’ll probably need to tweak these to match your environment.
Destination Setup
Like the permissions module, dispatcher module has groups of destinations.
For this example we’ll be using dispatch group 1, which will be a group containing our Media Gateways, and the SIP URIs are sip:mg1:5060 and sip:mg2:5060
From the shell we’ll use kamctl to add a new dispatcher entry.
You can use kamctl to show you the database entries:
kamctl dispatcher show
A restart to Kamailio will make our changes live.
Destination Status / Control
Checking Status
Next up we’ll check if our gateways are online, we’ll use kamcmd to show the current status of the destinations:
kamcmd dispatcher.list
Here we can see our two media gateways, quick response times to each, and everything looks good.
Take a note of the FLAGS field, it’s currently set to AP which is good, but there’s a few states:
AP – Active Probing – Destination is responding to pings & is up
IP – Inactive Probing – Destination is not responding to pings and is probably unreachable
DX – Destination is disabled (administratively down)
AX – Looks like is up or is coming up, but has yet to satisfy minimum thresholds to be considered up (ds_inactive_threshold)
TX – Looks like or is, down. Has stopped responding to pings but has not yet satisfied down state failed ping count (ds_probing_threshold)
Adding Additional Destinations without Restarting
If we add an extra destination now, we can add it without having to restart Kamailio, by using kamcmd:
kamcmd dispatcher.reload
There’s some sanity checks built into this, if the OS can’t resolve a domain name in dispatcher you’ll get back an error:
Administratively Disable Destinations
You may want to do some work on one of the Media Gateways and want to nicely take it offline, for this we use kamcmd again:
kamcmd dispatcher.set_state dx 1 sip:mg1:5060
Now if we check status we see MG1’s status is DX:
Once we’re done with the maintenance we could force it into the up state by replacing dx with ap.
It’s worth noting that if you restart Kamailio, or reload dispatcher, the state of each destination is reset, and starts again from AX and progresses to AP (Up) or IP (Down) based on if the destination is responding.
Routing using Dispatcher
The magic really comes down to single simple line, ds_select_dst();
The command sets the destination address to an address from the pool of up addresses in dispatcher.
You’d generally give ds_select_dst(); two parameters, the first is the destination set, in our case this is 1, because all our Media Gateway destinations are in set ID 1. The next parameter is is the algorithm used to work out which destination from the pool to use for this request.
Some common entries would be random, round robin, weight based or priority value.
In our example we’ll use a random selection between up destinations in group 1:
if(method=="INVITE"){
ds_select_dst(1, 4); #Get a random up destination from dispatcher
route(RELAY); #Route it
}
Now let’s try and make a call:
UA > Kamailio: SIP: INVITE sip:1111111@Kamailio SIP/2.0
Kamailio > UA: SIP: SIP/2.0 100 trying -- your call is important to us
And bingo, we’re connected to a Media Gateway 1. If I try it again I’ll get MG2, then MG1, then MG2, as we’re using round robin selection.
Destination Selection Algorithm
We talked a little about the different destination select algorithm, let’s dig a little deeper into the common ones, this is taken from the Dispatcher documentation:
“0” – hash over callid
“4” – round-robin (next destination).
“6” – random destination (using rand()).
“8” – select destination sorted by priority attribute value (serial forking ordered by priority).
“9” – use weight based load distribution.
“10” – use call load distribution.
“12” – dispatch to all destination in setid at once
For select destination sorted by priority (8) to work you need to include a priority, you can do this when adding the dispatcher entry or after the fact by editing the data. In the below example if MG1 is up, calls will always go to MG1, if MG1 is down it’ll go to the next highest priority (MG2).
For use weight based load distribution (9) to work, you’ll need to set a weight as well, this is similar to priority but allows you to split load, for example you could put weight=25 on a less powerful or slower destination, and weight=75 for a faster or more powerful destination, so the better destination gets 75% of traffic and the other gets 25%. (You don’t have to do these to add to 100%, I just find it easier to think of them as percentages).
use call load distribution (10) allows you to evenly split the number of calls to each destination. This could be useful if you’ve got say 2 SIP trunks with x channels on each trunk, but only x concurrent calls allowed on each. Like adding a weight you need to set a duid= value with the total number of calls each destination can handle.
dispatch to all destination in setid at once (12) allows you to perform parallel branching of your call to all the destinations in the address group and whichever one answers first will handle the call. This adds a lot of overhead, as for each destination you have in that set will need a new dialog to be managed, but it sure is quick for the user. The other major issue is let’s say I have three carriers configured in dispatcher, and I call a landline.
That landline will receive three calls, which will ring at the same time until the called party answers one of the calls. When they do the other two calls will stop ringing. This can get really messy.
Managing Failure
Let’s say we try and send a call to one of our Media Gateways and it fails, we could forward that failure response to the UA, or, better yet, we could try on another Media Gateway.
Let’s set a priority of 10 to MG1 and a priority of 5 to MG2, and then set MG1 to reject the call.
We’ll also need to add a failure route, so let’s tweak our code:
What is ASN.1 and why is it so hard to find a good explanation or example?
ISO, IEC & ITU-T all got together and wrote a standard for describing data transmitted by telecommunications protocols, it’s used by many well known protocols X.509 (SSL), LDAP, SNMP, LTE which all rely on ASN.1 to encode data, transmit it, and then decode it, reliably and efficiently.
Overview
Let’s take this XML encoded data:
<?xml version="1.0" encoding="UTF-8"?>
<Message>
<number>61412341234</number>
<text>Hello my friend!</text>
</Message>
As you can see it’s human readable and pretty clear.
But what if we split this in two, had the definitions in one file and the values in another:
Definitions:
Here we’ll describe each of our fields
Message: Number – Intiger- Destination of Message Text – String – Message to be sent
Values:
Now we’ll list the values.
61412341234
Hello my friend!
By taking the definitions out our data is now 28 bytes, instead of 122, so we’re a fraction of the size on the wire (becomes important if you’re sending this data all the time or with a limited link budget), and we’ve also defined the type of each value as well, so we know we shouldn’t have an integer as the heading for example, we can see it’s a string.
The sender and the receiver both have a copy of the definitions, so everyone is clear on where we stand in terms of what each field is, and the types of values we encode. As a bonus we’re down to less than 1/4 of our original size. Great!
That’s ASN.1 in a nutshell, but let’s dig a little deeper and use a real example.
ASN.1 IRL
Now let’s actually encode & decode some data.
I’ll be using asn1tools a Python library written by Erik Moqvist, you can add it through pip:
pip install asn1tools
We’ll create a new text file and put our ASN.1 definitions into it, so let’s create a new file called foo.asn which will contain our definitions in ASN.1 format:
HelloWorld DEFINITIONS ::= BEGIN Message ::= SEQUENCE { number INTEGER, text UTF8String } END
Copy and paste that into foo.asn and now we’ve got a definition, with a Module called Message containing a field called number (which is an integer) and a filed called text (which is a string).
Now let’s fire up our python shell in the same directory as our new file:
>>import asn1tools
>>foo = asn1tools.compile_files('foo.asn')
>>encoded = foo.encode('Message', {'number': 61412341234, 'text': u'Hello my friend!'})
>>encoded
bytearray(b'0\x19\x02\x05\x0eLu\xf5\xf2\x0c\x10Hello my friend!')
(You’ll need to run this in the Python shell, else it’ll just output encoded as plain text and not as a Byte Array as above)
So now we’ve encoded our values (number = 2 and text = ‘Hi!’) into bytes, ready to be sent down the wire and decoded at the other end. Not exactly human friendly but efficient and well defined.
So there you have it, an introduction to ASN1, how to encode & decode data.
We can grow our definitions (in the .asn file) and so long as both ends have the same definitions and you’re encoding the right stuff, you’ll be set.
Further Reading
There’s a whole lot more to ASN1 – Like how you encode the data, how to properly setup your definitions etc, but hopefully you understand what it actually does now.
I recently began integrating IMS Authentication functions into PyHSS, and thought I’d share my notes / research into the authentication used by IMS networks & served by a IMS capable HSS.
There’s very little useful info online on AKAv1-MD5 algorithm, but it’s actually fairly simple to understand.
Authentication and Key Agreement (AKA) is a method for authentication and key distribution in a EUTRAN network. AKA is challenge-response based using symmetric cryptography. AKA runs on the ISIM function of a USIM card.
The Nonce field is the Base64 encoded version of the RAND value and concatenated with the AUTN token from our AKA response. (Often called the Authentication Vectors).
That’s it!
It’s put in the SIP 401 response by the S-CSCF and sent to the UE. (Note, the Cyperhing Key & Integrity Keys are removed by the P-CSCF and used for IPsec SA establishment.
There’s often a lot of focus on the signalling side of VoIP, but the media RTP (Real Time Protocol) is the protocol that actually transfers the voice over IP.
RTP is designed to be bare-bones and adaptable. A RTP packet doesn’t have pretty RFC822 style headers that are easy to read, but rather a fixed length formatted string of Hex values, with different positions denoting different values to keep the size down. There’s no checksum in the protocol, error correction, or anything else that might add overhead.
RTP is the transport of the media, it contains the media as a payload inside, but it’s up to the system creating the RTP packets as to what’s inside the payload. The header of an RTP packet does denote the payload type, but RTP has no way to verify that the contents of the payload match the payload type specified.
First defined in 1996, RTP hasn’t seen much evolution, primarily owing to it’s design being as lightweight and simple as possible. RTP had a bit of an update in 2003 under RFC 3550, but that only touched upon changes to the timer algorithm. (That deserves a post of it’s own) There have been pushes in the past for a further cut down RTP with fewer fields, as being fixed-width some fields when not used are just padded with 00000s so the packet size on the wire remains the same regardless.
RTP is generally carried over UDP but it will run over TCP. Running your RTP traffic over TCP can be pretty costly due time-sensitive nature and the sheer volume of packets you’ll be seeing. If if you’re packetizing a G. 711 a-law call (sampled every 20 ms at 8,000 Hz) that’s a packet every 160ms – 375 packets each direction per minute on UDP. If you were to use TCP to transport these packets you’d need to add the 3-way-handshake giving you 3 times as many packets at 1125 packets per minute, not to mention much more jitter and PDV caused by 3 times the load.
Header Fields
The data in RTP headers is in Hexidecimal format, which keeps it’s size down and processing minimal, but also means it’s pretty rigidly defined in terms of spacing etc, it’s not like a SIP header which might look like To: [email protected]\n\r, this wastes precious space on the wire to add the “To: ” and the “\n\r”, so instead it’s fixed positioning all the way with just the data.
If you haven’t had the joys of working with 90’s data files in fixed width formats, the premise is fairly simple; each value has a start and end position within a document. More info on creating RTP headers can be found in the post “Crafting RTP Packets”.
Generally when working with RTP packets on the wire, all these headers are joined one after another, broken up into blocks of 8 (octets) and then converted to HEX, all to ensure it’s as small as practical when it’s transmitted.
Version (2 bits)
RTP has had two published versions, but in both the value to put in this field is 10 (Binary 2). If you are reading this on a machine that isn’t running DOS, there’s a good chance you’ll only see version 2. If your traffic routed through a wormhole, or your network has some serious latency issues (Several decades) you could find yourself working with a media stream pre-1996 (hopefully not) using the draft version of RTP and has a value of 01 (Binary 1). But if you were dealing with RTP’s predecessor; vat, this value would be 0. (vat and RTP aren’t the same).
Padding (1 bit)
If you’re encrypting your packets you may need them to be a specific size, and for this you may need to pad the packets out at the end. To do this you’d enable padding by putting a 1 here and then specifying at the end of the payload how many octets of padding you need. In most cases this isn’t used though, and this value will be 0.
Extension (1 bit)
Unlike a lot of RFC documents that specify “must” “shall” etc, RTP was defined more as a guideline, a template for implementers. The extension field was added to allow individual implementations with additional custom data in the headers, while being ignored by other network elements that don’t support the extensions. If this is enabled it’s followed by 16 bits of you-decide. However like the padding value, this is likely to be 0.
Contributing Source (CSRC) Count (4 bits)
RTP allows you to have multiple Contributing Sources. This means on a 3-way call, instead of your switch taking the two audio streams, joining them together (mux) and sending each endpoint a single media stream, you could have direct-media from one of the parties you’re on a 3 way call with, and the other party you’re on a call with added as a Contributing Source. Again, it’s likely this is 0000.
Marker (1 bit)
If the marker bit is set or not is actually up to the underlying protocol. In video the marker bit is often used to signify the image has significantly changed, and in audio it’s generally to denote the end of silence & the start of talking – called a “talkspurt”.
Payload Type (7 bits)
The payload type is what specifies the contents of the payload. In voice terms this means the codec we’re using. RFC3551 defines some predefined payload type definitions and it’s Payload Type code.
Your values might not appear in the RFC3551 definitions if you’re using a non-standard codec, and that’s Ok. RTP could be used to play video games or pilot an RC plane, it’s really just a protocol to carry a stream of real time data quickly from point A to point B with as little overhead as possible.
PCMA / PCMU is king here thanks to it G.711’s widespread adoption due to being the codec used in TDM, and the fact you don’t need to transcode PCM to bring the traffic into the network, or compress it from a TDM source. TDM / circuit switched services are way less common on the network edge these days, but G.711 still holds on as the defacto standard.
So for a G711 a-law (PCMA) payload this value would be 8, which is 0001000 in Binary (it’s also equivalent to 1000 in binary but we need to fill all 7 bits because we’re using fixed-width formatting, so we prefix it with zeros, if we were using GSM, which is 8 in decimal and 11 in binary, we’d format is 0000011)
The sequence number is a supposedly random number that increments by 1 for each packet sent. This allows the receiving party to calculate packet loss, because if you receive packets with the sequence number 1,2,3,5,6 you know you’ve missed packet 4. It also allows us to calculate our packet delay variation (PDV), and helps our jitterbuffer re-assemble packets, if we receive packets 1,3,2,4,5,6 we can see they’re out of sequence and know to play them back in the order 1,2,3,4,5,6, not the order we received them.
The sequence numbers are supposed to be random. By having this as a random number it adds an extra unknown part of the packet for someone trying to break any crypto on top to guess. Polycom however just start all theirs at 0. (Slow clap)
This is a 16 bit number, so like the payload type we’ll have to convert it from decimal to binary, then pad it to be 16 bits. So if our starting sequence number is 1234 we’d have to convert it to binary (10011010010) and then pad it to 16 characters (0000010011010010)
Timestamp (32 bits)
The timestamp, like the sequence number is supposed to begin with a random number, and then increased by the sampling instances between packets “monotonically and linearly in time”. In essence it means a random starting number + time between packets.
The value increments by the packetization time (ptime) in seconds x bandwidth in Hz. So for a call with a ptime of 20ms at 8Khz this would be:
0.020 x 8000 = 160, so increment by 160 each packet.
Having an accurate source for the timestamp allows accurate stats to be generated for PDV, jitter etc, without this value being accurate your RTCP values will always appear off, even if the audio is fine.
If we wanted a timestamp of 837026880, we’d need to convert it to Binary (110001111001000000010001000000) and pad it to ensure it’s 32 bits long (this value already is so no need to pad).
The SSRC is like a Call-ID, a unique value that identifies one RTP stream from another. If you had two packets to the same port, from the same IP, with roughly the same sequence number & timestamp you’d need a way to determine which RTP stream is for which session. This is where the SSRC comes in, a unique identifier that identifies one RTP stream from the others.
To keep it random the spec’ even suggests ways to generate a random value based on other values.
If we wanted to use 185755418 as or SSRC we’d need to convert it to Binary (1011000100100110011100011010) and pad it to 32 bits (00001011000100100110011100011010)
CSRC List (Contributing source list) (0 to 15 individual 32 bit values)
This contains a list of the contributing sources. Depending on how many sources were specified in your CSRC Count, this can have any number of items from 0 to 15. So if you had one contributing source in the CSRC Count, then you’d have 1x 32 bit value to specify the details of the SSRC identifiers of the sources.
This will always not exist if the CSRC Count is 0.
Payload
The payload is whatever you want it to be, so long as you specify it in the Payload Type field, and pad it if enabled, any data can be put here.
Further Reading
I got a copy of the Colin Perkin’s book RTP: Audio and Video for the Internet, which covers everything you need to ever know about RTP. I’d highly recommend it. It was written in 2003, is still just as relevant as more and more traffic moves off circuit switched into packet switched.
If we want to send a SIP message to Bob’s phone, we needs to know the IP Address of Bob’s phone. There are 4,294,967,296 IPv4 addresses, so finding Bob may take a while.
Bob could let us know his IP address, but what if Bob’s IP changes? If he’s using a Softphone while he’s out to lunch and a desk phone once he gets back to the office. How do we find Bob?
SIP manages this using a SIP Registrar, essentially, when Bob goes out to lunch and starts his softphone app, the softphone checks in with the Registrar and lets the Registrar know what IP to find Bob on now (the softphone’s IP).
When he gets back to the office he closes the softphone app, as it shuts down it checks in with the Registrar again to let it know Bob’s not using the softphone any more.
So our Registrar keeps track of the IP address you can find a SIP endpoint on.
It does this using an Address on Record (AoR). It’s a record of a contact – Like Bob, and the IP Addresses to contact Bob, kind of like DNS is a record of the domain name and the IP it translates to.
A simplified example AoR might look like:
Bob | 192.168.1.2 | Expires 1800
So if we want to send a SIP message to Bob, we look up Bob’s IP in our Address on Record list, and send it to that IP.
A Registrar takes the info received in a SIP REGISTER messages and stores the IP Address and contact info in the form of an Address on Record (AoR).
Registrars also manage expiry, if Bob’s softphone sends a SIP REGISTER message letting us know we’re on one IP address, and then his phone runs out of battery or drops out of coverage, we don’t want to keep sending SIP messages that are going to be lost, so in this case Bob has 1800 seconds left, after which his Address on Record will be discarded if he doesn’t send another REGISTER message before then.
Different SIP Registrars use different ways to store this information, and some store more info, like User-Agent, NAT information and multiple contact IP addresses. Most implementations of a SIP Registrar use some form of database back end or another to store this information. In my Kamailio Registrar example we store it in memory, but you could store it in some form of SQL database, text files, post it notes or punch card, so long a you have a quick way to look it up when needed.
So that’s a SIP Registrar in a nutshell, we’ll talk more about the REGISTER process and flow, including what the www-auth header does, the Contact header and multi endpoint registration in future posts.
I can see it’s registered, but when I call it it’s not ringing, what’s wrong?
Support team
It’s a question I get every so often, and it generally comes down to a misunderstanding in the way the SIP Register mechanism works.
When a UA registers to a SIP server it includes an “Expires:” header, which means it’s registration will expire after that time.
It doesn’t mean it’ll be active that whole time, just that for the time specified it intends to be at that address, but life, and networks, often have other plans.
Let’s jump out of SIP for a minute and imagine you’re going to give me a package, I leave you a note saying:
I’ll be waiting outside the station in a trench-coat under the lamp post between 7:00 and 7:15
You get there at 7:12 but you can’t deliver the package. I’m nowhere to be seen.
The note I left says I’ll be there during that time, but I’ve disappeared, and no you can’t hand the package to me.
Just because you have a note saying I’ll be there, doesn’t mean I still will be. It was my intention to be there, but I’m obviously not.
The SIP register is the same as the note left on the desk. I intended to be there, but I’m now obviously not, and I haven’t had a way to reach you to let you know this has changed, or I myself don’t know.
You see this in SIP from time to time, generally it’s due to the connection the UA is coming from dropping or it’s public IP changing.
For example, a REGISTER is sent with an Expires of 3600 seconds (An hour) to a SIP switch from IP address 1.2.3.4.
Half an hour later your connection drops.
As far as the SIP switch is concerned it’s going to send any incoming messages to 1.2.3.4, as 1.2.3.4 said it’d be there for the next hour.
So even though the connection is dropped to 1.2.3.4 the SIP Switch has no way of knowing this and continues to forward any traffic for that user to 1.2.3.4 until the 3600 seconds is up since it last tried to REGISTER.
Same thing could happen if our UA is behind a NAT and the external IP changes or the connection is changed. The UA doesn’t know anything has changed, so no REGISTER is sent to refresh, and messages from the SIP server are sent to the old address.
A lot of SIP switching platforms allow you to view register status, but just keep in mind it doesn’t mean the device is still answerable at that address, only that it intended to be.
SIP was designed to be flexible in it’s operation, and for, where possible, messages to take the most direct path.
For example I can use a Registrar function of a proxy to find the IP of a registered endpoint, but once a dialog is setup, why should the proxy be involved? The endpoint & I can take it from here, and can talk directly to each other using the address in the Contact header.
This works really well in some scenarios, as I described above you can have the registrar proxy setup the introduction and then off you go.
Other scenarios this doesn’t work quite so well, for example if the call needs to be billed. To charge correctly, the proxy needs to know when the call ends to know when to stop charging.
If the endpoint we’re talking to is behind a NAT, the NAT might just be locked to the IP of the registrar proxy and drop your traffic.
The Record-Route header exists to address this.
If a proxy adds a Record-Route header, it means it’ll sit in the middle of any future requests in this dialog, and route them back through the proxy.
By adding a Record-Route header on the proxy for our billing example, our proxy will forward inline all the messages between the two end points for that dialog, including the BYE so the proxy knows when to stop charging.
For the NAT scenario we described the Proxy will add a Record-Route header and forward all the messages between the two endpoints, so NAT won’t be an issue as the source IP of the packets will be the same as the proxy.
There was a bit of confusion in regards to implementation so to address this IETF wrote RFC 5658 to address Record-Route Issues in SIP.
Content-Type application/sdp is something you’ll see a whole lot when using SIP for Voice over IP, especially in INVITEs and 200 OK responses.
This is because SIP uses SDP to negotiate the media setup.
While Voice over IP uses RTP for media, and SIP for signalling, the meat in this sandwich is SDP, used to negotiate the RTP parameters and payloads before going ahead.
Without SDP you’d just have random unidentified RTP streams going everywhere and no easy way to correlate them back to a Session (SIP) or guarantee both endpoints support the same codec (RTP payload).
Enter SDP, the Session Description Protocol, before any RTP is sent, SDP advertises capabilities (which codecs to use), contact information, port information (which port to send the RTP stream to) and attempts to negotiate a media session both endpoints can support.
SDP is designed to be lightweight, while SIP uses human readable headers like To and From, SDP does away with this in favour of single letters representing what that header contains.
As an interesting aside, SIP at one stage also offered one-letter headers to make it smaller on the wire, but this never really took off.
Here we can see what an SDP header looks like, showing the Session ID, Session Name, Connection Information and Media Descriptions.
Let’s dig a little deeper and have a look at what this SDP header actually shows that’s useful to us.
The SDP Offer
Session Identifiers
The Owner / Creator & Session ID header (abbreviated to o=) contains the SDP session ID and the session owner / creator information. This contains the SDP Session ID and the IP Address / FQDN of the owner or creator of this session. In this case the SDP Session ID is 777830 and the Session owner / creator is 195.135.145.201.
Connection Information
Next up we’ve got the connection information header (abbreviated to c=) which contains the IP Address we want the incoming RTP stream sent to. In this example it’s coming IN on IPv4 address 192.135.154.201.
The Media Description header (m=) also contains the port we want to receive the audio on, 15246.
So in summary we’re telling the called party that we’ll be listening on IP Address 192.135.154.201 on port 15246, so they should send their RTP audio stream to that address & port.
Media Attributes
The Media Description header (abbreviated to m=) contains a name and address, in this case it’s audio, and sent to address (port) 15246.
After that we’ve got the RTP Audio / Video profile numbers. Because SDP is designed to be lightweight instead of saying PCMA, PCMU here each codec is assigned a number by IANA that translates to a codec. The full list is here, but 8 is equal to PCMA and 0 is equal to PCMU.
So from the Media Description header we can learn that it’s an Audio session, with media to be sent to port 15246, via RTP using PCMA or PCMU.
Different codecs can have different bitrates, so by using the Media Attribute header (Abbreviated to a=) we can set the bitrates for each. In this case both PCMA and PCMU are using a bitrate of 8000.
Summary
So to summarise we’ve told the party we’re calling our session ID is 777830 and it’s owned / created by 195.135.145.201. We support PCMA and PCMU at 8000Hz, and we’ll be listening on IPv4 on 195.135.145.201 on port 15246 for them to send their audio stream to.
The SDP Answer
Next we’ll take a look at the SDP from a 200 OK response, and work out what our session will look like.
Codec Selection
We can see this device only supports PCMA, which makes codec selection pretty easy, it’s going to be PCMA as that was also supported in the SDP offer contained in the initial INVITE.
In the scenario where both devices support the same codecs, the order in which the codecs are listed defines what codec is selected.
Connection Information
Like in the SDP offer we can see that we’re requesting incoming RTP / media to be sent to, in this case we’re asking for the RTP / media on 195.135.145.195 port 25328
Final Steps
Generally after the 200 OK is received an ACK is sent and media starts flowing in both directions between endpoints.
In this example 195.135.145.195 will send their audio (aka media / RTP) to 195.135.145.201 on port 15246 (called party to the caller) and 195.135.145.201 will send their audio to 195.135.145.195 on port 25328 (calling party to the called party).
It’s always worth keeping in mind that SIP doesn’t have to be used for Voice, nor does it have to use SDP, nor does SDP have to be used with SIP, it can be used with other protocols (IAX, H.323), and doesn’t have to negotiate RTP sessions, but could negotiate anything.
That said, the SIP – SDP – RTP sandwich is pretty ubiquitous for good reason, and while it’s true that none of these protocols require each other, the truth is, most of their usage is with one-another and it’s easier to just say “SIP uses SDP” and “SDP uses RTP” than continually saying “SIP can use SDP” and “SDP can use RTP” etc.
Want more telecom goodness?
I have a good old fashioned RSS feed you can subscribe to.