I had a discussion with a friend the other day about if hold is signified with a=sendonly or a=recvonly, which led me to revisiting the RFC to confirm, so here’s an overview of how “Call Hold” works in SIP.
By the Book
According to RFC 6337 a user can hold calls by sending a new SDP offer in an established session (Re-INVITE on active call), with an SDP payload of a=sendonly for each media stream the user want’s to hold.
The SIP Switch / PBX / UAS replies with an updated SDP where each media stream’s SDP contains a=recvonly.
So it’s both, depending on which leg you’re looking at.
In Common Practice
When a UAC puts a call in hold, it does so by sending a SIP re-INVITE, updating the SDP to include the attribute line “sendonly”
See the bottom line of the SDP is a=sendonly ? That’s denoting the call is to be put on hold,
If the call hold was sucesful the UAS sends back a 200 Ok, with the SDP attribute set to recvonly
The a=recvonly denotes the call has been held.
To retrieve the call another SIP re-invite is sent by the UAC, this time setting the media attribute back to sendrecv
If sucesful a 200 OK is sent by the UAS with the a=sendrecv also set.
On top of plain vanilla RFC3261, there’s a series of “Extension” methods added to SIP to expand it’s functionality, common extension methods are INFO, MESSAGE, NOTIFY, PRACK and UPDATE. Although now commonplace, of these is not defined in RFC3261 so is considered an “extension” to SIP.
It’s worth just pausing here to reiterate we’re not talking extensions like in a PBX context, like extra phones, we’re talking extensions like you’d add to a house, like extra functionality.
A SIP client can request functionality from a server (UAC to a UAS), if the server does not have support for that functionality, it can reject the session on those grounds and send back a response indicating it doesn’t know how to handle that extension, like a 420 Bad Extension – Bad SIP Protocol Extension used, not understood by the server. Response.
So clients can determine what functionality a server doesn’t support if it rejects the request, but there was no way to see what functionality the server does support, and what functionality the client requires.
If a UAC or UAS requires support for an extension – For example a Media Gateway has to understand PRACK, it can use the Require header to specify the request should be rejected if support for the listed extensions is not provided.
These headers are most commonly seen in SIP OPTIONS requests.
SIP Proxies are simple in theory but start to get a bit more complex when implemented.
When a proxy has a response to send back to an endpoint, it can have multiple headers with routing information for how to get that response back to the endpoint that requested it.
So how to know which header to use on a new request?
Routing SIP Requests
Record-Route
If Route header is present (Like Record-Route) the proxy should use the contents of the Record-Route header to route the traffic back.
The Record-Route header is generally not the endpoint itself but another proxy, but that’s not an issue as the next proxy will know how to get to the endpoint, or use this same logic to know how to get it to the next proxy.
Contact
If no Route headers are present, the contact header is used.
The contact provides an address at which a endpoint can be contacted directly, this is used when no Record-Route header present.
From
If there is no Contact or Route headers the proxy should use the From address.
A note about Via
Via headers are only used in getting responses back to a client, and each hop removes it’s own IP on the response before forwarding it onto the next proxy.
This means the client doesn’t know all the Via headers that were on this SIP request, because by the time it gets back to the client they’ve all been removed one by one as it passed through each proxy.
A client can’t send a SIP request using Via’s as it hasn’t been through the proxies for their details to be added, so Via is only used in responding to a request, for example responding with a 404 to an INVITE, but cannot be used on a request itself (For example an INVITE).
If we want to send a SIP message to Bob’s phone, we needs to know the IP Address of Bob’s phone. There are 4,294,967,296 IPv4 addresses, so finding Bob may take a while.
Bob could let us know his IP address, but what if Bob’s IP changes? If he’s using a Softphone while he’s out to lunch and a desk phone once he gets back to the office. How do we find Bob?
SIP manages this using a SIP Registrar, essentially, when Bob goes out to lunch and starts his softphone app, the softphone checks in with the Registrar and lets the Registrar know what IP to find Bob on now (the softphone’s IP).
When he gets back to the office he closes the softphone app, as it shuts down it checks in with the Registrar again to let it know Bob’s not using the softphone any more.
So our Registrar keeps track of the IP address you can find a SIP endpoint on.
It does this using an Address on Record (AoR). It’s a record of a contact – Like Bob, and the IP Addresses to contact Bob, kind of like DNS is a record of the domain name and the IP it translates to.
A simplified example AoR might look like:
Bob | 192.168.1.2 | Expires 1800
So if we want to send a SIP message to Bob, we look up Bob’s IP in our Address on Record list, and send it to that IP.
A Registrar takes the info received in a SIP REGISTER messages and stores the IP Address and contact info in the form of an Address on Record (AoR).
Registrars also manage expiry, if Bob’s softphone sends a SIP REGISTER message letting us know we’re on one IP address, and then his phone runs out of battery or drops out of coverage, we don’t want to keep sending SIP messages that are going to be lost, so in this case Bob has 1800 seconds left, after which his Address on Record will be discarded if he doesn’t send another REGISTER message before then.
Different SIP Registrars use different ways to store this information, and some store more info, like User-Agent, NAT information and multiple contact IP addresses. Most implementations of a SIP Registrar use some form of database back end or another to store this information. In my Kamailio Registrar example we store it in memory, but you could store it in some form of SQL database, text files, post it notes or punch card, so long a you have a quick way to look it up when needed.
So that’s a SIP Registrar in a nutshell, we’ll talk more about the REGISTER process and flow, including what the www-auth header does, the Contact header and multi endpoint registration in future posts.
SIP was designed to be flexible in it’s operation, and for, where possible, messages to take the most direct path.
For example I can use a Registrar function of a proxy to find the IP of a registered endpoint, but once a dialog is setup, why should the proxy be involved? The endpoint & I can take it from here, and can talk directly to each other using the address in the Contact header.
This works really well in some scenarios, as I described above you can have the registrar proxy setup the introduction and then off you go.
Other scenarios this doesn’t work quite so well, for example if the call needs to be billed. To charge correctly, the proxy needs to know when the call ends to know when to stop charging.
If the endpoint we’re talking to is behind a NAT, the NAT might just be locked to the IP of the registrar proxy and drop your traffic.
The Record-Route header exists to address this.
If a proxy adds a Record-Route header, it means it’ll sit in the middle of any future requests in this dialog, and route them back through the proxy.
By adding a Record-Route header on the proxy for our billing example, our proxy will forward inline all the messages between the two end points for that dialog, including the BYE so the proxy knows when to stop charging.
For the NAT scenario we described the Proxy will add a Record-Route header and forward all the messages between the two endpoints, so NAT won’t be an issue as the source IP of the packets will be the same as the proxy.
There was a bit of confusion in regards to implementation so to address this IETF wrote RFC 5658 to address Record-Route Issues in SIP.
Branch IDs were introduced in RFC 3261, to help keep differentiate all the different transactions a device or proxy might be involved in.
The answer isn’t that exciting. IETF picked the 7 character long prefix as a magic cookie so older SIP servers (RFC 2543 compliant only) wouldn’t pick up the value due to it’s length.
The branch ID inserted by an element compliant with this specification MUST always begin with the characters “z9hG4bK”. These 7 characters are used as a magic cookie (7 is deemed sufficient to ensure that an older RFC 2543 implementation would not pick such a value), so that servers receiving the request can determine that the branch ID was constructed in the fashion described by this specification (that is, globally unique).
As to why z9hG4bK, instead of any other random 7 letter string, I haven’t been able to find an answer, but it’s as good as any random 7 letter string I guess.
If we were to take the password password and hash it using an online tool to generate MD5 Hashes we’d get “5f4dcc3b5aa765d61d8327deb882cf99”
If we hash password again with MD5 we’d get the same output – “5f4dcc3b5aa765d61d8327deb882cf99”,
The catch with this is if you put “5f4dcc3b5aa765d61d8327deb882cf99” into a search engine, Google immediately tells you it’s plain text value. That’s because the MD5 of password is always 5f4dcc3b5aa765d61d8327deb882cf99, hashing the same input phase “password” always results in the same output MD5 hash aka “response”.
By using Message Digest Authentication we introduce a “nonce” value and mix it (“salt”) with the SIP realm, username, password and request URI, to ensure that the response is different every time.
Let’s look at this example REGISTER flow:
We can see a REGISTER message has been sent by Bob to the SIP Server.
REGISTER sips:ss2.biloxi.example.com SIP/2.0 Via: SIP/2.0/TLS client.biloxi.example.com:5061;branch=z9hG4bKnashds7 Max-Forwards: 70 From: Bob <sips:[email protected]>;tag=a73kszlfl To: Bob <sips:[email protected]> Call-ID: [email protected] CSeq: 1 REGISTER Contact: <sips:[email protected]> Content-Length: 0
The SIP Server has sent back a 401 Unauthorised message, but includes the WWW-Authenticate header field, from this, we can grab a Realm value, and a Nonce, which we’ll use to generate our response that we’ll send back.
SIP/2.0 401 Unauthorized Via: SIP/2.0/TLS client.biloxi.example.com:5061;branch=z9hG4bKnashds7 ;received=192.0.2.201 From: Bob <sips:[email protected]>;tag=a73kszlfl To: Bob <sips:[email protected]>;tag=1410948204 Call-ID: [email protected] CSeq: 1 REGISTER WWW-Authenticate: Digest realm="atlanta.example.com", qop="auth",nonce="ea9c8e88df84f1cec4341ae6cbe5a359", opaque="", stale=FALSE, algorithm=MD5 Content-Length: 0
The formula for generating the response looks rather complex but really isn’t that bad.
Let’s say in this case Bob’s password is “bobspassword”, let’s generate a response back to the server.
We know the username which is bob, the realm which is atlanta.example.com, digest URI is sips:biloxi.example.com, method is REGISTER and the password which is bobspassword. This seems like a lot to go through but all of these values, with the exception of the password, we just get from the 401 headers above.
So let’s generate the first part called HA1 using the formula HA1=MD5(username:realm:password), so let’s substitute this with our real values: HA1 = MD5(bob:atlanta.example.com:bobspassword) So if we drop bob:atlanta.example.com:bobspassword into our MD5 hasher and we get our HA1 hash and it it looks like 2da91700e1ef4f38df91500c8729d35f, so HA1 = 2da91700e1ef4f38df91500c8729d35f
Now onto the second part, we know the Method is REGISTER, and our digestURI is sips:biloxi.example.com HA2=MD5(method:digestURI) HA2=MD5(REGISTER:sips:biloxi.example.com) Again, drop REGISTER:sips:biloxi.example.com into our MD5 hasher, and grab the output – 8f2d44a2696b3b3ed781d2f44375b3df This means HA2 = 8f2d44a2696b3b3ed781d2f44375b3df
Finally we join HA1, the nonce and HA2 in one string and hash it: Response = MD5(2da91700e1ef4f38df91500c8729d35f:ea9c8e88df84f1cec4341ae6cbe5a359:8f2d44a2696b3b3ed781d2f44375b3df)
Which gives us our final response of “bc2f51f99c2add3e9dfce04d43df0c6a”, so let’s see what happens when Bob sends this to the SIP Server.