For example, an American visiting the UK, would have 911 on the Emergency Calling Codes list on their SIM card, but in the UK they dial 999 to reach emergency services.
There’s two angles to this, the first is if a roamer dials the emergency calling code of their home country, the other is if they dial the emergency calling code of the country they are in.
Let’s look at the first scenario, where the roamer dials the emergency calling code of their home country.
If our American in the UK abroad dials 911, that number is on the ECC list on the SIM, it’s still flagged as an emergency call, and just goes out with the standard urn:service:sos URN – The network never sees 911 or 999, just that it’s an SOS call that goes to the PSAP.
In this scenario, the fact the dialled number is not passed to the network is actually a positive, we get the intent that the user wants to reach emergency services, and route based on this.
But what if our American friend in need dials 999? That’s the correct number for the end user to dial in the UK after all, but if that’s not in their ECC list on the SIM / device, it’d go through as a regular call right?
If the call does not get flagged as an emergency call on the UE this has its own set of complications and considerations:
S8-Home Routing for VoLTE means that as the UE doesn’t know this is an emergency call, the call will get routed back to the home network. This means the call doesn’t go to the E-CSCF in the visited network, and would probably just get a message saying the number they’ve dialed is unavailable, this would be exactly as if they dialed 999 at home in the US.
But we have a fix for this! On each MME we can set a list of emergency numbers, which would allow our Britt’s phone to know on this network, what the emergency calling codes are, and route the 999 call to the local PSAP, rather than home routing it.
This information is jammed into the Emergency Number List IE in the NAS Attach Accept body.
This means our American visitor in the UK, would know about 999 from the ECC list configured in the roaming operator’s MME.
The purpose of this information element is to encode emergency number(s) for use within the country where the IE is received.
3GPP TS 24.008: 10.5.3.13 – Emergency Number List
Where this becomes more problematic is unauthenticated emergency calling.
For example, a our American visiting the UK, that is not roaming dials 999.
We’ll assume the UK and US operator don’t have a VoLTE roaming agreement because they’ve been kicking the can down the road when it comes to VoLTE roaming… This is super common scenario – last numbers I saw on this were last year with ~50 bilateral VoLTE agreements in place worldwide.
Because the phone is not attached to a local MME, the handset does not know that 999 is an emergency calling code (because it’s not on the SIM), after all, the only way it can get the Emergency Number List is from an MME, and not having been attached to an MME, means the phone does not have the ECC list for the country, so the the handset does not begin the emergency attach procedure to make the call.
Common sense prevails here, on the majority of phones and the majority of SIM profiles, codes like 112 or 911 are treated as emergency calls, but more obscure numbers, such as dialing 999 in the UK or 10111 for South African Police on a handset with US firmware, are not guaranteed to work. Generally dialing the Emergency Calling code in the home network would get you through to some emergency services (although as we talked about in the last post, this might get you routed to the wrong agency in countries where each agency has their own number).
A better way forward?
These days I don’t dial much (apart from if I’m making adjustments on the Step-by-Step exchange), when I call people I do it from contacts, hyperlinks, etc.
There is mountains of research to suggest that asking people to remember codes and phone numbers, is a struggle. A tourist who finds themselves in Tunisia in need of assistance, is unlikely to remember that it’s 190 for an Ambulance, and 198 for Fire.
Perhaps the ECC list on a phone should populate a page of icons from the emergency page on the phone, with the universal icon for each agency, that sends to the URN for that service type?
Countries with a single PSAP could have the URNs for each service type routed to the same place, while countries with seperated PSAPs for each service type, can route accordingly.
Likewise if a country does have a centralised PSAP for all call types, knowing the type that is selected would be useful, for example if the user has pressed fire and is not responsive when the call is answered, the best unit to dispatch would probably be a fire engine.
A lot of countries have a single point of contact for emergency services; in Europe you’d call 112 in an emergency, 000 in Australia or 911 in the US. Calling this number in the country will get you the emergency services.
This means a caller can order an ambulance for smoke inhalation, and the fire brigade, in one call.
But that’s not the case in every country; many countries don’t have one number for theemergency services, they’ve got multiple; a phone number for police, a different number for fire brigade and a different number for an ambulance.
For example, in Brazil if you need the police, you call 190, while a for example, uses 193 as the emergency number for the fire department, the police can be reached at 190 or 191 depending on if it’s road policing or general, and medical emergencies are covered by 192. Other countries have similar setups.
This is all well and good if you’re in Brazil, and you call 192 for an ambulance, the phone sends a SIP INVITE with a Request URI of sip:[email protected], because we can put a rule into our E-CSCF to say if the number is 192 to route it to the answer point for ambulances – But that’s not often the case on emergency calls.
In IMS, handsets generally detect the number dialed is on the Emergency Calling Code (ECC) list from the USIM Card.
The use of the ECC list means the phone knows this is an emergency call, and this is really important. For countries that use AML this can trigger sending of the AML SMS that process, and Emergency Calls should always be allowed to be made, even without credit, a valid SIM card, or even a SIM in the phone at all.
But this comes with a cost; when a user dials 911, the phones doesn’t (generally) send a call to sip:[email protected] like it would with any other dialled number, but rather the SIP INVITE is sent to urn:service:sos which will be routed to the PSAP by the E-CSCF. When a call comes through to these URNs they’re given top priority in the network
This is all well and good in a country where it doesn’t matter which emergency service you called, because all emergency calls route to a single PSAP, but in a country with multiple numbers, it’s really important when you call and ambulance, your call doesn’t get routed to animal control.
That means the phone has to look at what emergency number you’ve dialed, and map the URN it sends the call to to match what you’ve actually requested.
Recently we’ve been helping an operator in a country with a numbering plan like this, and we’ve been finding the limits of the standards here. So let’s start by looking at what the standards state:
IMS Emergency Calling is governed by TS 103.479 which in turn delegates to IETF RFC 5031, but for the calling number to URN translation, it’s pretty quiet.
Let’s look at what RFC 5031 allows for URNs:
urn:service:sos.ambulance
urn:service:sos.animal-control
urn:service:sos.fire
urn:service:sos.gas
urn:service:sos.marine
urn:service:sos.mountain
urn:service:sos.physician
urn:service:sos.poison
urn:service:sos.police
The USIM’s Emergency Calling Codes EF would be the perfect source of this data; for each emergency calling code defined, you’ve got a flag to indicate what it’s for, here’s what we’ve got available on the SIM Card:
Bit 1 Police
Bit 2 Ambulance
Bit 3 Fire Brigade
Bit 4 Marine Guard
Bit 5 Mountain Rescue
Bit 6 manually initiated eCall
Bit 7 automatically initiated eCall
Bit 8 is spare and set to “0”
So these could be mapped pretty easily you’d think, so if the call is made to an Emergency Calling Code flagged with Bit 4, the URN would go to urn:service:sos.mountain.
Alas from our research, we’ve found most OEMs send calls to the generic urn:service:sos, regardless of the dialled number and the ECC flags that are set on the SIM for that number.
One of the big chip vendors sends calls to an ECC flagged as Ambulance to urn:service:sos.fire, which is totally infuriating, and we’ve had to put a rule in our E-CSCF to handle this if the User Agent is set to one of their phones.
Is there room for improvement here? For sure! Emergency calling is super important, and time is of the essence, while animal control can probably transfer you to an ambulance, an emergency is by very nature time sensitive, and any time wasted can lead to worse outcomes.
While carrier bundles from the OEMs can handle this, the global ability to take any phone, from any country and call an emergency number is so important, that relying on a country-by-country approach here won’t suffice.
What could we do as an industry to address this?
Acknowledging that not all countries have a single point of contact for emergency service, introducing a simple mechanism in the UE SIP message to indicate what number (Emergency Calling Code) the user actually dialled would be invaluable here.
URNs are important, but knowing the dialed number when it comes to PSAP routing, is so important – This wouldn’t even need to be its own SIP header, it could just be thrown into the Contact header as another parameter.
Highly developed markets are often the first to embrace new tech (for us this means VoLTE and VoNR), but this means that these issues seen by less developed markets won’t appear until long after the standard has been set in stone, and often countries like this aren’t at the table of the standards bodies to discuss such requirements.
This easy, reasonable update to the standard, has the potential to save lives, and next time this comes up in a working group I’ll be advocating for a change.
The other day I found myself banging my head on the table to diagnose an issue with Ringback tone on an SS7 link and the IMS.
On the IMS side, no RBT was heard, but I could see the Media Gateway was sending RTP packets to the TAS, and the TAS was sending it to the UE, but was there actual content in the RTP packets or was it just silence?
If this was PCM / G711 we’d be able to just playback in Wireshark, but alas we can’t do this for the AMR codec.
Filter the RTP stream out in Wireshark
Inside Wireshark I filtered each of the audio streams in one direction (one for the A-Party audio and one for the B-Party audio)
Then I needed to save each of the streams as a separate PCAP file (Not PCAPng).
Advanced Mobile Location (AML) is being rolled out by a large number of mobile network operators to provide accurate caller location to emergency services, so how does it work, what’s going on and what do you need to know?
Recently we’ve been doing a lot of work on emergency calling in IMS, and meeting requirements for NG-112 / e911, etc.
This led me to seeing my first Advanced Mobile Location (AML) SMS in the wild.
For those unfamiliar, AML is a fancy text message that contains the callers location, accuracy, etc, that is passed to emergency services when you make a call to emergency services in some countries.
It’s sent automatically by your handset (if enabled) when making a call to an emergency number, and it provides the dispatch operator with your location information, including extra metadata like the accuracy of the location information, height / floor if known, and level of confidence.
Google has their own version of AML called ELS, which they claim is supported on more than 99% of Android phones (I’m unclear on what this means for Harmony OS or other non-Google backed forks of Android), and Apple support for AML starts from iOS 11 onwards, meaning it’s supported on iPhones from the iPhone 5S onards,.
Call Flow
When a call is made to the PSAP based on the Emergency Calling Codes set on the SIM card or set in the OS, the handset starts collecting location information. The phone can pull this from a variety of sources, such as WiFi SSIDs visible, but the best is going to be GPS or one of it’s siblings (GLONASS / Galileo).
Once the handset has a good “lock” of a location (or if 20 seconds has passed since the call started) it bundles up all of this information the phone has, into an SMS and sends it to the PSAP as a regular old SMS.
The routing from the operator’s SMSc to the PSAP, and the routing from the PSAP to the dispatcher screen of the operator taking the call, is all up to implementation. For the most part the SMS destination is the emergency number (911 / 112) but again, this is dependent on the country.
Inside the SMS
To the user, the AML SMS is not seen, in fact, it’s actually forbidden by the standard to show in the “sent” items list in the SMS client.
On the wire, the SMS looks like any regular SMS, it can use GSM7 bit encoding as it doesn’t require any special characters.
Each attribute is a key / value pair, with semicolons (;) delineating the individual attributes, and = separating the key and the value.
If you’ve got a few years of staring at Wireshark traces in Hex under your belt, then this will probably be pretty easy to get the gist of what’s going on, we’ve got the header (A”ML=1″) which denotes this is AML and the version is 1.
After that we have the latitude (lt=), longitude (lg=), radius (rd=), time of positioning (top=), level of confidence (lc=), positioning method (pm=) with G for GNSS, W for Wifi signal, C for Cell or N for a position was not available, and so on.
AML outside the ordinary
Roaming Scenarios
If an emergency occurs inside my house, there’s a good chance I know the address, and even if I don’t know my own address, it’s probably linked to the account holder information from my telco anyway.
AML and location reporting for emergency calls is primarily relied upon in scenarios where the caller doesn’t know where they’re calling from, and a good example of this would be a call made while roaming.
If I were in a different country, there’s a much higher likelihood that I wouldn’t know my exact address, however AML does not currently work across borders.
The standard suggests disabling SMS when roaming, which is not that surprising considering the current state of SMS transport.
Without a SIM?
Without a SIM in the phone, calls can still be made to emergency services, however SMS cannot be sent.
That’s because the emergency calling standards for unauthenticated emergency calls, only cater for
This is a limitation however this could be addressed by 3GPP in future releases if there is sufficient need.
HTTPS Delivery
The standard was revised to allow HTTPS as the delivery method for AML, for example, the below POST contains the same data encoded for use in a HTTP transaction:
Implementation of this approach is however more complex, and leads to little benefit.
The operator must zero-rate the DNS, to allow the FQDN for this to be resolved (it resolves to a different domain in each country), and allow traffic to this endpoint even if the customer has data disabled (see what happens when your handset has PS Data Off ), or has run out of data.
Due to the EU’s stance on Net Neutrality, “Zero Rating” is a controversial topic that means most operators have limited implementation of this, so most fall back to SMS.
Other methods for sharing location of emergency calls?
In some upcoming posts we’ll look at the GMLC used for E911 Phase 2, and how the network can request the location from the handset.
I had a question recently on LinkedIn regarding how to preference Voice over WiFi traffic so that a network engineer operating the WiFi network can ensure the best quality of experience for Voice over WiFi.
Voice over WiFi is underpinned by the ePDG – Evolved Packet Data Gateway (this is a fancy IPsec tunnel we authenticate to using the SIM to drop our traffic into the P-CSCF over an unsecured connection). To someone operating a WiFi network, the question is how do we prioritise the traffic to the ePDGs and profile it?
ePDGs can be easily discovered through a simple DNS lookup, once you know the Mobile Network Code and Mobile Country code of the operators you want to prioritise, you can find the IPs really easily.
ePDG addresses take the form epdg.epc.mncXXX.mccYYY.pub.3gppnetwork.org so let’s look at finding the IPs for each of these for the operators in a country:
The first step is nailing down the mobile network code and mobile country codes of the operators you want to target, Wikipedia is a great source for this information. Here in Australia we have the Mobile Country Code 505 and the big 3 operators all support Voice over WiFi, so let’s look at how we’d find the IPs for each. Telstra has mobile network code (MNC) 01, in 3GPP DNS we always pad network codes to 3 digits, so that’ll be 001, and the mobile country code (MCC) for Australia is 505. So to find the IPs for Telstra we’d run an nslookup for epdg.epc.mnc001.mcc505.pub.3gppnetwork.org – The list of IPs that are returned, are the IPs you’ll see Voice over WiFi traffic going to, and the IPs you should provide higher priority to:
The same rules apply in other countries, you’d just need to update the MNC/MCC to match the operators in your country, do an nslookup and prioritise those IPs.
Generally these IPs are pretty static, but there will need to be a certain level of maintenance required to keep this list up to date by rechecking.
There’s old joke about standards that the great thing about standards there’s so many to choose from.
SMS wasn’t there from the start of GSM, but within a year of the inception of 2G we had SMS, and we’ve had SMS, almost totally unchanged, ever since.
In a recent Twitter exchange, I was asked, what’s the best way to transport SMS? As always the answer is “it depends” so let’s take a look together at where we’ve come from, where we are now, and how we should move forward.
How we got Here
Between 2G and 3G SMS didn’t change at all, but the introduction of 4G (LTE) caused a bit of a rethink regarding SMS transport.
Early builders of LTE (4G) networks launched their 4G offerings without 4G Voice support (VoLTE), with the idea that networks would “fall back” to using 2G/3G for voice calls.
This meant users got fast data, but to make or receive a call they relied on falling back to the circuit switched (2G/3G) network – Hence the name Circuit Switched Fallback.
Falling back to the 2G/3G network for a call was one thing, but some smart minds realised that if a phone had to fall back to a 2G/3G network every time a subscriber sent a text (not just calls) – And keep in mind this was ~2010 when SMS traffic was crazy high; then that would put a huge amount of strain on the 2G/3G layers as subs constantly flip-flopped between them.
The SGs-AP interface has two purposes; One, It can tell a phone on 4G to fallback to 2G/3G when it’s got an incoming call, and two; it can send and receive SMS.
SMS traffic over this interface is sometimes described as SMS-over-NAS, as it’s transported over a signaling channel to the UE.
This also worked when roaming, as the MSC from the 2G/3G network was still used, so SMS delivery worked the same when roaming as if you were in the home 2G/3G network.
Enter VoLTE & IMS
Of course when VoLTE entered the scene, it also came with it’s own option for delivering SMS to users, using IP, rather than the NAS signaling. This removed the reliance on a link to a 2G/3G core (MSC) to make calls and send texts.
This was great because it allowed operators to build networks without any 2G/3G network elements and build a fully standalone LTE only network, like Jio, Rakuten, etc.
In roaming scenarios, S8 Home Routing for VoLTE enabled SMS to be handled when roaming the same way as voice calls, which made SMS roaming a doddle.
4G SMS: SMS over IP vs SMS over NAS
So if you’re operating a 4G network, should you deliver your SMS traffic using SMS-over-IP or SMS-over-NAS?
Generally, if you’ve been evolving your network over the years, you’ve got an MSC and a 2G/3G network, you still may do CSFB so you’ve probably ended up using SMS over NAS using the SGs-AP interface. This method still relies on “the old ways” to work, which is fine until a discussion starts around sunsetting the 2G/3G networks, when you’d need to move calling to VoLTE, and SMS over NAS is a bit of a mess when it comes to roaming.
Greenfield operators generally opt for SMS over IP from the start, but this has its own limitations; SMS over IP is has awful efficiency which makes it unsuitable for use with NB-IoT applications which are bandwidth constrained, support for SMS over IP is generally limited to more expensive chipsets, so the bargain basement chips used for IoT often don’t support SMS over IP either, and integration of VoLTE comes with its own set of challenges regarding VoLTE enablement.
5G enters the scene (Nsmsf_SMService)
5G rolled onto the scene with the opportunity to remove the SMS over NAS option, and rely purely on SMS over IP (IMS); forcing the industry to standardise on an option alas this did not happen.
This added another option for SMS delivery dependent on the access network used, and the Nsmsf_SMService interface does not support roaming.
Of course if you are using Voice over NR (VoNR) then like VoLTE, SMS is carried in a SIP message to the IMS, so this negates the need for the Nsmsf_SMService.
2G/3G Shutdown – Diameter to replace SGs-AP (SGd)
With the 2G/3G shutdown in the US operators who had up until this point been relying on SMS-over-NAS using the SGs-AP interface back to their MSCs were forced to make a decision on how to route SMS traffic, after the MSCs were shut down.
This landed with SMS-over-Diameter, where the 4G core (MME) communicates over Diameter with the SMSc.
This has adoption by all the US operators, but we’re not seeing it so widely deployed in the rest of the world.
State of Play
Option
Conditions
Notes
MAP
2G/3G Only
Relies on SS7 signaling and is very old Supports roaming
SGs-AP (SMS-over-NAS)
4G only relies on 2G/3G
Needs an MSC to be present in the network (generally because you have a 2G/3G network and have not deployed VoLTE) Supports limited roaming
SMS over IP (IMS)
4G / 5G
Not supported on 2G/3G networks Relies on a IMS enabled handset and network Supports roaming in all S8 Home Routed scenarios Device support limited, especially for IoT devices
Diameter SGd
4G only / 5G NSA
Only works on 4G or 5G NSA Better device support than 4G/5G Supports roaming in some scenarios
Nsmsf_SMService
5G standalone only
Only works on 5GC Doesn’t support roaming
The convoluted world of SMS delivery options
A Way Forward:
While the SMS payload hasn’t changed in the past 31 years, how it is transported has opened up a lot of potential options for operators to use, with no clear winner, while SMS revenues and traffic volumes have continued to fall.
For better or worse, the industry needs to accept that SMS over NAS is an option to use when there is no IMS, and that in order to decommission 2G/3G networks, IMS needs to be embraced, and so SMS over IP (IMS) supported in all future networks, seems like the simple logical answer to move forward.
And with that clear path forward, we add in another wildcard…
But, when you’ve only got a finite resource of bandwidth, and massive latencies to contend with, the all-IP architecture of IMS (VoLTE / VoNR) and it’s woeful inefficiency starts to really sting.
Of course there are potential workarounds here, Robust Header Correction (ROHC) can shrink this down, but it’s still going to rely on the 3 way handshake of TCP, TCP keepalive timers and IMS registrations, which in turn can starve the radio resources of the satellite link.
Even with SMS over 30 years old, we can still expect it to be a part of networks for years to come, even as WhatsApp / iMessage, etc, offer enhanced services. As to how it’s transported and the myriad of options here, I’m expecting that we’ll keep seeing a multi-transport mix long into the future.
For simple, cut-and-dried 4G/5G only network, IMS and SMS over IP makes the most sense, but for anything outside of that, you’ve got a toolbox of options for use to make a solution that best meets your needs.
The Binding Support Function is used in 4G and 5G networks to allow applications to authenticate against the network, it’s what we use to authenticate for XCAP and for an Entitlement Server.
Rather irritatingly, there are two BSF addresses in use:
If the ISIM is used for bootstrapping the FQDN to use is:
bsf.ims.mncXXX.mccYYY.pub.3gppnetwork.org
But if the USIM is used for bootstrapping the FQDN is
bsf.mncXXX.mccYYY.pub.3gppnetwork.org
You can override this by setting the 6FDA EF_GBANL (GBA NAF List) on the USIM or equivalent on the ISIM, however not all devices honour this from my testing.
We recently added support in PyHSS for fixed line SIP subscribers to attach to the IMS.
Traditional telecom operators are finding their fixed line network to be a bit of a money pit, something they’re required to keep operating to meet regulatory obligations, but the switches are sitting idle 99% of the time. As such we’re seeing more and more operators move fixed line subs onto their IMS.
This new feature means we can use PyHSS to serve as the brains for a fixed network, as well as for mobile, but there’s one catch – How we authenticate subscribers changes.
Most banks of line cards in a legacy telecom switches, or IP Phones, don’t have SIM slots to allow us to authenticate, so instead we’re forced to fallback to what they do support.
Unfortunately for the most part, what is supported by these IP phones or telecom switches is SIP MD5 Digest Authentication.
The Nonce is generated by the HSS and put into the Multimedia-Authentication-Answer, along with the subscriber’s password and sent in the clear to the S-CSCF.
The HSS then generates the the Multimedia-Auth Answer, it generates a nonce (in the 3GPP-SIP-Authenticate / 609 AVP) and sends the Subscriber’s password in the 3GPP-SIP-Authorization (610) AVP in response back to the S-CSCF.
I would have thought a better option would be for the HSS to generate the Nonce and Digest, and then the S-CSCF to just send the Nonce to the Sub and compare the returned Digest from the Sub against the expected Digest from the HSS, but it would limit flexibility (realm adaptation, etc) I guess.
The UE/UA (I guess it’s a UA in this context as it’s not a mobile) then generates its own Digest from the Nonce and sends it back to the S-CSCF via the P-CSCF.
The S-CSCF compares the received Digest response against the one it generated, and if the two match, the sub is authenticated and allowed to attach onto the network.
In the past I had my iFCs setup to look for the P-Access-Network-Info header to know if the call was coming from the IMS, but it wasn’t foolproof – Fixed line IMS subs didn’t have this header.
If you’ve ever worked in roaming, you’ll probably have had the misfortune of dealing with Transferred Account Procedures aka TAP files.
It’s used for billing a 2G GSM call right up to 5G data usage, if you use a service while roaming, somewhere in the world there’s a TAP file with your usage in it.
A brief history of TAP
TAP was originally specified by the GSMA in 1991 as a standard CDR interchange format between operators, for use in roaming scenarios.
Notice I said GSMA – Not 3GPP – This means there’s no 3GPP TS docs for this, it’s defined by the industry lobby group’s members, rather than the standards body.
So what does this actually mean? Well, if you’re MNO A and a customer from MNO B roams into your network, all the calls, SMS and data consumed by the roaming subscriber from MNO B will need to be billed to MNO B, by you, MNO A.
If a network operator wants to get paid for traffic used on their network by roaming subscribers, they’d better send out a TAP file to the roamer’s home network.
TAP is the file format generated my MNO A and sent to MNO B, containing all the usage charges that subscribers from MNO B have racked up while roaming into your network.
These are broken down into “Transactions” (CDRs), for events like making a call, connecting a PDN session and consuming data, or sending a text.
In the beginning of time, GSM provided only voice calling service. This meant that the only services a subscriber could consume while roaming was just making/receiving voice calls which were billed at the end of each month. – This meant billing was equally simple, every so often the visisted network would send the TAP files for the voice calls made by subscribers visited other networks, to the home networks, which would markup those charges, and add them onto the monthly invoice for each subscriber who was roaming.
But of course today, calling accounts for a tiny amount of usage on the network, but this happened gradually while passing through the introduction of SMS, CAMEL services, prepaid services, mobile data, etc. For all these services that could be offered, the TAP format had to evolve to handle each of these scenarios.
As we move towards a flat IP architecture, where voice calls and SMS sent while roaming are just data, TAP files for 4G and 5G networks only need to show data transactions, so the call objects, CAMEL parameters and SMS objects are all falling by the wayside.
What’s inside a TAP File
TAP uses the most beloved of formats – ASN1 to encode the data. This means it is strictly formatted and rigidly specified.
Each file contains a Sequence Number which is a monotonically increasing number, which allows the receiver to know if any files have been missed between the file that’s being currently parsed, an the previous file.
They also have a recipient and sender TADIG code, which is a code allocated by GSMA that uniquely identifies the sender and the recipient of the file.
The TAP records exist in one of two common format, Notification Records and transferBatch records.
These files are exchanged between operators, in practice this means “Dumped on an FTP server as agreed between the two”.
TAP Notification Records
Notifications are the simplest of TAP records and are used when there aren’t any CDRs for roaming events during the time period the TAP file covers.
These are essentially blank TAP files generated by the visited network to let the home network know it’s still there, but there are no roaming subs consuming services in that period.
Notification files are really simple, let’s take a look as one shown as JSON:
When we have services to bill and records to charge, that’s when instead we generate a transferBatch record.
It looks something like this:
There’s a lot going on in here, so let’s break it down section by section.
accountingInfo
The accountingInfo section specifies the currency, exchange rate parameters.
Keep in mind a TAP record generated by an operator in the US, would use USD, while the receiver of the file may be a European MNO dealing in EUR.
This gets even more complicated if you’re dealing with more obscure currencies where an intermediary currency is used, that’s where we bring in SDRs (“Special Drawing Right”) that map to the dollar value to be charged, kinda – the roaming agreement defines how many SDRs are in a dollar, in the example below we’re not using any, but you do see it.
When it comes to numbers and decimal places, TAP doesn’t exactly make it easy.
Significant Digits are defined by counting the first number before the decimal point and all the numbers to the right of the decimal point, so for example the number 1.234 would be 4 significant digits (1 digit before the decimal point and 3 digits after it).
Decimal Places are not actually supported for the Value fields in the TAP file. This is tricky because especially today when roaming tariffs are quite low, these values can be quite small, and we need to represent them as an integer number. TAP defines decimal places as the number of digits after the decimal place.
When it comes to the maximum number of decimal places, this actually impacts the maximum number we can store in the field – as ASN1 strictly enforce what we put in it.
The auditControlInfo section contains the number of CDRs (callEventDetailsCount) contained in the TAP file, the timestamp of the first and last CDR in the file, the total charge and any tax charged.
All of the currency information was provided in the accountingInfo so this is just giving us our totals.
A CDR has 30 days from the time it was generated / service consumed by the roamer, to be baked into a TAP file. After this we can no longer charge for it, so it’s important that the earliestCallTimeStamp is not more than 30 days before the fileCreationTimeStamp seen in batchControlInfo.
batchControlInfo
The batchControlInfo section specifies the time the TAP file became available for transfer, the time the file was created (usually the same), the sequence number and the sender / recipient TADIG codes.
As mentioned earlier, we track sequence number so the receiver can know if a TAP file has been missed; for example if you’ve got TAP file 1 and TAP file 3 comes in, you can determine you’ve missed TAP file 2.
Now we’re getting to the meat & potatoes of our TAP record, the CDRs themselves.
In LTE networks these are just records of data consumption, so let’s take a look inside the gprsCall records under callEventDetails:
In the gprsBasicCallInformation we’ve got as the name suggests the basic info about the data usage event. The time when the session started, the charging ID, the IMSI and the MSISDN of the subscriber to charge, along with their IP and the APN used.
Next up we have the gprsLocationInformation – rates and tariffs may be set based on the location of the subscriber, so we need to identify the area the sub was using the services to select correct tariff / rate for traffic in this destination.
The recEntity is the index number of the SGW / PGW used for the transaction (more on that later).
Next we have the gprsServiceUsed which, again as the name suggests, details the services used and the charge.
chargeDetailList contains the charged data (Made up of dataVolumeIncoming + dataVolumeOutgoing) and the cost.
The chargeableUnits indicates the actual data consumed, however most roaming agreements will standardise on some level of rounding, for example rounding up to the nearest Kilobyte (1024 bytes), so while a sub may consume 1025 bytes of data, they’d be billed for 2045 bytes of data. The data consumed is indicated in the chargeableUnits which indicates how much data was actually consumed, before any rounding policies where applied, while the amount that is actually charged (When taking into account rounding policies) isindicated inside Charged Units.
In the example below data usage is rounded up to the nearest 1024 bytes, 134390 bytes rounds up to the nearest 1024 gives you 135168 bytes.
As this is data we’re talking bytes, but not all bytes are created equal!
VoLTE traffic, using a QCI1 bearer is more valuable than QCI 9 cat videos, and TAP records take this into account in the Call Type Groups, each of which has a different price – Call Type Level 1 indicates the type of traffic, for S8 Home Routed LTE Traffic this is 10 (HGGSN/HP-GW), while Call Type Level 2 indicates the type of traffic as mapped to QCI values:
So Call Type Level 2 set to 20 indicates that this is “20 Unspecified/default LTE QCIs”, and Call Type Level 3 can be set to any value based on a defined inter-operator tariff.
recEntityType 7 means a PGW and contains the IP of the PGW in the Home PLMN, while recEntityType 8 means SGW and is the SGW in the Visited PLMN.
So this means if we reference recEntityCode 2 in a gprsCall, that we’re referring to an SGW at 1.2.3.5.
Lastly also got the utcTimeOffsetInfo to indicate the timezones used and assign a unique code to it.
Using the Records
We as humans? These records aren’t meant for us.
They’re designed to be generated by the Visited PLMN and sent to to the home PLMN, which ingests it and pays the amount specified in the time agreed.
Generally this is an FTP server that the TAP records get dumped into, and an automated bank transfer job based on the totals for the TAP records.
Testing of the TAP records is called “TADIG Testing” and it’s something we’ll go into another day, but in essence it’s validating that the output and contents of the files meet what both operators think is the contract pricing and specifications.
So that’s it! That’s what’s in a TAP record, what it does and how we use it!
GSMA are introducing BCE – Billing & Charging Evolution, a new standard, designed to last for the next 30+ years like TAP has. It’s still in its early days, but that’s the direction the GSMA has indicated it would like to go.
Everything was working on the IMS, then I go to bed, the next morning I fire up the test device and it just won’t authenticate to the IMS – The S-CSCF generated a 401 in response to the REGISTER, but the next REGISTER wouldn’t pass.
When we generate the vectors (for IMS auth and standard auth) one of the inputs to generate the vectors is the Sequence Number or SQN.
This SQN ticks over like an odometer for the number of times the SIM / HSS authentication process has been performed.
There is some leeway in the SQN – It may not always match between the SIM and the HSS and that’s to be expected. When the MME sends an Authentication-Information-Request it can ask for multiple vectors so it’s got some in reserve for the next time the subscriber attaches, and that’s allowed.
But there are limits to how far out our SQN can be, and for good reason – One of the key purposes for the SQN is to protect against replay attacks, where the same vector is replayed to the UE. So the SQN on the HSS can be ahead of the SIM (within reason), but it can’t be behind – Odometers don’t go backwards.
So the issue was with the SQN on the SIM being out of Sync with the SQN in the IMS, how do we know this is the case, and how do we fix this?
Well there is a resync mechanism so the SIM can securely tell the HSS what the current SQN it is using, so the HSS can update it’s SQN.
When verifying the AUTN, the client may detect that the sequence numbers between the client and the server have fallen out of sync. In this case, the client produces a synchronization parameter AUTS, using the shared secret K and the client sequence number SQN. The AUTS parameter is delivered to the network in the authentication response, and the authentication can be tried again based on authentication vectors generated with the synchronized sequence number.
In our example we can tell the sub is out of sync as in our Multimedia Authentication Request we see the SIP-Authorization AVP, which contains the AUTS (client synchronization parameter) which the SIM generated and the UE sent back to the S-CSCF. Our HSS can use the AUTS value to determine the correct SQN.
SIP-Authorization AVP in the Multimedia Authentication Request means the SQN is out of Sync and this AVP contains the RAND and AUTN required to Resync
Note: The SIP-Authorization AVP actually contains both the RAND and the AUTN concatenated together, so in the above example the first 32 bytes are the AUTN value, and the last 32 bytes are the RAND value.
So the HSS gets the AUTS and from it is able to calculate the correct SQN to use.
Then the HSS just generates a new Multimedia Authentication Answer with a new vector using the correct SQN, sends it back to the IMS and presto, the UE can respond to the challenge normally.
We’re doing more and more network automation, and something that came up as valuable to us would be to have all the IPs in HOMER SIP Capture come up as the hostnames of the VM running the service.
Luckily for us HOMER has an API for this ready to roll, and best of all, it’s Swagger based and easily documented (awesome!).
(Probably through my own failure to properly RTFM) I was struggling to work out the correct (current) way to Authenticate against the API service using a username and password.
Because the HOMER team are awesome however, the web UI for HOMER, is just an API client.
This means to look at how to log into the API, I just needed to fire up Wireshark, log into the Web UI via my browser and then flick through the packets for a real world example of how to do this.
Homer Login JSON body as seen by Wireshark
In the Login action I could see the browser posts a JSON body with the username and password to /api/v3/auth
And in return the Homer API Server responds with a 201 Created an a auth token back:
Now in order to use the API we just need to include that token in our Authorization: header then we can hit all the API endpoints we want!
For me, the goal we were setting out to achieve was to setup the aliases from our automatically populated list of hosts. So using the info above I setup a simple Python script with Requests to achieve this:
import requests
s = requests.Session()
#Login and get Token
url = 'http://homer:9080/api/v3/auth'
json_data = {"username":"admin","password":"sipcapture"}
x = s.post(url, json = json_data)
print(x.content)
token = x.json()['token']
print("Token is: " + str(token))
#Add new Alias
alias_json = {
"alias": "Blog Example",
"captureID": "0",
"id": 0,
"ip": "1.2.3.4",
"mask": 32,
"port": 5060,
"status": True
}
x = s.post('http://homer:9080/api/v3/alias', json = alias_json, headers={'Authorization': 'Bearer ' + token})
print(x.status_code)
print(x.content)
#Print all Aliases
x = s.get('http://homer:9080/api/v3/alias', headers={'Authorization': 'Bearer ' + token})
print(x.json())
And bingo we’re done, a new alias defined.
We wrapped this up in a for loop for each of the hosts / subnets we use and hooked it into our build system and away we go!
With the Homer API the world is your oyster in terms of functionality, all the features of the Web UI are exposed on the API as the Web UI just uses the API (something I wish was more common!).
Using the Swagger based API docs you can see examples of how to achieve everything you need to, and if you ever get stuck, just fire up Wireshark and do it in the Homer WebUI for an example of how the bodies should look.
Misunderstood, under appreciated and more capable than people give it credit for, is our PCRF.
But what does it do?
Most folks describe the PCRF in hand wavy-terms – “it does policy and charging” is the answer you’ll get, but that doesn’t really tell you anything.
So let’s answer it in a way that hopefully makes some practical sense, starting with the acronym “PCRF” itself, it stands for Policy and Charging Rules Function, which is kind of two functions, one for policy and one for rules, so let’s take a look at both.
Policy
In cellular world, as in law, policy is the rules.
For us some examples of policy could be a “fair use policy” to limit customer usage to acceptable levels, but it can also be promotional packages, services like “free Spotify” packages, “Voice call priority” or “unmetered access to Nick’s Blog and maximum priority” packages, can be offered to customers.
All of these are examples of policy, and to make them work we need to target which subscribers and traffic we want to apply the policy to, and then apply the policy.
Charging Rules
Charging Rules are where the policy actually gets applied and the magic happens.
It’s where we take our policy and turn it into actionable stuff for the cellular world.
Let’s take an example of “unmetered access to Nick’s Blog and maximum priority” as something we want to offer in all our cellular plans, to provide access that doesn’t come out of your regular usage, as well as provide QCI 5 (Highest non dedicated QoS) to this traffic.
To achieve this we need to do 3 things:
Profile the traffic going to this website (so we capture this traffic and not regular other internet traffic)
Charge it differently – So it’s not coming from the subscriber’s regular balance
Up the QoS (QCI) on this traffic to ensure it’s high priority compared to the other traffic on the network
So how do we do that?
Profiling Traffic
So the first step we need to take in providing free access to this website is to filter out traffic to this website, from the traffic not going to this website.
Let’s imagine that this website is hosted on a single machine with the IP 1.2.3.4, and it serves traffic on TCP port 443. This is where IPFilterRules (aka TFTs or “Traffic Flow Templates”) and the Flow-Description AVP come into play. We’ve covered this in the past here, but let’s recap:
IPFilterRules are defined in the Diameter Base Protocol (IETF RFC 6733), where we can learn the basics of encoding them,
They take the format:
action dir proto from src to dst
The action is fairly simple, for all our Dedicated Bearer needs, and the Flow-Description AVP, the action is going to be permit. We’re not blocking here.
The direction (dir) in our case is either in or out, from the perspective of the UE.
Next up is the protocol number (proto), as defined by IANA, but chances are you’ll be using 17 (UDP) or 6 (TCP).
The from value is followed by an IP address with an optional subnet mask in CIDR format, for example from 10.45.0.0/16would match everything in the 10.45.0.0/16 network.
Following from you can also specify the port you want the rule to apply to, or, a range of ports.
Like the from, the tois encoded in the same way, with either a single IP, or a subnet, and optional ports specified.
And that’s it!
So let’s create a rule that matches all traffic to our website hosted on 1.2.3.4 TCP port 443,
permit out 6 from 1.2.3.4 443 to any 1-65535
permit out 6 from any 1-65535 to 1.2.3.4 443
All this info gets put into the Flow-Information AVPs:
With the above, any traffic going to/from 1.23.4 on port 443, will match this rule (unless there’s another rule with a higher precedence value).
Charging Actions
So with our traffic profiled, the next question is what actions are we going to take, well there’s two, we’re going to provide unmetered access to the profiled traffic, and we’re going to use QCI 4 for the traffic (because you’ll need a guaranteed bit rate bearer to access!).
Charging-Group for Profiled Traffic
To allow for Zero Rating for traffic matching this rule, we’ll need to use a different Rating Group.
Let’s imagine our default rating group for data is 10000, then any normal traffic going to the OCS will use rating group 10000, and the OCS will apply the specific rates and policies based on that.
Rating Groups are defined in the OCS, and dictate what rates get applied to what Rating Groups.
For us, our default rating group will be charged at the normal rates, but we can define a rating group value of 4000, and set the OCS to provide unlimited traffic to any Credit-Control-Requests that come in with Rating Group 4000.
This is how operators provide services like “Unlimited Facebook” for example, a Charging Rule matches the traffic to Facebook based on TFTs, and then the Rating Group is set differently to the default rating group, and the OCS just allows all traffic on that rating group, regardless of how much is consumed.
Inside our Charging-Rule-Definition, we populate the Rating-Group AVP to define what Rating Group we’re going to use.
Setting QoS for Profiled Traffic
The QoS Description AVP defines which QoS parameters (QCI / ARP / Guaranteed & Maximum Bandwidth) should be applied to the traffic that matches the rules we just defined.
As mentioned at the start, we’ll use QCI 4 for this traffic, and allocate MBR/GBR values for this traffic.
Putting it Together – The Charging Rule
So with our TFTs defined to match the traffic, our Rating Group to charge the traffic and our QoS to apply to the traffic, we’re ready to put the whole thing together.
So here it is, our “Free NVN” rule:
I’ve attached a PCAP of the flow to this post.
In our next post we’ll talk about how the PGW handles the installation of this rule.
One day recently I was messing with the XCAP server, trying to set the Call Forward timeout. In the process I triggered the UE to send a USSD request to the IMS.
Huh, I thought, “I wonder how hard it would be to build a USSD Gateway for our IMS?”, and this my friends, is the story of how I wasted a good chunk of my weekend trying (and failing) to add support for USSD.
You might be asking “Who still uses USSD?” – The use cases for USSD are pretty thin on the ground in this day and age, but I guess balance query, and uh…
But this is the story of what I tried before giving up and going outside…
Routing
First I’d need to get the USSD traffic towards the USSD Gateway, this means modifying iFCs. Skimming over the spec I can see the Recv-Info: header for USSD traffic should be set to “g.3gpp.ussd” so I knocked up an iFC to match that, and route the traffic to my dev USSD Gateway, and added it to the subscriber profile in PyHSS:
Easy peasy, now we have the USSD requests hitting our USSD Gateway.
The Response
I’ll admit that I didn’t jump straight to the TS doc from the start.
The first place I headed was Google to see if I could find any PCAPs of USSD over IMS/SIP.
And I did – Restcomm seems to have had a USSD product a few years back, and trawling around their stuff provided some reference PCAPs of USSD over SIP.
So the flow seemed pretty simple, SIP INVITE to set up the session, SIP INFO for in-dialog responses and a BYE at the end.
With all the USSD guts transferred as XML bodies, in a way that’s pretty easy to understand.
Being a Kamailio fan, that’s the first place I started, but quickly realised that SIP proxies, aren’t great at acting as the UAS.
So I needed to generate in-dialog SIP INFO messages, so I turned to the UAC module to generate the SIP INFO response.
My Kamailio code is super simple, but let’s have a look:
request_route {
xlog("Request $rm from $fU");
if(is_method("INVITE")){
xlog("USSD from $fU to $rU (Emergency number) CSeq is $cs ");
sl_reply("200", "OK Trying USSD Phase 1"); #Generate 200 OK
route("USSD_Response"); #Call USSD_Response route block
exit;
}
}
route["USSD_Response"]{
xlog("USSD_Response Route");
#Generate a new UAC Request
$uac_req(method)="INFO";
$uac_req(ruri)=$fu; #Copy From URI to Request URI
$uac_req(furi)=$tu; #Copy To URI to From URI
$uac_req(turi)=$fu; #Copy From URI to To URI
$uac_req(callid)=$ci; #Copy Call-ID
#Set Content Type to 3GPP USSD
$uac_req(hdrs)=$uac_req(hdrs) + "Content-Type: application/vnd.3gpp.ussd+xml\r\n";
#Set the USSD XML Response body
$uac_req(body)="<?xml version='1.0' encoding='UTF-8'?>
<ussd-data>
<language value=\"en\"/>
<ussd-string value=\"Bienvenido. Seleccione una opcion: 1 o 2.\"/>
</ussd-data>";
$uac_req(evroute)=1; #Set the event route to use on return replies
uac_req_send(); #Send it!
}
So the UAC module generates the 200 OK and sends it back.
“That was quick” I told myself, patting myself on the back before trying it out for the first time.
Huston, we have a problem – Although the Call-ID is the same, it’s not an in-dialog response as the tags aren’t present, this means our UE send back a 405 to the SIP INFO.
Right. Perhaps this is the time to read the Spec…
Okay, so the SIP INFO needs to be in dialog. Can we do that with the UAC module? Perhaps not…
But alas real life came back to rear its ugly head, and this adventure will have to continue another day…
Update: Thanks to a kindly provided PCAP I now know what I was doing wrong, and so we’ll soon have a follow up to this post named “Successes in cobbling together a USSD Gateway” just as soon as I have a weekend free.
Recently I’ve been working on open source Diameter Routing Agent implementations (See my posts on FreeDiameter).
With the hurdles to getting a DRA working with open source software covered, the next step was to get all my Diameter traffic routed via the DRAs, however I soon rediscovered a Kamailio limitation regarding support for Diameter Routing Agents.
You see, when Kamailio’s C Diameter Peer module makes a decision as to where to route a request, it looks for the active Diameter peers, and finds a peer with the suitable Vendor and Application IDs in the supported Applications for the Application needed.
Unfortunately, a DRA typically only advertises support for one application – Relay.
This means if you have everything connected via a DRA, Kamailio’s CDP module doesn’t see the Application / Vendor ID for the Diameter application on the DRA, and doesn’t route the traffic to the DRA.
The fix for this was twofold, the first step was to add some logic into Kamailio to determine if the Relay application was advertised in the Capabilities Exchange Request / Answer of the Diameter Peer.
I added the logic to do this and exposed this so you can see if the peer supports Diameter relay when you run “cdp.list_peers”.
With that out of the way, next step was to update the routing logic to not just reject the candidate peer if the Application / Vendor ID for the required application was missing, but to evaluate if the peer supports Diameter Relay, and if it does, keep it in the game.
I added this functionality, and now I’m able to use CDP Peers in Kamailio to allow my P-CSCF, S-CSCF and I-CSCF to route their traffic via a Diameter Routing Agent.
Next we’ll need to define our rt_pyform config, this is a super simple 3 line config file that specifies the path of what we’re doing:
DirectoryPath = "." # Directory to search
ModuleName = "script" # Name of python file. Note there is no .py extension
FunctionName = "transform" # Python function to call
The DirectoryPath directive specifies where we should search for the Python code, and ModuleName is the name of the Python script, lastly we have FunctionName which is the name of the Python function that does the rewriting.
Now let’s write our Python function for the transformation.
The Python function much have the correct number of parameters, must return a string, and must use the name specified in the config.
The following is an example of a function that prints out all the values it receives:
Note the order of the arguments and that return is of the same type as the AVP value (string).
We can expand upon this and add conditionals, let’s take a look at some more complex examples:
def transform(appId, flags, cmdCode, HBH_ID, E2E_ID, AVP_Code, vendorID, value):
print('[PYTHON]')
print(f'|-> appId: {appId}')
print(f'|-> flags: {hex(flags)}')
print(f'|-> cmdCode: {cmdCode}')
print(f'|-> HBH_ID: {hex(HBH_ID)}')
print(f'|-> E2E_ID: {hex(E2E_ID)}')
print(f'|-> AVP_Code: {AVP_Code}')
print(f'|-> vendorID: {vendorID}')
print(f'|-> value: {value}')
#IMSI Translation - if App ID = 16777251 and the AVP being evaluated is the Username
if (int(appId) == 16777251) and int(AVP_Code) == 1:
print("This is IMSI '" + str(value) + "' - Evaluating transformation")
print("Original value: " + str(value))
value = str(value[::-1]).zfill(15)
The above look at if the App ID is S6a, and the AVP being checked is AVP Code 1 (Username / IMSI ) and if so, reverses the username, so IMSI 1234567 becomes 7654321, the zfill is just to pad with leading 0s if required.
Now let’s do another one for a Realm Rewrite:
def transform(appId, flags, cmdCode, HBH_ID, E2E_ID, AVP_Code, vendorID, value):
#Print Debug Info
print('[PYTHON]')
print(f'|-> appId: {appId}')
print(f'|-> flags: {hex(flags)}')
print(f'|-> cmdCode: {cmdCode}')
print(f'|-> HBH_ID: {hex(HBH_ID)}')
print(f'|-> E2E_ID: {hex(E2E_ID)}')
print(f'|-> AVP_Code: {AVP_Code}')
print(f'|-> vendorID: {vendorID}')
print(f'|-> value: {value}')
#Realm Translation
if int(AVP_Code) == 283:
print("This is Destination Realm '" + str(value) + "' - Evaluating transformation")
if value == "epc.mnc001.mcc001.3gppnetwork.org":
new_realm = "epc.mnc999.mcc999.3gppnetwork.org"
print("translating from " + str(value) + " to " + str(new_realm))
value = new_realm
else:
#If the Realm doesn't match the above conditions, then don't change anything
print("No modification made to Realm as conditions not met")
print("Updated Value: " + str(value))
In the above block if the Realm is set to epc.mnc001.mcc001.3gppnetwork.org it is rewritten to epc.mnc999.mcc999.3gppnetwork.org, hopefully you can get a handle on the sorts of transformations we can do with this – We can translate any string type AVPs, which allows for hostname, realm, IMSI, Sh-User-Data, Location-Info, etc, etc, to be rewritten.
Having a central pair of Diameter routing agents allows us to drastically simplify our network, but what if we want to perform some translations on AVPs?
For starters, what is an AVP transformation? Well it’s simply rewriting the value of an AVP as the Diameter Request/Response passes through the DRA. A request may come into the DRA with IMSI xxxxxx and leave with IMSI yyyyyy if a translation is applied.
So why would we want to do this?
Well, what if we purchased another operator who used Realm X, and we use Realm Y, and we want to link the two networks, then we’d need to rewrite Realm Y to Realm X, and Realm X to Realm Y when they communicate, AVP transformations allow for this.
If we’re an MVNO with hosted IMSIs from an MNO, but want to keep just the one IMSI in our HSS/OCS, we can translate from the MNO hosted IMSI to our internal IMSI, using AVP transformations.
If our OCS supports only one rating group, and we want to rewrite all rating groups to that one value, AVP transformations cover this too.
There are lots of uses for this, and if you’ve worked with a bit of signaling before you’ll know that quite often these sorts of use-cases come up.
So how do we do this with freeDiameter?
To handle this I developed a module for passing each AVP to a Python function, which can then apply any transformation to a text based value, using every tool available to you in Python.
In the next post I’ll introduce rt_pyform and how we can use it with Python to translate Diameter AVPs.
Way back in part 2 we discussed the basic routing logic a DRA handles, but what if we want to do something a bit outside of the box in terms of how we route?
For me, one of the most useful use cases for a DRA is to route traffic based on IMSI / Username. This means I can route all the traffic for MVNO X to MVNO X’s HSS, or for staging / test subs to the test HSS enviroment.
FreeDiameter has a bunch of built in logic that handles routing based on a weight, but we can override this, using the rt_default module.
In our last post we had this module commented out, but let’s uncomment it and start playing with it:
In the above code we’ve uncommented rt_default and rt_redirect.
You’ll notice that rt_default references a config file, so we’ll create a new file in our /etc/freeDiameter directory called rt_default.conf, and this is where the magic will happen.
A few points before we get started:
This overrides the default routing priorities, but in order for a peer to be selected, it has to be in an Open (active) state
The peer still has to have advertised support for the requested application in the CER/CEA dialog
The peers will still need to have all been defined in the freeDiameter.conf file in order to be selected
So with that in mind, and the 5 peers we have defined in our config above (assuming all are connected), let’s look at some rules we can setup using rt_default.
Intro to rt_default Rules
The rt_default.conf file contains a list of rules, each rule has a criteria that if matched, will result in the specified action being taken. The actions all revolve around how to route the traffic.
So what can these criteria match on? Here’s the options:
Item to Match
Code
Any
*
Origin-Host
oh=”STR/REG”
Origin-Realm
or=”STR/REG”
Destination-Host
dh=”STR/REG”
Destination-Realm
dr=”STR/REG”
User-Name
un=”STR/REG”
Session-Id
si=”STR/REG”
rt_default Matching Criteria
We can either match based on a string or a regex, for example, if we want to match anything where the Destination-Realm is “mnc001.mcc001.3gppnetwork.org” we’d use something like:
#Low score to HSS02
dr="mnc001.mcc001.3gppnetwork.org" : dh="hss02" += -70 ;
Now you’ll notice there is some stuff after this, let’s look at that.
We’re matching anything where the destination-host is set to hss02 (that’s the bit before the colon), but what’s the bit after that?
Well if we imagine that all our Diameter peers are up, when a message comes in with Destination-Realm “mnc001.mcc001.3gppnetwork.org”, looking for an HSS, then in our example setup, we have 4 HHS instances to choose from (assuming they’re all online).
In default Diameter routing, all of these peers are in the same realm, and as they’re all HSS instances, they all support the same applications – Our request could go to any of them.
But what we set in the above example is simply the following:
If the Destination-Realm is set to mnc001.mcc001.3gppnetwork.org, then set the priority for routing to hss02 to the lowest possible value.
So that leaves the 3 other Diameter peers with a higher score than HSS02, so HSS02 won’t be used.
Let’s steer this a little more,
Let’s specify that we want to use HSS01 to handle all the requests (if it’s available), we can do that by adding a rule like this:
#Low score to HSS02
dr="mnc001.mcc001.3gppnetwork.org" : dh="hss02" += -70 ;
#High score to HSS01
dr="mnc001.mcc001.3gppnetwork.org" : dh="hss01" += 100 ;
But what if we want to route to hss-lab if the IMSI matches a specific value, well we can do that too.
#Low score to HSS02
dr="mnc001.mcc001.3gppnetwork.org" : dh="hss02" += -70 ;
#High score to HSS01
dr="mnc001.mcc001.3gppnetwork.org" : dh="hss01" += 100 ;
#Route traffic for IMSI to Lab HSS
un="001019999999999999" : dh="hss-lab" += 200 ;
Now that we’ve set an entry with a higher score than hss01 that will be matched if the username (IMSI) equals 001019999999999999, the traffic will get routed to hss-lab.
But that’s a whole IMSI, what if we want to match only part of a field?
Well, we can use regex in the Criteria as well, so let’s look at using some Regex, let’s say for example all our MVNO SIMs start with 001012xxxxxxx, let’s setup a rule to match that, and route to the MVNO HSS with a higher priority than our normal HSS:
#Low score to HSS02
dr="mnc001.mcc001.3gppnetwork.org" : dh="hss02" += -70 ;
#High score to HSS01
dr="mnc001.mcc001.3gppnetwork.org" : dh="hss01" += 100 ;#Route traffic for IMSI to Lab HSS
un="001019999999999999" : dh="hss-lab" += 200 ;
#Route traffic where IMSI starts with 001012 to MVNO HSS
un=["^001012.*"] : dh="hss-mvno-x" += 200 ;
Let’s imagine that down the line we introduce HSS03 and HSS04, and we only want to use HSS01 if HSS03 and HSS04 are unavailable, and only to use HSS02 no other HSSes are available, and we want to split the traffic 50/50 across HSS03 and HSS04.
Firstly we’d need to add HSS03 and HSS04 to our FreeDiameter.conf file:
Then in our rt_default.conf we’d need to tweak our scores again:
#Low score to HSS02
dr="mnc001.mcc001.3gppnetwork.org" : dh="hss02" += 10 ;
#Medium score to HSS01
dr="mnc001.mcc001.3gppnetwork.org" : dh="hss01" += 20 ;
#Route traffic for IMSI to Lab HSS
un="001019999999999999" : dh="hss-lab" += 200 ;
#Route traffic where IMSI starts with 001012 to MVNO HSS
un=["^001012.*"] : dh="hss-mvno-x" += 200 ;
#High Score for HSS03 and HSS04dr="mnc001.mcc001.3gppnetwork.org" : dh="hss02" += 100 ;dr="mnc001.mcc001.3gppnetwork.org" : dh="hss04" += 100 ;
One quick tip to keep your logic a bit simpler, is that we can set a variety of different values based on keywords (listed below) rather than on a weight/score:
Behaviour
Name
Score
Do not deliver to peer (set lowest priority)
NO_DELIVERY
-70
The peer is a default route for all messages
DEFAULT
5
The peer is a default route for this realm
DEFAULT_REALM
10
REALM
15
Route to the specified Host with highest priority
FINALDEST
100
Rather than manually specifying the store you can use keywords like above to set the value
In our next post we’ll look at using FreeDiameter based DRA in roaming scenarios where we route messages across Diameter Realms.
Want more telecom goodness?
I have a good old fashioned RSS feed you can subscribe to.