Carl Sagan famously said, “If you wish to make an apple pie from scratch, you must first invent the universe.” We don’t need to go that far back, but if you want to deliver an MMS to a subscriber, you must first deliver an SMS.
Wait, but we’re talking about MMS, right? So why are we talking about SMS?
Modern MMS transport relies on HTTP, which is client-server based: the phone / UE is the client, and the MMSc is the server.
The problem with this client-server relationship is that the client can request things from the server, but the server can’t request things from the client.
This presents a problem when it comes to delivering an MMS – the phone / UE has to request the message from the MMSc, but it needs to know there is a message to request in the first place.
So this is where SMS comes in. When the MMSc has a message destined for a subscriber, it sends the phone/UE an SMS informing it that there is an MMS waiting, and providing the URL the MMS can be retrieved from.
The MMSc is typically linked to the SMSc via MAP or SMPP to allow it to send these notification messages.
Once the UE receives this SMS containing the URL, it knows where to retrieve the MMS from.
It can then send an HTTP GET to that URL to retrieve the MMS, and lastly it sends an HTTP POST to confirm to the MMSc that it retrieved everything OK.
MMS Mobile Terminated message flow
So that’s the basics. Let’s look at each part of the dialog in some more detail, starting with this magic SMS that tells the UE where to retrieve the MMS from.
WAP PUSH from MMSc sent via SMS
Some things to notice: the user data, which would usually carry the body of our SMS, instead contains another protocol, the “Wireless Session Protocol” (WSP), and the method is “Push”.
That in turn is followed by the MMS Message Encapsulation, again inside the SMS message body, this time carrying the MMS-specific data.
The From: header contains the sender of the MMS – this is how you can see who the MMS is from while it’s still downloading.
The expiry indicates to the handset that if it doesn’t download the MMS within the specified time period, it shouldn’t bother, as the message will have expired.
And lastly, and perhaps most importantly, we have the X-MMS-Content-Location header, which tells our subscriber where to download the MMS from.
After this, the UE sends an HTTP GET to the URL in the X-MMS-Content-Location header (typically on the “mms” APN), to retrieve the MMS from the MMSc.
HTTP GET from the UE to the MMSc
The HTTP GET is pretty normal; there are the usual MMS headers we talked about in the last post, and we just GET the path provided by the MMSc in the WAP PUSH.
The response from the MMSc contains the actual MMS itself, which is almost a mirror of the sending process (the Data component is unchanged from when the sender sent it).
Response to HTTP GET for message retrieval
At this stage our subscriber has retrieved the MMS, but the MMSc doesn’t yet know if it was retrieved fully, or if there was an issue retrieving it.
So the UE sends an HTTP POST with the MMS-Message-Type m-notifyresp-ind and the transaction ID, to indicate that it has successfully retrieved the MMS. At this point the MMSc can notify the sender (if delivery receipts are enabled) and delete the message from its cache.
And finally the MMSc sends back a 200 OK with no body to confirm it got that too.
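To make the retrieval leg a little more concrete, here’s a rough Python sketch of the UE side of this exchange, assuming the URL and transaction ID have already been pulled out of the WAP PUSH (the MMSc hostname is made up, and the binary m-notifyresp-ind body is only stubbed out, not fully encoded):

import requests

# Placeholder values - on a real UE these come from the m-notification-ind in the WAP PUSH
MMSC = "http://mmsc.example.com/mms"
content_location = MMSC + "/retrieve/abc123"   # X-MMS-Content-Location

# Step 1: HTTP GET the URL from the notification to pull down the MMS itself
retrieved = requests.get(content_location, timeout=30)
print(retrieved.headers.get("Content-Type"), len(retrieved.content), "bytes")

# Step 2: HTTP POST an m-notifyresp-ind back to the MMSc to confirm we got it all OK.
# The body is a binary MMS Message Encapsulation header (message type, transaction ID,
# version, status) - only the message-type bytes are shown here as a stand-in.
notifyresp_body = b"\x8c\x83"   # X-Mms-Message-Type: m-notifyresp-ind (remaining fields omitted)
requests.post(MMSC, data=notifyresp_body,
              headers={"Content-Type": "application/vnd.wap.mms-message"})

# The MMSc answers with a 200 OK and no body; it can now send the delivery report
# to the original sender (if enabled) and clear the message from its store.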
Some notes on MMS Security
Reading about unauthenticated GET requests, you may be left wondering what security MMS has, and what stops you from just going through and sending HTTP GET requests to all the possible URL paths to vacuum up all the MMS?
In the standard, nothing!
Typically the MMSc has some layer of security added by the implementer, to ensure the user retrieving the MMS is the user the MMS is destined for. Because MMS has no security in the standard, this is typically achieved through Header Enrichment, whereby the P-GW adds an HTTP header with the MSISDN or IMSI of the subscriber, and the MMSc can then evaluate if this subscriber should be able to retrieve that URL.
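As a rough illustration of how that check might work on the MMSc side, the logic boils down to something like the sketch below (the “X-MSISDN” header name and the message store structure are made up for illustration; real deployments and header names vary by vendor):

# Sketch of an MMSc-side authorisation check using an enriched header
def is_allowed_to_retrieve(request_headers: dict, requested_path: str, message_store: dict) -> bool:
    enriched_msisdn = request_headers.get("X-MSISDN")   # inserted by the P-GW, not the UE
    if enriched_msisdn is None:
        return False                                    # no enrichment, no trust
    message = message_store.get(requested_path)
    if message is None:
        return False                                    # unknown URL, nothing to serve
    return message["recipient_msisdn"] == enriched_msisdn

# Only the intended recipient can fetch /retrieve/abc123
store = {"/retrieve/abc123": {"recipient_msisdn": "61412345678"}}
print(is_allowed_to_retrieve({"X-MSISDN": "61412345678"}, "/retrieve/abc123", store))  # True
print(is_allowed_to_retrieve({"X-MSISDN": "61400000000"}, "/retrieve/abc123", store))  # False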
Another attack vector I played with was sending an SMS-based MMS-Notify with a different URL, which, if retrieved, would leak the subscriber’s IP, as it would cause the UE to try and get data from that URL.
Recently I had a strange issue I thought I’d share.
Using Kamailio as an Interrogating-CSCF, Kamailio was getting the S-CSCF details from the User-Authorization-Answer’s “Server-Name” (602) AVP.
The value was set to:
sip:scscf.mnc001.mcc001.3gppnetwork.org:5060
But the I-CSCF was only looking up A-Records for scscf.mnc001.mcc001.3gppnetwork.org, not using DNS-SRV.
The problem? I had configured the Server-Name in PyHSS as a full SIP URI including the port. Because the URI specifies a port, Kamailio only looks up the A record and does not do a DNS SRV lookup for the domain.
Dropping the port number saw all those delicious SRV records being queried.
Something to keep in mind if you use S-CSCF pooling with a Kamailio-based I-CSCF: if you want to use SRV records for load balancing / traffic sharing, don’t include the port; if instead you want traffic to go to the specific host found by an A record, include the port.
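If you want to eyeball the difference yourself, a quick dnspython query shows what each lookup returns (the _sip._udp service label assumes SIP over UDP; adjust for your transport, and obviously the 3gppnetwork.org domain only resolves inside your own DNS):

import dns.resolver   # pip install dnspython

domain = "scscf.mnc001.mcc001.3gppnetwork.org"

# What gets looked up when the SIP URI includes a port: plain A records
for rr in dns.resolver.resolve(domain, "A"):
    print("A  ", rr.address)

# What you want queried for S-CSCF pooling (no port in the URI): SRV records
for rr in dns.resolver.resolve(f"_sip._udp.{domain}", "SRV"):
    print("SRV", rr.priority, rr.weight, rr.port, rr.target)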
So you want to send a Multimedia Message (Aka MMS or MM)?
Let’s do it – We’ll use the MM1 interface from the UE towards the MMSc (MMS Service Center) to send our Mobile Originated MMS.
Transport & Creation
Out of the box, our UE doesn’t get told by the network anything about where to send MMS messages (Unless set via something like Android’s Carrier Settings). Instead, this is typically configured by the user in the APN settings, by setting the MMSc address (Typically an FQDN), port (Typically 80) and often a Proxy (Which will actually handle the traffic). Lastly under the bearer type, if we’re sending the MMS on the default bearer (the one used for general Internet) then the bearer type will need to change from “default” to “default,mms”. Alternatively, if you’re using a dedicated APN for MMS, you’ll need to set the bearer type to “mms”.
With the connectivity side set up, we’ll need to actually generate an MMS to send – something to encapsulate in the MMS – so a picture is a good start.
We compose a message with this photo, put an address in the message and hit send on the UE.
The UE encapsulates the photo and metadata, such as the To address, into an HTTP POST, which is sent to the IP & port of your MMSc (or proxy if you have that set). The body of this HTTP POST contains the MMS Message Body (in this case our picture).
Our MMSc receives this POST, and extracts the headers of interest and the multimedia message body itself (in our case the photo), ready to be forwarded on to their destination.
PCAP Extract showing MMS m-send-req from UE
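To give a feel for what that POST looks like from the UE side, here’s a rough Python sketch; the MMSc URL and proxy are the kind of values you’d have in your APN settings (placeholders here), and the binary m-send-req body, which carries the To address and the photo, is assumed to have been encoded already:

import requests

MMSC_URL = "http://mmsc.example.com/mms"          # from the APN settings - placeholder
PROXIES = {"http": "http://10.1.1.1:8080"}        # the MMS proxy, if your carrier uses one

# The body is the binary "MMS Message Encapsulation" (m-send-req): the MMS headers,
# the To address, and the photo itself as an attachment inside the Data section.
with open("m_send_req.bin", "rb") as f:
    m_send_req = f.read()

resp = requests.post(
    MMSC_URL,
    data=m_send_req,
    headers={"Content-Type": "application/vnd.wap.mms-message"},
    proxies=PROXIES,
)
print(resp.status_code)   # expect a 200 OK carrying the m-send-conf in the body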
Header Enrichment / Charging / Authentication
One thing to note is that the From header is empty.
Often a UE doesn’t know its own MSISDN. While there is an MSISDN EF on the SIM file system, often this is not updated with the correct MSISDN, as a customer may have ported over their number from a different carrier, or had a replacement SIM reissued. There are also problems in just trusting the From address set by a UE without verifying it, as anyone could change it.
The MMS standards evolved in parallel to the 3GPP specifications, but were historically specified by the Open Mobile Alliance. Because it is at arm’s length from 3GPP, SIM-based authentication was not used on the MM1 interface from a UE to the MMSc.
In fact, there is no authentication on an MMS specified in the standard, meaning in theory anyone could send one. To counter this, the P-GW or GGSN handling the subscriber traffic often enables “Header Enrichment”, which, when it detects traffic on the MMS APN, adds a header to the Mobile Originated request with the IMSI or MSISDN of the subscriber sending it, which the MMSc can use to bill the customer.
m-send-req Request
Let’s take a closer look into the HTTP POST sent by the UE containing the message.
Firstly we have what looks like a pretty bog-standard HTTP POST header, albeit with some custom headers prefixed with “X-”, and a Content-Type of application/vnd.wap.mms-message.
But immediately after the HTTP header in the HTTP message body, we have the “MMS Message Encapsulation” header:
MMS Message Encapsulation Header from MO MMS
This header contains the Destination we set in the MMS when sending it, the request type (m-send-req) as well as the actual content itself (inside the Data section).
So why the double header? Why not just encapsulate the whole thing in the HTTP POST? When MMS was introduced, most phones didn’t have an HTTP stack baked into them like everything does now. Instead traffic would be going through a WAP Gateway.
When usage of WAP fell away, the standard moved the same payload that had been transferred over WAP to instead be transferred over HTTP.
Inside the Data section we can see the MIME Type of the attachments themselves, in this case, it’s a photo of my desk:
With all this information, the MMSc analyses the headers and stores the message body, ready for forwarding on to the recipient(s).
m-send-conf Response
To confirm successful receipt, the MMSc sends back a 200 OK with a matching Transaction ID, so the UE knows the message was accepted.
I’ve recently been writing a lot about SS7 / Sigtran, and couldn’t fit this in anywhere, but figured it may be of use to someone…
In our 3-8-3 formatted ITU International Point Code, each of the parts has a unique meaning.
The 3 bits in the first section are called the Zone section. Being only 3 bits long means we can only encode the numbers 0-7 in it, but the ITU has broken the planet up into different “zones”, so the first part of our ITU International Point Code denotes which Zone the Point Code is in (as allocated by the ITU).
The next 8 bits in the second section (the Area section) are used to define the “Signaling Area Network Code” (SANC), which denotes which country a point code is located in. Values can range from 0-255, and many countries span multiple SANC zones; for example the USA has 58 SANC Zones.
Lastly we have the last 3 bits, which make up the ID section, denoting a single unique point code – typically a carrier’s international gateway. It’s unique within a Zone & SANC, so the combined Zone-SANC-ID makes a unique address on the SS7 network. Being only 3 bits long means we’ve only got 8 possible values, hence so many SANCs being used.
Zone 2 – Europe
Zone 3 – Greenland, North America, the Caribbean, and Mexico
This is part of a series of posts looking into SS7 and Sigtran networks. We cover some basic theory and then get into the weeds with GNS3 based labs where we will build real SS7/Sigtran based networks and use them to carry traffic.
Having a direct Linkset from every Point Code to every other Point Code in an SS7 network isn’t practical; we need to rely on routing. So in this post we’ll cover routing between Point Codes on our STPs.
Let’s start in the IP world, imagine a router with a routing table that looks something like this:
Simple IP Routing Table
192.168.0.0/24 out 192.168.0.1 (Directly Attached)
172.16.8.0/22 via 192.168.0.3 - Static Route - (Priority 100)
172.16.0.0/16 via 192.168.0.2 - Static Route - (Priority 50)
10.98.22.1/32 via 192.168.0.3 - Static Route - (Priority 50)
We have an implicit route for the network we’re directly attached to (192.168.0.0/24), and then a series of static routes we configure. We’ve also got two routes covering the 172.16.8.0/22 subnet: one is more specific with a higher priority (172.16.8.0/22 – Priority 100), while the other is less specific with a lower priority (172.16.0.0/16 – Priority 50). The higher priority route will take precedence.
This should look pretty familiar to you, but now we’re going to take a look at routing in SS7, and for that we’re going to be talking about Variable Length Subnet Masking at a level of detail you haven’t needed to think about since doing your CCNA years ago…
Why Masking is Important
A route to a single Point Code is called a “/14”; this is akin to a single IPv4 address being called a “/32”.
We could setup all our routing tables with static routes to each point code (/14), but with about 4,000 international point codes, this might be a challenge.
Instead, by using Masks, we can group together ranges of Point Codes and route those ranges through a particular STP.
This opens up the ability to achieve things like “Route all traffic to Point Codes to this Default Gateway STP”, or to say “Route all traffic to this region through this STP”.
Individually routing to a point code works well for small scale networking, but there’s power, flexibility and simplification that comes from grouping together ranges of point codes.
Information Overload about Point Codes
So far we’ve talked about point codes in the X.YYY.Z format, in our lab we setup point codes like 1.2.3.
This is not the only option however…
Variants of SS7 Point Codes
IPv4 addresses look the same regardless of where you are. From Algeria to Zimbabwe, IPv4 addresses look the same and route the same.
In SS7 networks that’s not the case – There are a lot of variants that define how a point code is structured, how long it is, etc. Common variants are ANSI, ITU-T (International & National variants), ETSI, Japan NTT, TTC & China.
The SS7 variant used must match on both ends of a link; this means an SS7 node speaking ETSI flavoured Point Codes can’t exchange messages with an ANSI flavoured Point Code.
Well, you can kinda translate from one variant to another, but it requires some rewriting, not unlike how NAT does it.
ITU International Variant
For the start of this series, we’ll be working with the ITU International variant / flavour of Point Code.
ITU International point codes are 14 bits long, and the format is described as 3-8-3. The 3-8-3 form of Point Code just means the 14-bit point code is broken up into three sections: the first section is made up of the first 3 bits, the second section is made up of the next 8 bits, and the remaining 3 bits make up the last section, for a total of 14 bits.
So our 14-bit 3-8-3 Point Code looks like this in binary form: ZZZ-AAAAAAAA-III (3 Zone bits, 8 Area bits, 3 ID bits).
If you’re dealing with multiple vendors or products, you’ll see some SS7 Point Codes represented as decimal (2067), some showing as 1-2-3 codes and sometimes just raw binary. Fun hey?
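A few lines of Python show how the 3-8-3 and decimal forms relate, and why 1-2-3 comes out as 2067 (you’ll see that exact value again in the Wireshark example later in this series):

def pc_to_decimal(zone: int, area: int, id_: int) -> int:
    """Pack a 3-8-3 ITU point code into its 14-bit decimal form."""
    return (zone << 11) | (area << 3) | id_

def decimal_to_pc(value: int) -> str:
    """Unpack a 14-bit decimal point code back into 3-8-3 form."""
    return f"{(value >> 11) & 0x7}-{(value >> 3) & 0xFF}-{value & 0x7}"

print(pc_to_decimal(1, 2, 3))   # 2067
print(decimal_to_pc(2067))      # 1-2-3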
So why does the binary part matter? Well the answer is for masks.
To loop back to the start of this post, we talked about IP routing using a network address and netmask, to represent a range of IP addresses. We can do the same for SS7 Point Codes, but that requires a teeny bit of working out.
As an example let’s imagine we need to setup a route to all point codes from 3-4-0 through to 3-6-7, without specifying all the individual point codes between them.
Firstly let’s look at our start and end point codes in binary:
011-00000100-000 = 3-004-0 (Start Point Code)
011-00000110-111 = 3-006-7 (End Point Code)
Looking at the above example, let’s see how many bits are common between the two:
011-000001|00-000 = 3-004-0 (Start Point Code)
011-000001|10-111 = 3-006-7 (End Point Code)
The first 9 bits (everything left of the “|”) are common; it’s only the last 5 bits that change, so we can group all these together by saying we have a /9 mask.
When it comes time to add a route, we can add a route to 3-4-0/9, which tells our STP to match anything that shares those first 9 bits – point codes 3-4-0 through to 3-7-7, which covers our 3-4-0 to 3-6-7 range (with a little overshoot).
The STP doing the routing only needs to match on the first 9 bits of the point code to match this route.
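Continuing the Python from earlier, we can sanity-check the /9 by masking off everything after the first 9 bits (the point codes here are the ones from the example above):

def in_route(point_code: str, route: str, mask_len: int) -> bool:
    """True if a 3-8-3 point code falls inside route/mask_len (ITU 14-bit)."""
    def to_int(pc: str) -> int:
        zone, area, id_ = (int(x) for x in pc.split("-"))
        return (zone << 11) | (area << 3) | id_
    mask = ((1 << mask_len) - 1) << (14 - mask_len)   # e.g. /9 keeps the top 9 bits
    return (to_int(point_code) & mask) == (to_int(route) & mask)

print(in_route("3-4-0", "3-4-0", 9))   # True  - start of the range
print(in_route("3-6-7", "3-4-0", 9))   # True  - end of the range we wanted
print(in_route("3-7-7", "3-4-0", 9))   # True  - the slight overshoot of the /9
print(in_route("3-8-0", "3-4-0", 9))   # False - outside the /9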
SS7 Routing Tables
Now we have covered masking for routes, we can start putting some routes into our network.
In order to get a message from one point code to another point code, where there isn’t a direct linkset between the two, we need to rely on routing, which is performed by our STPs.
This is where all that point code mask stuff we just covered comes in.
Let’s look at the diagram below.
Let’s look at the routing to get a message from Exchange A (SSP) on the bottom left of the picture to Exchange E (SSP) with Point Code 4.5.3 in the bottom right of the picture.
Exchange A (SSP) on the bottom left of the picture has point code 1.2.3 assigned to it and a Linkset to STP-A. It has the implicit route to STP-A as it’s got that linkset, but it’s also got a 0.0.0/0 route configured on it – the SS7 equivalent of a default route – to reach any other point code via the Linkset to STP-A. This means any traffic to any point code will go to STP-A.
From STP-A we have a linkset to STP-B. In order to route to the point codes behind STP-B, STP-A has a route to match any Point Code starting with 4.5.X, which is 4.5.0/11. This means that STP-A will route any Point Code between 4.5.0 and 4.5.7 down the Linkset to STP-B.
STP-B has got a direct connection to Exchange B and Exchange E, so has implicit routes to reach each of them.
So with that routing table, Exchange A should be able to route a message to Exchange E.
But…
Return Routing
Just like in IP routing, we need return routing. While Exchange A (SSP) at 1.2.3 has a route to everywhere in the network, the other parts of the network don’t have a route to get back to it. This means a request from 1.2.3 can get anywhere in the network, but the response can’t get back to 1.2.3.
So to get traffic back to Exchange A (SSP) at 1.2.3, our two Exchanges on the right (Exchange B & Exchange E with point codes 4.5.6 and 4.5.3) will need routes added to them. We’ll also need to add routes to STP-B, and once we’ve done that, we should be able to get from Exchange A to any point code in this network.
There is a route missing here, see if you can pick up what it is!
So we’ve added a default route via STP-B on Exchange B & Exchange E, and added a route on STP-B to send anything to 1.2.3/14 via STP-A, and with that we should be able to route from any exchange to any other exchange.
One last point on terminology – when we specify a route we don’t talk in terms of the next hop Point Code, but the Linkset to route it down. For example the default route on Exchange A is 0.0.0/0 via STP-A linkset (The linkset from Exchange A to STP-A), we don’t specify the point code of STP-A, but just the name of the Linkset between them.
Back into the Lab
So back to the lab. Where we left it was with linksets between each point code, so each Country could talk to its neighbor.
Let’s confirm this is the case before we go setting up routes, then together, we’ll get a route from Country A to Country C (and back).
So let’s check the status of the link from Country B to its two neighbors – Country A and Country C. All going well it should look like this, and if it doesn’t, then stop by my last post and check you’ve got everything set up.
So let’s add some routing so Country A can reach Country C via Country B. On Country A STP we’ll need to add a static route. For this example we’ll add a route to 7.7.1/14 (Just Country C).
That means Country A knows how to get to Country C. But with no return routing, Country C doesn’t know how to get to Country A. So let’s fix that.
We’ll add a static route to Country C to send everything via Country B.
CountryC#conf t
Enter configuration commands, one per line. End with CNTL/Z.
CountryC(config)#cs7 route-table system
CountryC(config)#update route 0.0.0/0 linkset ToCountryB
*Jan 01 05:37:28.879: %CS7MTP3-5-DESTSTATUS: Destination 0.0.0 is accessible
So now from Country C, let’s see if we can ping Country A (Ok, it’s not a “real” ICMP ping, it’s a link state check message, but the result is essentially the same).
By running:
CountryC# ping cs7 1.2.3
*Jan 01 06:28:53.699: %CS7PING-6-RTT: Test Q.755 1.2.3: MTP Traffic test rtt 48/48/48
*Jan 01 06:28:53.699: %CS7PING-6-STAT: Test Q.755 1.2.3: MTP Traffic test 100% successful packets(1/1)
*Jan 01 06:28:53.699: %CS7PING-6-RATES: Test Q.755 1.2.3: Receive rate(pps:kbps) 1:0 Sent rate(pps:kbps) 1:0
*Jan 01 06:28:53.699: %CS7PING-6-TERM: Test Q.755 1.2.3: MTP Traffic test terminated.
Now we can confirm that Country C can reach Country A, and we can do the same from Country A to confirm we can reach Country C.
But what about Country D? The route we added on Country A won’t cover Country D, and to get to Country D, again we go through Country B.
This means we could group Country C and Country D into one route entry on Country A that matches anything starting with 7-X-X.
For this we’d add the broader route on Country A (7.0.0/3, as you’ll see in the routing table below), and then remove the original, more specific route.
Of course, you may have already picked up, we’ll need to add a return route to Country D, so that it has a default route pointing all traffic to STP-B. Once we’ve done that from Country A we should be able to reach all the other countries:
CountryA#show cs7 route
Dynamic Routes 0 of 1000
Routing table = system Destinations = 3 Routes = 3
Destination Prio Linkset Name Route
---------------------- ---- ------------------- -------
4.5.6/14 acces 1 ToCountryB avail
7.0.0/3 acces 5 ToCountryB avail
CountryA#ping cs7 7.8.1
*Jan 01 07:28:19.503: %CS7PING-6-RTT: Test Q.755 7.8.1: MTP Traffic test rtt 84/84/84
*Jan 01 07:28:19.503: %CS7PING-6-STAT: Test Q.755 7.8.1: MTP Traffic test 100% successful packets(1/1)
*Jan 01 07:28:19.503: %CS7PING-6-RATES: Test Q.755 7.8.1: Receive rate(pps:kbps) 1:0 Sent rate(pps:kbps) 1:0
*Jan 01 07:28:19.507: %CS7PING-6-TERM: Test Q.755 7.8.1: MTP Traffic test terminated.
CountryA#ping cs7 7.7.1
*Jan 01 07:28:26.839: %CS7PING-6-RTT: Test Q.755 7.7.1: MTP Traffic test rtt 60/60/60
*Jan 01 07:28:26.839: %CS7PING-6-STAT: Test Q.755 7.7.1: MTP Traffic test 100% successful packets(1/1)
*Jan 01 07:28:26.839: %CS7PING-6-RATES: Test Q.755 7.7.1: Receive rate(pps:kbps) 1:0 Sent rate(pps:kbps) 1:0
*Jan 01 07:28:26.843: %CS7PING-6-TERM: Test Q.755 7.7.1: MTP Traffic test terminated.
So where to from here?
Well, we now have a functional SS7 network made up of STPs, with routing between them, but if we go back to our SS7 network overview diagram from before, you’ll notice there’s something missing from our lab network…
So far our network is made up only of STPs, that’s like building a network only out of routers!
In our next lab, we’ll start adding some SSPs to actually generate some SS7 traffic on the network, rather than just OAM traffic.
This is part of a series of posts looking into SS7 and Sigtran networks. We cover some basic theory and then get into the weeds with GNS3 based labs where we will build real SS7/Sigtran based networks and use them to carry traffic.
So we’ve made it through the first two parts of this series talking about how it all works, but now dear reader, we build an SS7 Lab!
At one point, an SS7 Signaling Transfer Point would be made up of at least 3 full-size racks, and cost $5M USD. We can run a dozen of them inside GNS3!
Cisco’s “IP Transfer Point” (ITP) software adds SS7 STP functionality to some models of Cisco Router, like the 2651XM and C7200 series hardware.
Luckily for us, these hardware platforms can be emulated in GNS3, so that’s how we’ll be setting up our instances of Cisco’s ITP product to use as STPs in our network.
For the rest of this post series, I’ll refer to Cisco’s IP Transfer Point as the “Cisco STP”.
Not open source you say! Osmocom have OsmoSTP, which we’ll introduce in a future post, and elaborate on why later…
From inside GNS3, we’ll create a new template as per the Gif below.
You will need a copy of the software image to load in. If you’ve got software entitlements you should be able to download it, the filename of the image I’m using for the 7200 series is c7200-itpk9-mz.124-15.SW.bin and if you go searching, you should find it.
Now we can start building networks with our Cisco STPs!
What we’re going to achieve
In this lab we’re going to introduce the basics of setting up STPs using Sigtran (SS7 over IP).
If you follow along, by the end of this post you should have two STPs talking Sigtran based SS7 to each other, and be able to see the SS7 packets in Wireshark.
As we touched on in the last post, there’s a lot of different flavours and ways to implement SS7 over IP. For this post, we’re going to use M2PA (MTP2 Peer Adaptation Layer) to carry the MTP2 signaling, while MTP3 and higher will look the same as if it were on a TDM link. In a future post we’ll better detail the options here, the strengths and weaknesses of each method of transporting SS7 over IP, but that’s future us’ problem.
IP Connectivity
As we don’t have any TDM links, we’re going to do everything on IP. This means we have to set up the IP layer before we can add any SS7/Sigtran stuff on top, so we’re going to need to get basic IP connectivity going between our Cisco STPs.
So for this we’ll need to set an IP address on an interface, unshut it, and link the two STPs. Once we’ve confirmed that we’ve got IP connectivity running between the two, we can get started on the Sigtran / SS7 side of things.
Let’s face it, if you’re reading this, I’m going to bet that you are probably aware of how to configure a router interface.
I’ve put a simple template down in the background to make things a little clearer; I’ve attached it here if you want to follow along with the same addressing, etc.
So we’ll configure all the routers in each country with an IP – we don’t need to configure IP routing. This means adjacent countries with a direct connection between them should be able to ping each other, but separated countries shouldn’t be able to.
So now we’ve got IP connectivity between two countries, let’s get Sigtran / SS7 setup!
First we’ll need to define the basics, from configure-terminal in each of the Cisco STPs. We’ll need to set the SS7 variant (We’ll use ITU variant as we’re simulating international links), the network-indicator (This is an International network, so we’ll use that) and the point code for this STP (From the background image).
CountryA(config)#cs7 variant itu
CountryA(config)#cs7 network-indicator international
CountryA(config)#cs7 point-code 1.2.3
Repeat this step on Country A and Country B.
Next we’ll define a local peer on the STP. This is an instance of the Sigtran stack along with the port we’ll be listening on. Our remote peer will need to know this value to bring up the connection; the number specified is the port, and the IP is the IP it will bind to.
If you’re still sniffing the traffic between Country A and Country B, you should see our SS7 connection come up.
Wireshark trace of the connection coming up
The connection will come up layer-by-layer: firstly you’ll see the transport layer (SCTP) bring up an SCTP association, then the MTP2 Peer Adaptation Layer (M2PA) will negotiate up to confirm both ends are working, then finally you’ll see MTP3 messaging.
If we open up an MTP3 packet you can see our Originating and Destination Point Codes.
Notice in Wireshark the Point Codes don’t show up as 1-2-3, but rather 2067? That’s because they’re formatted as decimal rather than the 3-8-3 format. This handy converter will translate them for you, or you can just change your preference in Wireshark’s decoders to use the matching ITU Point Code structure.
From the CLI on one of the two country STPs we can run some basic commands to view the status of all SS7 components and Linksets.
And there you have it! Basic SS7 connectivity!
There is so much more to learn, and so much more to do! By bringing up the link we’ve barely scratched the surface here.
Some homework before the next post, link all the other countries shown together, with Country D having a link to Country C and Country B. That’s where we’ll start in the lab – Tip: You’ll find you’ll need to configure a new cs7 local-peer for each interface, as each has its own IP.
This is part of a series of posts looking into SS7 and Sigtran networks. We cover some basic theory and then get into the weeds with GNS3 based labs where we will build real SS7/Sigtran based networks and use them to carry traffic.
So one more step before we actually start bringing up SS7 / Sigtran networks, and that’s to get a bit of a closer look at what components make up SS7 networks.
Recap: What is SS7?
SS7 is the name given to the protocol stack used almost exclusively in the telecommunications space. SS7 isn’t just one protocol; instead it is a suite of protocols. In the same way that someone talking about IP networking typically isn’t just talking about the IP layer, but the whole stack from transport to application, when we talk about an SS7 network we’re talking about the whole stack used to carry messages over SS7.
And what is SIGTRAN?
Sigtran is “Signaling Transport”. Historically SS7 was carried over TDM links (Like E1 lines).
As the internet took hold, the “Signaling Transport” working group was formed to put together the standards for carrying SS7 over IP, and the name stuck.
I’ve always thought if I were to become a Mexican Wrestler (which is quite unlikely), my stage name would be DSLAM, but SIGTRAN comes a close second.
Today when people talk about SIGTRAN, they mean “SS7 over IP”.
What is in an SS7 Network?
SS7 Networks only have 3 types of network elements:
Service Switching Points (SSP)
Signalling Transfer Points (STP)
Service Control Points (SCP)
Service Switching Points (SSP)
Service Switching Points (SSPs) are endpoints in the network. They’re the users of the connectivity, they use it to create and send meaningful messages over the SS7 network, and receive and process messages over the SS7 network.
Like a PC or server are IP endpoints on an IP Network, which send and receive messages over the network, an SSP uses the SS7 network to send and receive messages.
In a PSTN context, your local telephone exchange is most likely an SS7 Service Switching Point (SSP) as it creates traffic on the SS7 network and receives traffic from it.
A call from a user on one exchange to a user on another exchange could go from the SSP in Exchange A, to the SSP in Exchange B, in the same way you could send data between two computers by connecting directly between them with an Ethernet crossover cable.
Messages between our two exchanges are addressed using Point Codes, which can be thought of a lot like IP Addresses, except shorter.
In the MTP3 header of each SS7 message is the Destination Point Code, and the Origin Point Code.
When Telephone Exchange A wants to send a message over SS7 to Telephone Exchange B, the MTP header would look like:
MTP3 Header:
Origin Point Code: 1.2.3
Destination Point Code: 4.5.6
Signalling Transfer Points (STP)
Linking each SSP to each other SSP has a pretty obvious problem as our network grows.
What happens if we’ve got hundreds of SSPs? If we want a full-mesh topology connecting every SSP to every other SSP directly, we’d have a rat’s nest of links!
A “full-mesh” approach for connecting SSPs does not work at scale, so STPs are introduced
So to keep things clean and scalable, we’ve got Signalling Transfer Points (STPs).
STPs can be thought of as routers, but for an SS7 network.
When our SSP generates an SS7 message, it’s typically handed to an STP, which looks at the Destination Point Code and its own routing table, and routes it off to where it needs to go.
STP acting as a central router to connect lots of SSPs
This means every SSP doesn’t require a connection to every other SSP. Instead by using STPs we can cut down on the complexity of our network.
When Telephone Exchange A wants to send a message over SS7 to Telephone Exchange B, the MTP header would look the same, but the routing table on Telephone Exchange A would be setup to send the requests out the link towards the STP.
MTP3 Header:
Origin Point Code: 1.2.3
Destination Point Code: 4.5.6
Linksets
Between SS7 Nodes we have Linksets. Think of Linksets as like LACP or Etherchannel, but for SS7.
You want to have multiple links on every connection, for sharing out the load or for redundancy, and a Linkset is a group of connections from one SS7 node to another, that are logically treated as one link.
Linkset between an SSP and STP made up of 3 links
Each of the links in a Linkset is identified by a number, and specified in the MTP3 header’s “Signaling Link Selector” field, so we know which link each message used.
MTP3 Header:
Origin Point Code: 1.2.3
Destination Point Code: 4.5.6
Signaling Link Selector: 2
Service Control Point (SCP)
Somewhere between a Rolodex and a relational database sits the Service Control Point (SCP).
For an exchange (SSP) to route a call to another exchange, it has to know the point code of the destination exchange to send the call to. When fixed line networks were first deployed this was fairly straightforward: each exchange had a list of telephone number prefixes and the point code that served each prefix. Simple.
But then services like number porting came along, where a number could be moved anywhere. Then 1800/0800 numbers, where a number had to be translated back to a standard phone number, entered the picture.
To deal with this we need a database, somewhere an SSP can go to query some information in a database and get a response back.
This is where we use the Service Control Point (SCP).
Keep in mind that SS7 long predates APIs to easily lookup data from a service, so there was no RESTful option available in the 1980s.
When a caller on a local exchange calls a toll free (1800 or 0800 number depending on where you are) number, the exchange is setup with the Point Code of an SCP to query with the toll free number, and the SCP responds back with the local number to route the call to.
While SCPs are fading away in favor of technology like DNS/ENUM for Local Number Portability or Routing Databases, they are still widely used in some networks.
Getting to know the Signalling Transfer Point (STP)
As we saw earlier, instead of a one-to-one connection between each SS7 device to every other SS7 device, Signaling Transfer Points (STP) are used, which act like routers for our SS7 traffic.
The STP has an internal routing table made up of the Point Codes it has connections to and some logic to know how to get to each of them.
Like a router, STPs don’t really create SS7 traffic, or consume traffic, they just receive SS7 messages and route them on towards their destination.
Ok, they do create some traffic for checking links are up, etc, but like a router, their main job is getting traffic where it needs to go.
When an STP receives an SS7 message, the STP looks at the MTP3 header. Specifically the Destination Point Code, and finds if it has a path to that Point Code. If it has a route, it forwards the SS7 message on to the next hop.
Like a router, an STP doesn’t really concern itself with anything higher than the MTP3 layer – As point codes are set in the MTP3 layer that’s the only layer the STP looks at and the upper layers aren’t really “any of its business”.
STPs don’t require a direct connection (Linkset) from the Originating Point Code straight to the Destination Point Code, just like every IP router doesn’t need a direct connection to every other network. By setting up a routing table of Point Codes and Linksets as the “next-hop”, we can reach Destination Point Codes we don’t have a direct Linkset to by routing between STPs to reach the final Destination Point Code.
Let’s work through an example:
And let’s look at the routing table setup on STP-A:
STP A Routing Table:
1.2.3 - Directly attached (Telephone Exchange A)
1.2.4 - Directly attached (Telephone Exchange C)
1.2.5 - Directly attached (Telephone Exchange D)
4.5.1 - Directly attached (STP-B)
4.5.3 - Via STP-B
4.5.6 - Via STP-B
So what happens when Telephone Exchange A (Point Code 1.2.3) wants to send a message to Telephone Exchange E (Point Code 4.5.3)? Firstly Telephone Exchange A puts its message into an MTP3 payload, and the MTP3 header will look something like this:
MTP3 Header:
Origin Point Code: 1.2.3
Destination Point Code: 4.5.3
Signaling Link Selector: 1
Telephone Exchange A sends the SS7 message to STP A, which looks at the MTP3 header’s Destination Point Code (4.5.3), and then looks in its routing table for a route to that destination. We can see from our example routing table that STP A has a route to Destination Point Code 4.5.3 via STP-B, so it sends it on to STP-B.
STP-B has a direct connection (linkset) to Telephone Exchange E (Point Code 4.5.3), so it sends the message straight on to it.
Like IP, Point Codes have their own form of Variable-Length-Subnet-Routing which means each STP doesn’t need full routing info for every Destination Point Code, but instead can have routes based on part of the point code and a subnet mask.
But unlike IP, there is no BGP or OSPF on SS7 networks. Instead, all routes have to be manually specified.
For STP A to know it can get messages to destinations starting with 4.5.x via STP B, it needs to have this information manually added to its route table, and the same goes for the return routing.
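To tie the masking and routing table ideas together, here’s a toy longest-match lookup over something like STP A’s table above (the linkset names are just labels for this sketch, and real STPs also factor in route priorities and link state):

import re

def pc_to_int(pc: str) -> int:
    zone, area, id_ = (int(x) for x in re.split(r"[.-]", pc))
    return (zone << 11) | (area << 3) | id_

# (destination, mask length, linkset) - the longest matching mask wins, like longest-prefix in IP
ROUTES = [
    ("1.2.3", 14, "To-ExchangeA"),
    ("1.2.4", 14, "To-ExchangeC"),
    ("1.2.5", 14, "To-ExchangeD"),
    ("4.5.1", 14, "To-STP-B"),
    ("4.5.0", 11, "To-STP-B"),    # covers 4.5.0 through 4.5.7 via STP-B
]

def lookup(dpc: str):
    best = None
    for dest, mask_len, linkset in ROUTES:
        mask = ((1 << mask_len) - 1) << (14 - mask_len)
        if (pc_to_int(dpc) & mask) == (pc_to_int(dest) & mask):
            if best is None or mask_len > best[0]:
                best = (mask_len, linkset)
    return best[1] if best else None

print(lookup("4.5.3"))   # To-STP-B (matched by the 4.5.0/11 route)
print(lookup("1.2.3"))   # To-ExchangeA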
Sigtran & SS7 Over IP
As the world moved towards IP-enabled everything, TDM based SS7 networks became increasingly expensive to maintain and operate, so an IETF taskforce called SIGTRAN (Signaling Transport) was created to look at ways to move SS7 traffic to IP.
When moving SS7 onto IP, the first layer of SS7 (MTP1) was dropped, as it primarily concerned the physical side of the network. MTP2 didn’t really fit onto an IP model either, so two options were introduced for transporting the MTP2 data: M2PA (Message Transfer Part 2 User Peer-to-Peer Adaptation Layer) and M2UA (MTP2 User Adaptation Layer), both of which ride on top of SCTP. This means if you want an MTP2 layer over IP, you can use M2PA or M2UA.
SCTP is neither TCP nor UDP. I’ve touched upon SCTP on this blog before; it’s as if you took the best bits of TCP, without issues like head-of-line blocking, and added multi-homing of connections.
If you thought all the layers above MTP2 are just transferred, unchanged, on top of our M2PA layer – that’s one way of doing it, however it’s not the only way of doing it.
There are quite a few ways to map SS7 onto IP networks, which we’ll start to look into in more detail, but to keep it simple, for the next few posts we’ll be assuming that everything above MTP2/M2PA remains unchanged.
In the next post, we’ll get some actual SS7 traffic flowing!
This is part of a series of posts looking into SS7 and Sigtran networks. We cover some basic theory and then get into the weeds with GNS3 based labs where we will build real SS7/Sigtran based networks and use them to carry traffic.
If you use a mobile phone, a VoIP system or a copper POTS line, there’s a high chance that somewhere in the background, SS7 based signaling is being used.
The signaling for GSM, UMTS and WCDMA mobile networks all relies on SS7 based signaling, and even today the backbone of most PSTN traffic relies on SS7 networks. To many this is mysterious carrier tech, and as such doesn’t get much attention, but throughout this series of posts we’ll take a hands-on approach to putting together an SS7 network using GNS3 based labs, connect devices through SS7 and make some stuff happen.
Overview of SS7
Signaling System No. 7 (SS7/C7) is the name for a family of protocols originally designed for signaling between telephone switches. In plain English, this means it was used to setup and teardown large volumes of calls, between exchanges or carriers.
When carrier A and Carrier B want to send calls between each other, there’s a good chance they’re doing it over an SS7 Network.
But wait! SIP exists and is very popular, why doesn’t everyone just use SIP? Good question, imaginary asker. The answer is that when SS7 came along, SIP was still almost 25 years away from being defined. Yes. It’s pretty old.
SS7 isn’t one protocol, but a family of protocols that all work together – A “protocol stack”. The SS7 specs define the lower layers and a choice of upper layer / application protocols that can be carried by them.
The layered architecture means that the application layer at the top can be changed, while the underlying layers are essentially the same.
This means while SS7’s original use was for setting up and tearing down phone calls, this is only one application for SS7 based networks. Today SS7 is used heavily in 2G/3G mobile networks for connectivity between core network elements in the circuit-switched domain, for international roaming between carriers and services like Local Number Portability and Toll Free numbers.
Here are the layers of SS7 loosely mapped onto the OSI model (SS7 predates the OSI model as well):
OSI Model (Left) and SS7 Protocol Stack (Right)
We do have a few layers to play with here, and we’ll get into them all in depth as we go along, but a brief introduction to the underlying layers:
MTP 1 – Message Transfer Part 1
This is our physical layer. In the past this was commonly E1/T1 lines.
It’s responsible for getting our 1s and 0s from one place to another.
MTP 2 – Message Transfer Part 2
MTP2 is responsible for the data link layer, handling reliable transfer of data, in sequence.
MTP 3 – Message Transfer Part 3
The MTP3 header contains an Originating and a Destination Point Code.
These point codes can be thought of as like an IP Address; they’re used to address the source and destination of a message. A “Point Code” is the unique address of an SS7 network element.
MTP3 header showing the Destination Point Code (DPC) and Origin Point Code (OPC) on a National Network, carrying ISUP traffic
Every message sent over an SS7 network will contain an Origin Point Code that identifies the sender, and a Destination Point Code that identifies the intended recipient.
This is where we’ll bash around at the start of this course, setting up Linksets to allow different devices to talk to each other and address each other via Point Codes.
The MTP3 header also has a Service Indicator flag that indicates what the upper layer protocol it is carrying is, like the Protocol indicator in IPv4/IPv6 headers.
A Signaling Link Selector indicates which link it was transported over (did I mention we can join multiple links together?), and a Network Indicator determines if this signaling is at the National or International level.
TUP/MAP/SCCP/ISUP
These are the “higher-layer” protocols. Like FTP sits on top of TCP/IP, an SS7 network can transport these protocols from their source to their destination, as identified by the Origin Point Code (OPC) and the Destination Point Code (DPC) specified in the MTP3 header.
We’ll touch on these protocols more as we go on. SCCP has its own addressing on top of the OPC/DPC (like IP has IP addressing, but TCP has port numbers on top to further differentiate).
Why learn SS7 today?
SS7 and SIGTRAN are still widely in use in the telco world, some of it directly, other parts derived / evolved from it.
So stick around, things are about to get interesting!
So I’ve been waxing lyrical about how cool the NRF is, but what about how it’s secured?
A matchmaking service for service-consuming NFs to find service-producing NFs makes integration between them a doddle, but also opens up all sorts of attack vectors.
Theoretical Nasty Attacks (PoC or GTFO)
Sniffing Signaling Traffic: A malicious actor could register a fake UDR service with a higher priority with the NRF. This would mean UDR service consumers (Like the AUSF or UDM) would send everything to our fake UDR, which could then proxy all the requests to the real UDR which has a lower priority, all while sniffing all the traffic.
Stealing SIM Credentials: Brute forcing the SUPI/IMSI range on a UDR would allow the SIM Card Crypto values (K/OP/Private Keys) to be extracted.
Sniffing User Traffic: A dodgy SMF could select an attacker-controlled / run UPF to sniff all the user traffic that flows through it.
Obviously there’s a lot more scope for attack by putting nefarious data into the NRF, or querying it for data gathering, and I’ll see if I can put together some examples in the future, but you get the idea of the mischief that could be managed through the NRF.
This means it’s pretty important to secure it.
OAuth2
3GPP selected to use common industry standards for HTTP Auth, including OAuth2 (Clearly lessons were learned from COMP128 all those years ago), however OAuth2 is optional, and not integrated as you might expect. There’s a little bit to it, but you can expect to see a post on the topic in the next few weeks.
3GPP Security Recommendations
So how do we secure the NRF from bad actors?
Well, there’s 3 options according to 3GPP:
Option 1 – Mutual TLS
Where the Client (NF) and the Server (NRF) each present a certificate so the two can mutually authenticate and encrypt their communications.
This is a pretty standard mechanism to use for securing communications, but the reliance on issuing certificates and distributing them is often done poorly, and there is no way to ensure the entity holding the certificate is the entity the certificate was issued to.
3GPP have not specified a mechanism for issuing and securely distributing certificates to NFs.
Option 2 – Network Domain Security (NDS)
Split the network traffic on a logical level (VLANs / VRFs, etc) so only NFs can access the NRF.
Essentially it’s logical network segregation.
Option 3 – Physical Security
Split the network like in NDS but at the physical layer, so the physical cables essentially run point-to-point from NF to NRF.
NRF and NF shall authenticate each other during discovery, registration, and access token request. If the PLMN uses protection at the transport layer as described in clause 13.1, authentication provided by the transport layer protection solution shall be used for mutual authentication of the NRF and NF. If the PLMN does not use protection at the transport layer, mutual authentication of NRF and NF may be implicit by NDS/IP or physical security (see clause 13.1). When NRF receives message from unauthenticated NF, NRF shall support error handling, and may send back an error message. The same procedure shall be applied vice versa. After successful authentication between NRF and NF, the NRF shall decide whether the NF is authorized to perform discovery and registration. In the non-roaming scenario, the NRF authorizes the Nnrf_NFDiscovery_Request based on the profile of the expected NF/NF service and the type of the NF service consumer, as described in clause 4.17.4 of TS 23.502 [8]. In the roaming scenario, the NRF of the NF Service Provider shall authorize the Nnrf_NFDiscovery_Request based on the profile of the expected NF/NF Service, the type of the NF service consumer and the serving network ID. If the NRF finds NF service consumer is not allowed to discover the expected NF instances(s) as described in clause 4.17.4 of TS 23.502 [8], NRF shall support error handling, and may send back an error message.
NOTE 1: When a NF accesses any services (i.e. register, discover or request access token) provided by the NRF, the OAuth 2.0 access token for authorization between the NF and the NRF is not needed.
TS 133 501 – 13.3.1 Authentication and authorization between network functions and the NRF
The Network Repository Function plays matchmaker to all the elements in our 5G Core.
For our 5G Service-Based-Architecture (SBA) we use Service Based Interfaces (SBIs) to communicate between Network Functions. Sometimes a Network Function acts as a server for these interfaces (aka “Service Producer”) and sometimes it acts as a client on these interfaces (aka “Service Consumer”).
For service consumers to be able to find service producers (clients to be able to find servers), we need a directory mechanism for clients to find the servers that can serve their needs. This is the role of the NRF.
With every Service Producer registering to the NRF, the NRF has knowledge of all the available Service Producers in the network, so when a Service Consumer NF comes along (Like an AMF looking for UDM), it just queries the NRF to get the details of who can serve it.
Basic Process – NRF Registration
In order to be found, a service producer NF has to register with the NRF, so the NRF has enough info on the service-producer to be able to recommend it to service-consumers.
This is all the basic info: the Service Based Interfaces (SBIs) that this NF serves, the PLMN, and the type of NF.
The NRF then stores this information in a database, ready to be found by SBI Service Consumers.
This is achieved by the Service Producing NF sending an HTTP/2 PUT to the NRF, with the message body containing all the particulars about the services it offers.
Simplified example of an SMSc registering with the NRF in a 5G Core
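To give a feel for the registration, here’s a rough sketch of a Service Producer (a UDM in this case) registering itself. The path and NFProfile field names follow my reading of TS 29.510, but treat the NRF address, instance ID and IPs as placeholders:

import uuid
import httpx   # supports the HTTP/2 transport the SBIs use (pip install httpx[http2])

NRF = "http://nrf.example.com"          # placeholder NRF address
nf_instance_id = str(uuid.uuid4())

# A minimal NFProfile for a Service Producer registering itself
nf_profile = {
    "nfInstanceId": nf_instance_id,
    "nfType": "UDM",
    "nfStatus": "REGISTERED",
    "plmnList": [{"mcc": "001", "mnc": "01"}],
    "ipv4Addresses": ["10.0.0.10"],
    "nfServices": [{
        "serviceInstanceId": "nudm-sdm-01",
        "serviceName": "nudm-sdm",
        "versions": [{"apiVersionInUri": "v2", "apiFullVersion": "2.0.0"}],
        "scheme": "http",
        "nfServiceStatus": "REGISTERED",
    }],
}

with httpx.Client(http2=True) as client:
    resp = client.put(f"{NRF}/nnrf-nfm/v1/nf-instances/{nf_instance_id}", json=nf_profile)
print(resp.status_code)   # expect a 201 Created on first registration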
Basic Process – NRF Discovery
With an NRF that has a few SBI Service Producers registered in it, we can now start querying it from SBI Service Consumers, to find SBI Service Producers.
The SBI Service Consumer looking for an SBI Service Producer queries the NRF with a little information about itself, and about the SBI Service Producer it’s looking for.
For example, an SMF looking for a UDM sends a request like the one sketched below.
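Concretely, it’s an HTTP GET against the NRF’s discovery service with query parameters describing who’s asking and what they’re after (same caveats as above: the NRF address is a placeholder and the parameter names follow my reading of TS 29.510):

import httpx

NRF = "http://nrf.example.com"

with httpx.Client(http2=True) as client:
    resp = client.get(
        f"{NRF}/nnrf-disc/v1/nf-instances",
        params={
            "target-nf-type": "UDM",      # the Service Producer we're looking for
            "requester-nf-type": "SMF",   # a little about ourselves, the Service Consumer
        },
    )

# The NRF answers with a SearchResult: a list of matching NFProfiles the SMF can pick from
for profile in resp.json().get("nfInstances", []):
    print(profile["nfType"], profile.get("ipv4Addresses"))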
There’s no such thing as a free lunch, and 5G is the same – services running through a 5G Standalone core need to be billed.
In 5G Core Networks, the SMF (Session Management Function) reaches out to the CHF (Charging Function) to perform online charging, via the Nchf_ConvergedCharging Service Based Interface (aka reference point).
Like in other generations of core mobile networks, credit control in 5G networks is based on 3 functions: requesting a quota for a subscriber from an online charging service, which, if granted, permits the subscriber to use a certain number of units (in this case data transferred in/out); just before those units are exhausted, sending an update to request more units from the online charging service to allow the service to continue; and when the session has ended or the subscriber has disconnected, sending a termination to inform the online charging service to stop billing and refund any unused credit / units (data).
Initial Service Creation (ConvergedCharging_Create)
When the SMF needs to set up a session (for example when the AMF sends the SMF a Nsmf_PDU_SessionCreate request), the CTF (Charging Trigger Function) built into the SMF sends a Nchf_ConvergedCharging_Create (Initial, Quota Requested) to the Charging Function (CHF).
Because the Nchf_ConvergedCharging interface is a Service Based Interface, this is carried over HTTP. In practice, this means the SMF sends an HTTP POST to http://yourchargingfunction/Nchf_ConvergedCharging/v1/chargingdata/
Obviously there’s some additional information to be shared rather than just a bare HTTP POST, so the POST includes the ChargingDataRequest as the request body. If you’ve dealt with Diameter Credit Control you may be expecting the ChargingDataRequest information to be a huge jumble of nested AVPs, but it’s actually a fairly short list:
The subscriberIdentifier (SUPI) is included to identify the subscriber so the CHF knows which subscriber to charge
The nfConsumerIdentification identifies the SMF generating the request (The SBI Consumer)
The invocationTimeStamp and invocationSequenceNumber are both pretty self explanatory; the time the request is sent and the sequence number from the SBI consumer
The notifyUri identifies which URI should receive subsequent notifications from the CHF (for example, if the CHF wants to terminate the session, this is where the CHF sends that instruction to the SMF)
The multipleUnitUsage defines the service-specific parameters for the quota being requested.
The triggers identifies the events that trigger the request
Each of those fields should be pretty self-explanatory as to its purpose. The multipleUnitUsage data is used like the Service Information AVP in Diameter based Credit Control, in that it defines the specifics of the service we’re requesting a quota for. Inside, it contains a mandatory ratingGroup specifying which rating group the CHF should use, and optionally a requestedUnit, which can either define the amount of service units being requested (for us this is data in/out) or simply tell the CHF that units are needed. Typically this is used to define the amount of units to be requested.
On the amount of units requested we have a bit of a chicken-and-egg scenario: we don’t know how many units (in our case, the units are transferred data in/out) to request. If we request too much, we’ll take up all the customer’s credit, potentially prohibiting them from accessing other services; if we request too little, we’ll constantly slam the CHF with requests for more credit. In practice this value sits somewhere between the two, and will vary quite a bit.
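Put together, the Create request ends up looking something like the sketch below. The field names follow my reading of TS 32.291, but the CHF address, rating group, volumes and trigger values are all illustrative:

from datetime import datetime, timezone
import httpx

CHF = "http://yourchargingfunction"   # as in the URL above - placeholder

charging_data_request = {
    "subscriberIdentifier": "imsi-001010000000001",
    "nfConsumerIdentification": {"nodeFunctionality": "SMF"},
    "invocationTimeStamp": datetime.now(timezone.utc).isoformat(),
    "invocationSequenceNumber": 1,
    "notifyUri": "http://smf.example.com/callbacks/chargingnotify",
    "multipleUnitUsage": [{
        "ratingGroup": 10,
        "requestedUnit": {"totalVolume": 500_000_000},   # ask for ~500 MB up front
    }],
    "triggers": [{"triggerType": "QUOTA_THRESHOLD", "triggerCategory": "IMMEDIATE_REPORT"}],
}

resp = httpx.post(f"{CHF}/Nchf_ConvergedCharging/v1/chargingdata/", json=charging_data_request)
print(resp.status_code)   # 201 CREATED if the CHF grants the quota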
Based on the service details the SMF has put in the Nchf_ConvergedCharging_Create request, the Charging Function (CHF) takes into account the subscriber’s current balance, credit control policies, etc, and uses this to determine if the subscriber has the required balance to be granted the service. If so, it sends a 201 CREATED response back to the Nchf_ConvergedCharging_Create request sent by the CTF inside the SMF.
This 201 CREATED response is again fairly clean and simple, the key information is in the multipleQuotaInformation which is nested within the ChargingDataResponse, which contains the finalUnitIndication defining the maximum units to be granted for the session, and the triggers to define when to check in with CHF again, for time, volume and quota thresholds.
And with that, the service is granted, the SMF can instruct the UPF to start allowing traffic through.
Update (ConvergedCharging_Update)
Once the granted units / quota has been exhausted, the Update (ConvergedCharging_Update) request is used for requesting subsequent usage / quota units. For example our Subscriber has used up all the data initially allocated but is still consuming data, so the SMF sends a Nchf_ConvergedCharging_Update request to request more units, via another HTTP post, to the CHF, with the requested service unit in the request body in the form of ChargingDataRequest as we saw in the initial ConvergedCharging_Create.
If the subscriber still has credit and the CHF is OK to allow their service to continue, the CHF returns a 200 OK with the ChargingDataResponse, again, detailing the units to be granted.
This procedure repeats over and over as the subscriber uses their allocated units.
Release (ConvergedCharging_Release)
Eventually when our subscriber disconnects, the SMF will generate a Nchf_ConvergedCharging_Release request, detailing the data the subscriber used in the ChargingDataRequest in the body, to the CHF, so it can refund any unused credits.
The CHF sends back a 204 No Content response, and the procedure is completed.
More Info
If you’ve had experience with Diameter credit control, this simple procedure will be a breath of fresh air – it’s clean and easy to comprehend. If you’d like to learn more, the 3GPP specification docs on the topic are clear and comprehensible. I’d suggest:
TS 132 290 – Short overview of charging mechanisms
TS 132 291 – Specifics of the Nchf_ConvergedCharging interface
The common 3GPP charging architecture is specified in TS 32.240
TS 132 291 – Overview of components and SBIs inc Operations
Early on, as subscriber trunk dialing and automated time-based charging were introduced to phone networks, engineers were faced with a problem from payphones.
Previously a call had been a fixed price, once the caller put in their coins, if they put in enough coins, they could dial and stay on the line as long as they wanted.
But as the length of calls began to be metered, it meant that if I put $3 of coins into the payphone and made a call to a destination that costs $1 per minute, then I should only be allowed to have a 3 minute long phone call, and the call should be cut off before the 4th minute, as I would have used all my available credit.
Conversely if I put $3 into the Payphone and only call a $1 per minute destination for 2 minutes, I should get $1 refunded at the end of my call.
We see the exact same problem with prepaid subscribers on IMS Networks, and it’s solved in much the same way.
In LTE/EPC Networks, Diameter is used for all our credit control, with all online charging based on the Ro interface. So let’s take a look at how this works and what goes on.
Generic 3GPP Online Charging Architecture
3GPP defines a generic 3GPP Online charging architecture, that’s used by IMS for Credit Control of prepaid subscribers, but also for prepaid metering of data usage, other volume based flows, as well as event-based charging like SMS and MMS.
Network functions that handle chargeable services (like the data transferred through a P-GW or calls through a S-CSCF) contain a Charging Trigger Function (CTF) (While reading the specifications, you may be left thinking that the Charging Trigger Function is a separate entity, but more often than not, the CTF is built into the network element as an interface).
The CTF is a Diameter application that generates requests to the Online Charging Function (OCF) to be granted resources for the session / call / data flow the subscriber wants to use, prior to granting them the service.
So network elements that need to charge for services in realtime contain a Charging Trigger Function (CTF) which in turn talks to an Online Charging Function (OCF) which typically is part of an Online Charging System (AKA OCS).
For example when a subscriber turns on their phone and a GTP session is setup on the P-GW/PCEF, but before data is allowed to flow through it, a Diameter “Credit Control Request” is generated by the Charging Trigger Function (CTF) in the P-GW/PCEF, which is sent to our Online Charging Server (OCS).
The “Credit Control Answer” back from the OCS indicates the subscriber has the balance needed to use data services, and specifies how much data up and down the subscriber has been granted to use.
The P-GW/PCEF grants service to the subscriber for the specified amount of units, and the subscriber can start using data.
This is a simplified example – Decentralized vs Centralized Rating and Unit Determination enter into this, session reservation, etc.
The interface between our Charging Trigger Functions (CTF) and the Online Charging Functions (OCF) is the Ro interface, which is a Diameter based interface, and is common not just to online charging for data usage, but also IMS credit control, MMS, value added services, etc.
3GPP define a reference online-charging interface, the Ro interface, and all the application-specific interfaces, like the Gy for billing data usage, build on top of the Ro interface spec.
Basic Credit Control Request / Credit Control Answer Process
This example will look at a VoLTE call over IMS.
When a subscriber sends an INVITE, the Charging Trigger Function baked in our S-CSCF sends a Diameter “Credit Control Request” (CCR) to our Online Charging Function, with the type INITIAL, meaning this is the first CCR for this session.
The CCR contains the Service Information AVP. It's this little AVP that is where the majority of the magic happens, as it defines the service the subscriber is requesting. The main difference between the multitude of online charging interfaces in EPC networks is just what service the customer is requesting, and the specifics of that service.
For this example it's a voice call, so this Service Information AVP contains an "IMS-Information" AVP. This AVP defines all the parameters for an IMS phone call to be online charged; for a voice call, this is the called party, calling party, SDP (for differentiating between voice / video), etc.
It's the contents of this Service Information AVP the OCS uses to make a decision on whether service should be granted or not, and how many service units should be granted. (We'll cover Centralized Rating and Unit Determination in another post.) The actual logic relating to this decision is typically based on the rating and tariffing, credit control profiles, etc, and is outside the scope of the interface, but in short, the OCS will make a yes/no decision about whether the subscriber should be granted access to the particular service, and if yes, how many minutes / Bytes / Events should be granted.
When the Credit Control Answer is received back from our OCS, the Granted-Service-Unit AVP is analysed by the S-CSCF. For a voice call, the service units will be time. This tells the S-CSCF how long the call can go on before the S-CSCF will need to send another Credit Control Request; for the purposes of this example we'll imagine the returned value is 600 seconds / 10 minutes.
The S-CSCF will then grant service, the subscriber can start their voice call, and start the countdown of the time granted by the OCS.
As our chatty subscriber stays on their call, the S-CSCF approaches the limit of the Granted Service units from the OCS (Say 500 seconds used of the 600 seconds granted). Before this limit is reached the S-CSCF’s CTF function sends another Credit Control Request with the type UPDATE_REQUEST. This allows the OCS to analyse the remaining balance of the subscriber and policies to tell the S-CSCF how long the call can continue to proceed for in the form of granted service units returned in the Credit Control Answer, which for our example can be 300 seconds.
Eventually, and before the second lot of granted units runs out, our subscriber ends the call, for a total talk time of 700 seconds.
But wait, the subscriber had been granted 600 seconds in our INITIAL request, and a further 300 seconds in our UPDATE_REQUEST, for a total of 900 seconds, but the subscriber only used 700 seconds?
The S-CSCF sends a final Credit Control Request, this time with type TERMINATION_REQUEST and lets the OCS know via the Used-Service-Unit AVP, how many units the subscriber actually used (700 seconds), meaning the OCS will refund the balance for the gap of 200 seconds the subscriber didn’t use.
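To make the arithmetic concrete, here's a toy model of the reservation logic described above (not any particular OCS implementation, just the INITIAL / UPDATE / TERMINATION bookkeeping):

class PrepaidVoiceSession:
    """Toy OCS-side bookkeeping for a single Ro session, units in seconds."""
    def __init__(self, balance_seconds):
        self.balance = balance_seconds
        self.reserved = 0

    def ccr(self, requested_seconds):
        # INITIAL_REQUEST / UPDATE_REQUEST: reserve units against the balance
        granted = min(requested_seconds, self.balance)
        self.balance -= granted
        self.reserved += granted
        return granted                      # returned as the Granted-Service-Unit AVP in the CCA

    def ccr_termination(self, used_seconds):
        # TERMINATION_REQUEST: refund whatever was reserved but not used
        refund = self.reserved - used_seconds
        self.balance += refund
        self.reserved = 0
        return refund

session = PrepaidVoiceSession(balance_seconds=3600)
print(session.ccr(600))               # INITIAL: 600 seconds granted
print(session.ccr(300))               # UPDATE: a further 300 seconds granted
print(session.ccr_termination(700))   # TERMINATION: Used-Service-Unit = 700, so 200 seconds refunded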
If this were the interface for online charging of data, we’d have the PS-Information AVP, or for online charging of SMS we’d have the SMS-Information, and so on.
The architecture and framework for how the charging works doesn't change between a voice call, data traffic or messaging; only the particulars of the type of service we need to bill change, as defined in the Service Information AVP, with the OCS making a decision based on that about whether the subscriber should be granted service, and if yes, how many units of whatever type.
While we’ve covered the Update Location Request / Response, where an MME is able to request subscriber data from the HSS, what about updating a subscriber’s profile when they’re already attached? If we’re just relying on the Update Location Request / Response dialog, the update to the subscriber’s profile would only happen when they re-attach.
We need a mechanism where the HSS can send the Request and the MME can send the response.
This is what the Insert Subscriber Data Request/Response is used for.
Let's imagine we want to allow a subscriber to access an additional APN, or change the AMBR values of an existing APN;
We'd send an Insert Subscriber Data Request from the HSS, to the MME, with the Subscription Data AVP populated with the additional APN the subscriber can now access.
Beyond just updating the Subscription Data, the Insert Subscriber Data Request/Response has a few other funky uses.
Through it the HSS can request the EPS Location information of a Subscriber, down to the TAC / eNB ID serving that subscriber. It’s not the same thing as the GMLC interfaces used for locating subscribers, but will wake Idle UEs to get their current serving eNB, if the Current Location Request is set in the IDR Flags.
But the most common use for the Insert-Subscriber-Data request is to modify the Subscription Profile, contained in the Subscription-Data AVP.
If the All-APN-Configurations-Included-Indicator is set in the AVP, then all the existing APN configurations will be replaced; if it's not, then only what's specified is updated.
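As a sketch of that replace-versus-update behaviour (heavily simplified: the real Subscription-Data AVP is a nested grouped AVP, not a dict, and the APN names below are made up):

def apply_idr_apn_configurations(existing_apns, idr_apns, all_apn_configurations_included):
    """Apply the APN configurations carried in an Insert-Subscriber-Data Request."""
    if all_apn_configurations_included:
        # Indicator set: the IDR carries the complete profile, so replace everything
        return dict(idr_apns)
    # Indicator not set: only the APNs present in the IDR are added or updated
    merged = dict(existing_apns)
    merged.update(idr_apns)
    return merged

# Adding an 'enterprise' APN without touching the existing 'internet' APN
current = {"internet": {"ambr_dl": 50_000_000, "ambr_ul": 10_000_000}}
update = {"enterprise": {"ambr_dl": 20_000_000, "ambr_ul": 20_000_000}}
print(apply_idr_apn_configurations(current, update, all_apn_configurations_included=False))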
The Insert Subscriber Data Request/Response is a bit novel compared to other S6a requests, in this case it’s initiated by the HSS to the MME (Like the Cancel Location Request), and used to update an existing value.
In the last post we covered what ENUM is and how it works, so to take this into a more practical example, I thought I'd share the details of the ENUM server I've setup in my lab, and the Docker container I've bundled it into.
Inside the Docker container we’ll be running Bind – this post won’t teach you much about Bind, there’s already lots of good information on it elsewhere, but we will cover the parameters involved in setting up ENUM records (NAPTR) for E.164 addresses.
Getting the Environment up and Running
First we'll need to set up our environment. I've published the images for the container to Dockerhub, but we'll build it from the Dockerfile so you can edit the files and rebuild as you play around:
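The build itself is just (run from inside the cloned repo containing the Dockerfile and the e164.arpa.db zone file; the image tag matches the run command below):

docker build -t enum .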
systemd-resolve on Ubuntu binds to port 53 by default, which can lead to some headaches, so we’ll create a new network in Docker for this to run in, so it doesn’t conflict with anything else you may be running:
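Something along these lines does the job (the /24 subnet is an assumption; it just needs to contain the 172.30.0.2 address we give the container below):

docker network create --subnet=172.30.0.0/24 enum_playground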
And now we’ll run the ENUM container in the enum_playground network and with the IP 172.30.0.2,
docker run -d --rm --name=enum --net=enum_playground --ip=172.30.0.2 enum
Ok, that’s the environment setup, let’s run some queries!
E.164 to SIP URI Resolution with ENUM
In our last post we covered the basics of formatting an E.164 number and querying a DNS server to get its call routing information.
Again we’re going to use Dig to query this information. In reality ENUM queries would be run by an endpoint, or software like FreeSWITCH or Kamailio (Spoiler alert, posts on ENUM handling in those coming later), but as we’re just playing Dig will work fine.
So let’s start by querying a single E.164 address, +61355500911
First we’ll reverse it and put full stops / periods between the numbers, to get 1.1.9.0.0.5.5.5.3.1.6
Next we’ll add the e164.arpa prefix, which is the global prefix for ENUM addresses, and presto, that’s what we’ll query – 1.1.9.0.0.5.5.5.3.1.6.e164.arpa
Lastly we’ll feed this into a Dig query against the IP of our container and of type NAPTR,
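That query looks like this (using the container IP we assigned above):

dig @172.30.0.2 -t NAPTR 1.1.9.0.0.5.5.5.3.1.6.e164.arpa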
Next up is the TTL or expiry, in this case it’s 3600 seconds (1 hour), shorter periods allow for changes to propagate / be reflected more quickly but at the expense of more load as results can’t be cached for as long. The class (IN) represents Internet, which is the only class commonly used, even on internal systems.
Then we have the type of record returned, in our case it’s a NAPTR record,
1.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .
After that is the Order, this defines the order in which the rules are to be parsed. Lower numbers are processed first, if no matches then the next lowest, and so on until the highest number is reached, we’ll touch on this in more detail later in this post,
1.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .
The Pref is the processing preference. This is very handy for load balancing, as we can split traffic between hosts with different preferences. We’ll cover this later in this post too.
1.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .
The Flags represent the type of record we're going to get; for most ENUM traffic this is going to be set to U, to denote a SIP URI with Regex. The Service value we'll be looking for is "E2U+sip", identifying SIP URIs to route calls to, but it could be other values like email addresses, IM addresses or PSTN numbers, to be parsed by other applications.
1.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .
Lastly we’ve got the Regex part. Again not going to cover Regex as a whole, just the DNS particulars.
Everything between the first and second ! denotes what we’re searching for, while everything from the second ! to the last ! denotes what we replace it with.
In the below example that means we're matching ^.*$, which means from the start of the string (^), any character (.) zero or more times (*), through to the end of the string ($), which gets replaced with sip:[email protected],
1.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .
How should this be treated?
For the first example, a call to the E.164 address of +61355500911 will first be formatted into a domain as per the ENUM requirements (1.1.9.0.0.5.5.5.3.1.6.e164.arpa) and then queried as a NAPTR record against the DNS server,
1.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .
Only a single record has been returned so we don’t need to worry about the Order or Preference, and the Regex matches anything and replaces it with the resulting SIP URI of sip:[email protected], which is where we’ll send our INVITE.
Under the Hood
Inside the Repo we cloned earlier, if you open the e164.arpa.db file, things will look somewhat familiar,
The record we just queried is the first example in the Bind config file,
; E.164 Address +61355500911 - Simple no replacement (Resolves all traffic to sip:[email protected])
1.1.9.0.0.5.5.5.3.1.6 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .
The config file is just the domain, class, type, order, preference, flags, service and regex.
Astute readers may have noticed the trailing ., which is where we can put a replacement domain if Regex is not used; it cannot be used in conjunction with Regex, so for all our work it'll just be a single trailing . on each line.
You can (and probably should) change the values in the e164.arpa.db file as we go along to try everything out, you’ll just need to rebuild the container and restart it each time you make a change.
This post is going to focus on Bind, but the majority of modern DNS servers support NAPTR records, so you can use them for ENUM as well; for example, I manage the DNS for this site through Cloudflare, and I've put a screenshot below of an example private ENUM address I've added into it.
Setting up a NAPTR record in Cloudflare DNS
Preference to Split Traffic between Servers
So with a firm understanding of a single record being returned, let’s look at how we can use ENUM to cleverly route traffic to multiple hosts.
If we have a pool of servers we may wish to evenly distribute all traffic across them, so that’s how E.164 address +61355500912 is setup – to route traffic evenly (50/50) across two servers.
Querying it with Dig provides the following result:
So as the order value (10) is the same for both records, we can ignore it – there isn’t one value lower than the other.
We can see both records have a preference of 100; in practice, this means they each get 50% of the traffic. The formula for traffic distribution is pretty simple: each server gets the value of its preference, divided by the total of all the preferences.
So for server1, its preference is 100 and the total of all the preferences combined is 200, so it gets 100/200, which is equivalent to one half aka 50%.
We might have a scenario where we have 3 servers, but one is significantly more powerful than the others, so let’s look at giving more traffic to one server and less to others, this example gets a little more complex but should cement your understanding of how the preference works;
So now 3 servers, again none have a lower order than the other, it’s set to 10 for them all so we can ignore the order,
Next we can see the total of all the preference values is 400,
Server 2 has a preference of 100 so it gets 100/400 of the total, or a quarter of all traffic. Server 1 has the same value, so it also gets a quarter of all traffic,
Server 3 however has a preference of 200 so it gets 200/400, or to simplify, half of all traffic.
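The share calculation is simple enough to sanity-check in a few lines of Python:

prefs = {"server1": 100, "server2": 100, "server3": 200}
total = sum(prefs.values())
for server, pref in prefs.items():
    # Each server's share is its preference divided by the total of all the preferences
    print(f"{server}: {pref} / {total} = {pref / total:.0%}")
# server1: 100 / 400 = 25%
# server2: 100 / 400 = 25%
# server3: 200 / 400 = 50%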
The Bind config for this is:
; E.164 Address +61355500913 - More complex load balance between 3 hosts (25% server1, 25% server2, 50% server3)
3.1.9.0.0.5.5.5.3.1.6 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .
3.1.9.0.0.5.5.5.3.1.6 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .
3.1.9.0.0.5.5.5.3.1.6 IN NAPTR 10 200 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .
Order for Failover
Primarily the purpose of the order is to enable wildcard routes (as we'll see later) to be overwritten by more specific routes, but a secondary use in some implementations is to use Order as a way to list the preferences of the SIP URIs to route to. For example we could have two servers, one a primary and the other a standby, with the standby only to be used if the primary SIP URI was not responding.
E.164 number +61355500914 is setup to return two SIP URIs,
Our DNS client will first use the SIP URI sip:[email protected] as it has the lower order value (10), and if that fails, can try the entry with the next lowest order-value (20) which would be sip:[email protected].
The Bind config for this is:
; E.164 Address +61355500914 - Order example returning multiple SIP URIs to try for failover
4.1.9.0.0.5.5.5.3.1.6 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .
4.1.9.0.0.5.5.5.3.1.6 IN NAPTR 20 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .
Wildcards
If we have a 1,000 number block, having to add 1000 individual records can be very tedious. Instead we can use wildcard matching (thanks to the fact we’ve reversed the E.164 address) to match ranges. For example if we have E.164 numbers from +61255501000 to +61255501999 we can add a wildcard entry to match the +61255501x prefix,
I've set this up already, so let's look up the E.164 number +61255501234,
If you look up any other number starting with +61255501 you'll get the same result, and here's the Bind config for it:
; Wildcard E.164 Address +61255501* - Wildcard example for all destinations starting with E.164 prefix +61255501x to single destination (sip:[email protected])
; For example E.164 number +61255501234 will resolve to sip:[email protected]
*.1.0.5.5.5.2.1.6 IN NAPTR 100 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .
The catch with this is they’re all pointing at the same SIP URI, so we can’t treat the calls differently based on the called number – This is where the Regex magic comes in.
We can use group matching to capture the dialled number and fill it into the SIP Request URI. For example, the regex in the record below matches the E.164 number requested and puts it inside sip:[email protected]
The +61255502xxx prefix is setup for this, so if we query +61255502000 (or any other number between +61255502000 and +61255502999) we’ll get the regex query in the resulting record.
Keep in mind DNS doesn’t actually apply the Regex transformation, just shares it, and the client applies the transformation.
; Wildcard example for all destinations starting with E.164 prefix +61255502x to regex filled destination
; For example a request to 61255502000 will return sip:[email protected])
*.2.0.5.5.5.2.1.6 IN NAPTR 100 100 "u" "E2U+sip" "!(^.*$)!sip:+1\\[email protected]!" .
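Remember it's the client, not the DNS server, that applies that transformation. A minimal Python sketch of that client-side step (the hostname here is a stand-in, and this ignores any flags that may follow the final delimiter):

import re

def apply_enum_regexp(e164_number, naptr_regexp):
    # NAPTR regexp fields are '!'-delimited: !pattern!replacement!
    _, pattern, replacement, _ = naptr_regexp.split("!")
    # Backreferences like \1 use the same syntax as Python's re.sub()
    return re.sub(pattern, replacement, e164_number)

print(apply_enum_regexp("61255502000", r"!(^.*$)!sip:+\1@sip.example.com!"))
# sip:[email protected]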
One last thing to keep in mind is that wildcards match numbers of any length. This means +612555021 would match, as would +6125550299999999999999. Typically terminating switches drop any superfluous digits, and play number unobtainable to those that are too short, but keep in mind that length is not taken into account here.
Wildcard Priorities
So with our wildcards in place, what if we wanted to add an exception, for example one number in our 61255502xxx block of numbers gets ported to another carrier and needs to be routed elsewhere?
Easy, we just add another entry for that number that is more specific and has a lower order than the wildcard, which is what's set up for E.164 number +61255502345,
Which does not return the same result as the others that match the wildcard,
Bind config:
; Wildcard example for all destinations starting with E.164 prefix +61255502x to regex filled destination
; For example a request to +61255502000 will return sip:[email protected])
*.2.0.5.5.5.2.1.6 IN NAPTR 100 100 "u" "E2U+sip" "!(^.*$)!sip:+1\\[email protected]!" .
; More specific example with lower order than the +61255502x wildcard for E.164 address +61255502345 will return sip:[email protected]
5.4.3.2.0.5.5.5.2.1.6 IN NAPTR 50 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .
We can combine all of the tricks we’ve covered here, from statically defined entries, wildcards, regex replacement, multiple entries with multiple orders and preferences, to create really complex routing, using only DNS.
Summary & Next Steps
So by now hopefully you’ve got a fair understanding of how NAPTR and DNS work together to translate E.164 addresses into SIP URIs,
Of course, being able to do this manually with Dig and comprehend how it'll route is only one part of the picture; in the next posts we'll cover using Kamailio and FreeSWITCH to query ENUM routing information and route traffic with it,
DNS is commonly used for resolving domain names to IP Addresses, and is often described as being like “the phone book of the Internet”.
So what’s the phone book of phone books?
The answer, is (kind of) DNS. With the aid of E.164 number to URI mapping (ENUM), DNS can be used to resolve phone numbers into SIP URIs to route the traffic to.
So what is ENUM?
ENUM allows us to bypass the need for a central switch for routing calls to numbers, and instead, through a DNS lookup, resolve a phone number into a reachable SIP URI that is the ultimate destination for the traffic.
Imagine you want to call a company, you dial the phone number for that company, your phone does a DNS query against the phone number, which returns the SIP URI of the company’s PBX, and your phone sends the SIP INVITE directly to the company’s PBX, with no intermediary party carrying the call.
3GPP have specified ENUM as the preferred mechanism for resolving phone numbers into SIP addresses, and while its widespread adoption on the public Internet is still in its early days (See my post on The Sad story of ENUM in Australia) it is increasingly common in IMS networks and inside operator networks.
ENUM allows us to look up a phone number on a DNS server and find the SIP URI of a server that will handle traffic for that phone number, but it's a bit more complicated than the A or AAAA records you'd use to resolve a website; ENUM relies on NAPTR records.
Let’s look at the steps involved in taking an E.164 number and knowing where to send it.
Step 1 – Reverse the Numbers
We read phone numbers from left to right.
This is because historically the switch needs to get all the long-distance routing sorted first. The switch has to route your call to the exchange that serves that subscriber, which is what all the area codes and prefixes assigned to areas are all about (Throwback to SZU for any old Telco buffs).
For an E.164 number you’ve got a Country Code, Area Code and then the Subscriber Number. The number gets more specific as it goes along.
But getting more specific as you go along is the opposite of how DNS works: millions of domains share the .com suffix, and the unique / specific part is the bits before that.
So the first step in the ENUM process is to reverse the phone number, so let’s take phone number (03) 5550 0912, which in E.164 is +61 3 5550 0912.
As the spaces in the phone number are there for the humans, we'll drop all of them and reverse the number (DNS gets more specific right-to-left), so we end up with
2.1.9.0.0.5.5.5.3.1.6
Step 2 – Add the Suffix
The ITU's ENUM specification assigns the suffix e164.arpa for public ENUM entries. Private ENUM deployments may use their own suffix, but to make life simple I'm going to use e164.arpa as if it were public.
So we’ll append the e164.arpa domain onto our reversed and formatted E.164 phone number:
2.1.9.0.0.5.5.5.3.1.6.e164.arpa
Step 3 – Query it
Next we’ll run a Naming Authority Pointer (NAPTR) query against the domain, to get back a list of records for that number.
DNS is a big topic, and NAPTR and SRV takes up a good chunk of it, but what you need to know is that by using NAPTR we’re not limited to just a single response, we could have a weighted pool of servers handling traffic for this phone number, and be able to control load through the use of NAPTR, amongst other things.
DNS NAPTR Query / DNS NAPTR Response
Of course, if our phone can query the public NAPTR records, then so can anyone else, so we can just use a tool like Dig to query the record ourselves,
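For example (substituting in whichever DNS server holds the records):

dig @<your-dns-server> -t NAPTR 2.1.9.0.0.5.5.5.3.1.6.e164.arpa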
In the answers section I’ve setup this DNS server to only return a single response, with the regex SIP URI to use, in my case that’s sip:[email protected]
You’ll obviously need to replace the DNS server with your DNS server, and the query with the reversed and formatted version of the E.164 number you wish to query.
Step 4 – Send SIP traffic
After looking at the NAPTR records returned and using the weight and priority to determine which server/s to send to first, our phone forwards an INVITE to the URI returned in the NAPTR record.
How to interpret the returned results?
The first thing to keep in mind when working with ENUM is multiple records being returned is supported, and even encouraged.
NAPTR results return 7 fields, which define how it should be handled.
The host part is fairly obvious, and defines the host / DNS entry we’re talking about.
The Service defines what type of service this is. ENUM can be expanded beyond just voice, for example you may want to also return an email address or IM address as well as a SIP Address on an ENUM query, which you can do. By default voice uses the “E2U+sip” service to identify SIP URIs to route calls to, so in this context that’s what we’re interested in, but keep in mind there are other types out there,
Example ENUM query against a phone number showing other types of services (Email & Web)
The Order simply defines the order in which the rules are to be parsed. Lower numbers are processed first, if no matches then the next lowest, and so on until the highest number is reached.
The Pref is the processing preference. For load balancing 50/50 between two sites, say a Melbourne and a Sydney site, we'd return two results with the same Order and the same Pref, which would see traffic split 50/50 between the two sites. We could split this further: a Pref value of 10 for Melbourne, 10 for Sydney, 5 for Brisbane and 5 for Perth would see roughly 33% of calls route to Melbourne, 33% to Sydney, 17% to Brisbane and 17% to Perth. This is because we'd have a total preference value of 30, and the individual preference for each entry works out as its fraction of the total (ie Pref 10 out of 30 = 10/30 or 33.3%).
The Flags denote the type of record we’re going to get, for most ENUM traffic this is going to be set to U, to denote a SIP URI with Regex.
The regexp field contains our SIP URI in the form of a Regular expression, which can include pattern matching and replacement. This is most commonly used to fill in the phone number into the SIP URI, for example instead of hardcoding the phone number into the response, we could use a Regular expression to fill in the requested number into the SIP URI.
If you’re looking to implement ENUM for an internal network, great, I’ll have some more posts here over the next few weeks covering off configuration of a DNS server to support ENUM lookups, and using Kamailio to lookup ENUM routes.
In terms of public ENUM, while many carriers are using ENUM inside their networks, public adoption of ENUM in most markets has been slow, for a number of reasons.
Many incumbent operators have been reluctant to embrace public ENUM as their role as an operator would be relegated to that of a Domain registrar. Additionally, there’s real security risks involved in moving to ENUM – opening your phone system up to the world to accept inbound calls from anywhere. This could lead to DOS-style attacks of flooding phone numbers with automatically generated traffic, privacy risks and even less validation in terms of caller ID trust.
RIPE maintains the EnumData.org website listing the status of ENUM for each country / region.
We’ve covered SMS in the past, but MMS is a different kettle of fish.
Let’s look at how the call flow goes, when Bob wants to send a picture to Alice.
Before Bob sends the MMS, his phone will have to be setup with the correct settings to send MMS. Sometimes this is done manually, for others it’s done through the Carrier provisioning SMS that preloads the settings, and for others it’s baked in based on the Android Carrier settings XML,
APN settings for Telstra in Australia for MMS
It’s made up of the APN to send MMS traffic over, the MMSC address (Multimedia Message Switching Center) and often an MMS proxy and port combination for where the traffic will actually go.
Message Flow – Bob to MMSC (Mobile Originated MMS)
Bob opens his phone, creates a new message to Alice, selects the picture (or other multimedia filetype) to send to her and hits the send button.
For starters, MMS has a file size limit; like MTU it's not advertised, so you don't know if you've hit it, and rather like MTU, the smallest size has the highest chance of getting through. So Bob's phone will most likely scale the image down to fit inside 300K.
Next, Bob's phone knows it has an MMS to send, so it opens up a new bearer on the MMS APN, typically called MMS, but configured in the phone by Bob.
Why use a separate APN for sending 300K of MMS traffic? Once upon a time mobile data was expensive. Having a separate APN just for MMS traffic (an APN that could do nothing except send / receive MMS) allowed easier billing / tariffing of data, as MMS traffic was sent over an APN which was unmetered.
After the bearer is setup on the MMS APN, Bob's phone begins crafting an HTTP 1.1 POST to be sent to the MMSC. The content type of this request will be application/vnd.wap.mms-message and the body of the HTTP POST will be made up of MMS Message Encapsulation, with the body containing the picture he wants to send to Alice.
Note: Historically Wireless Session Protocol (WSP) was used in lieu of HTTP. These clients would now need a WAP gateway to translate into HTTP.
This HTTP Post is then sent to the MMSC Address, or, if present, the MMSC Proxy address. This traffic is sent over the MMS APN that we just brought up.
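As a very rough sketch of what the phone is doing at this point (illustrative only: the MMSC and proxy addresses come from the APN settings and are made up here, and the body is the binary MMS Message Encapsulation PDU, which we'll hand-wave as pre-encoded bytes):

import requests

MMSC_URL = "http://mmsc.example.com/mms"       # MMSC address from the carrier's MMS APN settings (hypothetical)
MMS_PROXY = {"http": "http://10.1.1.1:8080"}   # MMS proxy:port, if the carrier uses one (hypothetical)

with open("mms_send_req.bin", "rb") as f:      # pre-encoded MMS Message Encapsulation body
    body = f.read()

resp = requests.post(
    MMSC_URL,
    data=body,
    headers={"Content-Type": "application/vnd.wap.mms-message"},
    proxies=MMS_PROXY,
)
print(resp.status_code)   # 200 OK if the MMSC accepted the message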
HTTP POST Headers for the MO MMS Message / MMS Message Encapsulation from MO MMS Message
The MMSC receives this information, and then, if all was successful, responds with a 200 OK,
200 OK response to MO MMS Message
So now the MMSC has the information from Bob, let’s flip over to Alice.
Message Flow – MMSC to Alice (Mobile Terminated MMS)
For the purposes of simplicity, we're going to rule out the MMSC doing clever things like converting the media, accepting email (SMTP) as MMS, etc, etc. Instead we're going to assume Alice and Bob are on the same network, and our MMSC is just doing store-and-forward.
The MMSC will look at the To address in the MMS Message Encapsulation of the request Bob sent, to determine that this message is destined for Alice.
The MMSC will load the media content (photo) sent by Bob destined for Alice and serve it via HTTP. The MMSC generates a random URL to serve this particular file on, with each MMS the MMSC handles being assigned a random URL containing the media content.
Next the MMSC will need to tell Alice’s phone, that she has an MMS waiting for her. This is done by generating an SMS to send to Alice’s phone,
The user-data of this SMS is the Wireless Session Protocol with the method PUSH – Aka WAP Push.
SMS alerting the user of an MMS waiting for delivery
This specially encoded SMS is parsed by Alice's phone, which tells her there is an MMS message waiting for her.
On some operating systems this is pulled automatically, on others, users need to select “Download” to actually get the file.
The UE then just runs an HTTP GET to the address in the X-Mms-Content-Location: header to pull the multimedia content that Bob sent.
HTTP GET from Alice’s Phone / UE to retrieve MMS sent by Bob (MT-MMS)
All going well the URL is valid and Alice’s phone retrieves the message, getting a 200 OK back from the server with the message content.
HTTP Response (200 OK) for MT-MMS, sent by the MMSC to Alice’s phone with the MMS Body
So now Alice’s phone has the MMS content and renders it on the screen, Alice can see the Photo Bob sent her.
Lastly Alice's phone sends an HTTP POST again to the MMSC, this time indicating the message status is "Retrieved",
And to close everything off the MMSC confirms receipt of the Retrieved status with a 200 OK, and we are done.
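Alice's side of that exchange boils down to something like this (again illustrative: the content location comes from the WAP Push, the hostname is made up, and the acknowledgement body is a small pre-encoded MMS Message Encapsulation PDU carrying the Retrieved status):

import requests

# URL from the X-Mms-Content-Location header delivered in the WAP Push (hypothetical value)
content_location = "http://mmsc.example.com/mms/4fa83b2c91"

# Step 1: retrieve the MMS body Bob sent
mms = requests.get(content_location)
with open("received_mms.bin", "wb") as f:
    f.write(mms.content)

# Step 2: acknowledge retrieval back to the MMSC
with open("retrieved_ack.bin", "rb") as f:
    ack_body = f.read()
requests.post("http://mmsc.example.com/mms",
              data=ack_body,
              headers={"Content-Type": "application/vnd.wap.mms-message"})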
What didn’t we cover?
So that’s a basic MMS message flow, but there’s a few parts we didn’t cover.
We didn't cover the overall architecture beyond just the store-and-forward behaviour, nor charging and authentication. So let's look at each of these points.
Overall Architecture
What we just covered is what's defined as the MM1 interface.
There’s obviously a stack of other interfaces, such as for charging, messaging between MMSC/Carriers, subscriber locating / user database, etc.
Charging
MMSCs would typically have a connection to trigger charging events / credit-control events prior to processing the message.
For online charging the Ro interface can be used, as you would for IMS charging events.
3GPP TS 32.270 covers the charging architecture for online/offline charging for MMS.
Authentication
Unfortunately authentication was a bit of an afterthought for the MMS standard, and can be done several different ways.
The most common is to correlate the IP Address on the MMS APN against a subscriber.
Chances are if you’re reading this, you’re trying to work out what Telephony Binary-Coded Decimal encoding is. I got you.
Again I found myself staring at encoding trying to guess how it worked, reading references that looped into other references; in this case I was encoding MSISDN AVPs in Diameter.
How to Encode a number using Telephony Binary-Coded Decimal encoding?
First, group all the numbers into pairs, and reverse each pair.
So a phone number of 123456, becomes:
214365
Because 1 & 2 are swapped to become 21, 3 & 4 are swapped to become 34, 5 & 6 become 65, that’s how we get that result.
TBCD Encoding of numbers with an Odd Length?
If we’ve got an odd-number of digits, we add an F on the end and still flip the digits,
For example 789, we add the F to the end to pad it to an even length, and then flip each pair of digits, so it becomes:
87F9
That’s the abbreviated version of it. If you’re only encoding numbers that’s all you’ll need to know.
Detail Overload
Because the numbers 0-9 can be encoded using only 4 bits, the need for a whole 8 bit byte to store this information is considered excessive.
For example 1 represented as a binary 8-bit byte would be 00000001, while 9 would be 00001001, so even with our largest number, the first 4 bits were always going to be 0000 – we'd only use half the available space.
So TBCD encoding stores two numbers in each Byte (1 number in the first 4 bits, one number in the second 4 bits).
To go back to our previous example, 1 represented as a binary 4-bit word would be 0001, while 9 would be 1001. These are then swapped and concatenated, so the number 19 becomes 1001 0001 which is hex 0x91.
Let’s do another example, 82, so 8 represented as a 4-bit word is 1000 and 2 as a 4-bit word is 0010. We then swap the order and concatenate to get 00101000 which is hex 0x28 from our inputted 82.
Final example will be a 3 digit number, 123. As we saw earlier we’ll add an F to the end for padding, and then encode as we would any other number,
F is encoded as 1111.
1 becomes 0001, 2 becomes 0010, 3 becomes 0011 and F becomes 1111. Reverse each pair and concatenate 00100001 11110011 or hex 0x21 0xF3.
Special Symbols (#, * and friends)
Because TBCD Encoding was designed for use in Telephony networks, the # and * symbols are also present, as they are on a telephone keypad.
Astute readers may have noticed that so far we’ve covered 0-9 and F, which still doesn’t use all the available space in the 4 bit area.
The extended DTMF keys of A, B & C are also valid in TBCD (The D key was sacrificed to get the F in).
Symbol | 4 Bit Word
*      | 1 0 1 0
#      | 1 0 1 1
a      | 1 1 0 0
b      | 1 1 0 1
c      | 1 1 1 0
So let’s run through some more examples,
*21 is an odd length, so we'll slap an F on the end (*21F), and then encode each pair of values into bytes. * becomes 1010 and 2 becomes 0010; swap them and concatenate for our first byte of 00101010 (hex 0x2A). For our second byte, 1F: 1 becomes 0001 and F becomes 1111; swap and concatenate to get 11110001 (hex 0xF1). So *21 becomes 0x2A 0xF1.
And as promised, some Python code from PyHSS that does it for you:
def TBCD_special_chars(self, input):
    #Map the telephony special characters to their 4 bit TBCD words
    if input == "*":
        return "1010"
    elif input == "#":
        return "1011"
    elif input == "a":
        return "1100"
    elif input == "b":
        return "1101"
    elif input == "c":
        return "1110"
    else:
        print("input " + str(input) + " is not a special char, converting to bin ")
        return ("{:04b}".format(int(input)))

def TBCD_encode(self, input):
    print("TBCD_encode input value is " + str(input))
    offset = 0
    output = ''
    matches = ['*', '#', 'a', 'b', 'c']
    while offset < len(input):
        if len(input[offset:offset+2]) == 2:
            bit = input[offset:offset+2]    #Get two digits at a time
            bit = bit[::-1]                 #Reverse them
            #Check if *, #, a, b or c
            if any(x in bit for x in matches):
                new_bit = ''
                new_bit = new_bit + str(self.TBCD_special_chars(bit[0]))
                new_bit = new_bit + str(self.TBCD_special_chars(bit[1]))
                #Convert the binary string back into a hex pair (eg 2a) so it matches the rest of the output
                bit = format(int(new_bit, 2), "02x")
            output = output + bit
            offset = offset + 2
        else:
            #Odd length input - pad the final digit with an f
            bit = "f" + str(input[offset:offset+2])
            output = output + bit
            offset = offset + 2
    print("TBCD_encode output value is " + str(output))
    return output

def TBCD_decode(self, input):
    print("TBCD_decode Input value is " + str(input))
    offset = 0
    output = ''
    while offset < len(input):
        if "f" not in input[offset:offset+2]:
            bit = input[offset:offset+2]    #Get two digits at a time
            bit = bit[::-1]                 #Reverse them
            output = output + bit
            offset = offset + 2
        else:   #If f in bit strip it
            bit = input[offset:offset+2]
            output = output + bit[1]
            offset = offset + 2
    print("TBCD_decode output value is " + str(output))
    return output
So it's the not too distant future, the pundits' vision of private LTE and 5G networks has proved correct, and private networks are plentiful.
But what PLMN do they use?
The PLMN (Public Land Mobile Network) ID is made up of a Mobile Country Code + Mobile Network Code. MCCs are 3 digits and MNCs are 2-3 digits. It’s how your phone knows to connect to a tower belonging to your carrier, and not one of their competitors.
For example in Australia (Mobile Country Code 505) the three operators each have their own MNC. Telstra as the first licenced Mobile Network were assigned 505/01, Optus got 505/02 and VHA / TPG got 505/03.
Each carrier was assigned a PLMN when they started operating their network. But the problem is, there’s not much space in this range.
The PLMN can be thought of as the SSID in WiFi terms, but with a restriction on the size of the pool available for PLMNs, we're facing an IPv4-style exhaustion problem from the start if there's an explosion of growth in the space.
Let’s look at some ways this could be approached.
Everyone gets a PLMN
If every private network were to be assigned a PLMN, we’d very quickly run out of space in the range. Best case you’ve got 3 digits, so only space for 1,000 networks.
In certain countries this might work, but in other areas these PLMNs may get gobbled up fast, and when they do, there’s no more. New operators will be locked out of the market.
If you’re buying a private network from an existing carrier, they may permit you to use their PLMN,
Or if you’re buying kit from an existing vendor you may be able to use their PLMN too.
But what happens then if you want to move to a different kit vendor or another service provider? Do you have to rebuild your towers, reconfigure your SIMs?
Are you contractually allowed to continue using the PLMN of a third party like a hardware vendor, even if you’re no longer purchasing hardware from them? What happens if they change their mind and no longer want others to use their PLMN?
Everyone uses 999 / 99
The ITU have tried to preempt this problem by setting aside MCC 999 for use in private networks.
The problem here is if you’ve got multiple private networks in close proximity, especially if you’re using CBRS or in close proximity to other networks, you may find your devices attempting to attach to another network with the same PLMN but that isn’t part of your network,
Mobile Country or Geographical Area Codes – Note from TSB: Following the agreement on the Appendix to Recommendation ITU-T E.212 on "shared E.212 MCC 999 for internal use within a private network" at the closing plenary of ITU-T SG2 meeting of 4 to 13 July 2018, upon the advice of ITU-T Study Group 2, the Director of TSB has assigned the Mobile Country Code (MCC) "999" for internal use within a private network.
Mobile Network Codes (MNCs) under this MCC are not subject to assignment and therefore may not be globally unique. No interaction with ITU is required for using a MNC value under this MCC for internal use within a private network. Any MNC value under this MCC used in a network has significance only within that network.
The MNCs under this MCC are not routable between networks. The MNCs under this MCC shall not be used for roaming. For purposes of testing and examples using this MCC, it is encouraged to use MNC value 99 or 999. MNCs under this MCC cannot be used outside of the network for which they apply. MNCs under this MCC may be 2- or 3-digit.
My bet is we’ll see the ITU allocate an MCC – or a range of MCCs – for private networks, allowing for a pool of PLMNs to use.
When deploying networks, private network operators can try and pick something that's not in use in the area from a pool of a few thousand options.
The major problem here is that there still won’t be an easy way to identify the operator of a particular network; the SPN is local only to the SIM and the Network Name is only present in the NAS messaging on an attach, and only after authentication.
If you’ve got a problem network, there’s no easy way to identify who’s operating it.
But as eSIMs become more prevalent, BIP / RFM on SIMs will hopefully allow operators to shift PLMNs without too much headache.
But if you really want to get the most bang for your buck, you’ll need to tune your SCTP parameters to match the network conditions.
While tuning the parameters per-association would be time consuming, most SCTP stacks allow you to set templates for SCTP parameters, for example you would have a different set of parameters for the SCTP stacks inside your network, compared to SCTP stacks for say a roaming scenario or across microwave links.
IETF kindly provides a table with their recommended starting values for SCTP parameter tuning:
But by adjusting the Max Retrans and Retransmission Timeout (RTO) values, we can detect failures on the network more quickly, and reduce the number of packets we'll lose should we have a failure.
We begin with the engineered round-trip time (RTT) – that is made up of the time it takes to traverse the link, processing time for the remote SCTP stack and time for the response to traverse the link again. For the examples below we’ll take an imaginary engineered RTT of 200ms.
RTO.min is the minimum retransmission timeout. If this value is set too low then before the other side has had time to receive the request, process it and send a response, we’ve already retransmitted it.
This should be set to the round trip delay plus processing needed to send and acknowledge a packet plus some allowance for variability due to jitter; a value of 1.15 times the Engineered RTT is often chosen
So for us, 200 * 1.15 = 230ms RTO.min value.
RTO.max is the maximum amount of time we should wait before retransmitting a request. Typically three times the Engineered RTT.
So for us, 200 * 3 = 600ms RTO.max value.
Path.Max.Retransmissions is the maximum number of retransmissions to be sent down a path before the path is considered to be failed. For example if we lose a transmission path on a multi-homed server, how many retransmissions along that path should we send until we consider it to be down?
The values set are dependent on whether you're multi-homing or not (you can be more picky if you are) and the level of acceptable packet loss in your transmission link.
Typical values are 4 Retransmissions (per destination address) for a Single-Homed association, and 2 Retransmissions (per destination address) for a Multi-Homed association.
Association.Max.Retransmissions is the maximum number of retransmissions for an association. If a transmission link in a multi-homed SCTP scenario were to go down, we would pass the Path.Max.Retransmissions value and the SCTP stack would stop sending traffic out that path and try another, but what if the remote side is down entirely? In that scenario all our paths would fail, so we need another counter – Association.Max.Retransmissions – to count the total number of retransmissions across the association. When Association.Max.Retransmissions is reached, the association is considered down.
In practice this value would be the number of paths, multiplied by the Path.Max.Retransmissions.
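Pulling those rules of thumb together into a quick calculator (the multipliers are the ones from this post, not values mandated by the IETF):

def sctp_tuning(engineered_rtt_ms, multi_homed=True, num_paths=2):
    """Rule-of-thumb SCTP parameter values as described above."""
    rto_min = engineered_rtt_ms * 1.15          # engineered RTT plus allowance for jitter / processing
    rto_max = engineered_rtt_ms * 3             # three times the engineered RTT
    path_max_retrans = 2 if multi_homed else 4  # per destination address
    assoc_max_retrans = path_max_retrans * num_paths
    return {
        "RTO.min (ms)": rto_min,
        "RTO.max (ms)": rto_max,
        "Path.Max.Retransmissions": path_max_retrans,
        "Association.Max.Retransmissions": assoc_max_retrans,
    }

print(sctp_tuning(200))
# {'RTO.min (ms)': 230.0, 'RTO.max (ms)': 600, 'Path.Max.Retransmissions': 2, 'Association.Max.Retransmissions': 4}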
Network Slicing is a new 5G Technology. Or is it?
Pre 3GPP Release 16 the capability to “Slice” a network already existed, in fact the functionality was introduced way back at the advent of GPRS, so what is so new about 5G’s Network Slicing?
Network Slice: A logical network that provides specific network capabilities and network characteristics
3GPP TS 123 501 / 3 Definitions and Abbreviations
Let’s look at the old and the new ways, of slicing up networks, pre release 16, on LTE, UMTS and GSM.
Old Ways: APN Separation
The APN or "Access Point Name" is used so the SGSN / MME knows which gateway that subscriber's traffic should be terminated on when setting up the session.
APN separation is used heavily by MVNOs where the MVNO operates their own P-GW / GGSN. This allows the MVNO to handle their own rating / billing / subscriber management when it comes to data. A network operator just needs to setup their SGSN / MME to point all requests to setup a bearer on the MVNO's APN to the MVNO's gateways, and presto, it's no longer their problem.
Later as customers wanted MPLS solutions extended over mobile (Typically LTE), MNOs were able to offer “private APNs”. An enterprise could be allocated an APN by the MNO that would ensure traffic on that APN would be routed into the enterprise’s MPLS VRF. The MNO handles the P-GW / GGSN side of things, adding the APN configuration onto it and ensuring the traffic on that APN is routed into the enterprise’s VRF.
Different QCI values can be assigned to each APN, to allow some to have higher priority than others, but by slicing at an APN level you lock all traffic to those QoS characteristics (typically mobile devices only support one primary APN used for routing all traffic), and don't have the flexibility to steer which traffic from a subscriber goes to which network.
It's not really practical for everyone to have their own APN: the namespace is limited, the architecture of how this is usually done limits it, and the simple fact that everyone would have to populate an APN unique to them would be a real headache.
5G replaces APNs with “DNNs” – Data Network Names, but the functionality is otherwise the same.
In Summary: APN separation slices all traffic from a subscriber using a special APN and provides a bearer with QoS/QCI values set for that APN, but does not allow granular slicing of individual traffic flows; it's an all-or-nothing approach and all traffic in the APN is treated equally.
The Old Ways: Dedicated Bearers
Dedicated bearers allow traffic matching a set rule to be provided a lower QCI value than the default bearer. This allows certain traffic to/from a UE to use GBR or Non-GBR bearers for traffic matching the rule.
The rule itself is known as a “TFT” (Traffic Flow Template) and is made up of a 5 value Tuple consisting of IP Source, IP Destination, Source Port, Destination Port & Protocol Number. Both the UE and core network need to be aware of these TFTs, so the traffic matching the TFT can get the QCI allocated to it.
This can be done a variety of different ways; in LTE this ranges from rules defined in a PCRF, to an external application like an IMS network using the Rx interface to request dedicated bearers matching the specified TFTs via the PCRF.
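As a sketch of what a TFT match boils down to (in reality the TFT is signalled as packet filter components in the NAS / GTP messaging; this just shows the 5-tuple logic, with made-up addresses):

from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

@dataclass
class TrafficFlowTemplate:
    src_net: str                 # e.g. "0.0.0.0/0" for any source
    dst_net: str
    src_port: Optional[int]      # None = any port
    dst_port: Optional[int]
    protocol: int                # IP protocol number, e.g. 17 = UDP

    def matches(self, src_ip, dst_ip, src_port, dst_port, protocol):
        return (ip_address(src_ip) in ip_network(self.src_net)
                and ip_address(dst_ip) in ip_network(self.dst_net)
                and self.src_port in (None, src_port)
                and self.dst_port in (None, dst_port)
                and self.protocol == protocol)

# Steer UDP (RTP) traffic towards an IMS media range onto a dedicated bearer
voice_tft = TrafficFlowTemplate("0.0.0.0/0", "203.0.113.0/24", None, None, 17)
print(voice_tft.matches("10.45.0.5", "203.0.113.10", 40000, 40002, 17))   # True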
Unlike with 5G network slicing, dedicated bearers still traverse the same network elements; the same MME, S-GW & P-GW are used for this traffic. This means you can't "locally break out" certain traffic.
In Summary: Dedicated bearers allow you to treat certain traffic to/from subscribers with different precedence & priority, but the traffic still takes the same path to its ultimate destination.
The Old Ways: MOCN
MOCN (Multi-Operator Core Network) allows multiple operators' core networks to share the same RAN. This means one eNodeB can broadcast more than one PLMN and serve more than one mobile network.
This slicing is very coarse – it allows two operators to share the same eNodeBs, but going beyond a handful of PLMNs on one eNB isn’t practical, and the PLMN space is quite limited (1000 PLMNs per country code max).
In Summary: MOCN allows slicing of the RAN on a very coarse level, to slice traffic from different operators/PLMNs sharing the same RAN.
Its use is focused on sharing RAN rather than slicing traffic for users.