Category Archives: RFCs & Standards

Kamailio Diameter Routing Agent Support

Recently I’ve been working on open source Diameter Routing Agent implementations (See my posts on FreeDiameter).

With the hurdles to getting a DRA working with open source software covered, the next step was to get all my Diameter traffic routed via the DRAs, however I soon rediscovered a Kamailio limitation regarding support for Diameter Routing Agents.

You see, when Kamailio’s C Diameter Peer module makes a decision as to where to route a request, it looks for the active Diameter peers, and finds a peer with the suitable Vendor and Application IDs in the supported Applications for the Application needed.

Unfortunately, a DRA typically only advertises support for one application – Relay.

This means if you have everything connected via a DRA, Kamailio’s CDP module doesn’t see the Application / Vendor ID for the Diameter application on the DRA, and doesn’t route the traffic to the DRA.

The fix for this was twofold, the first step was to add some logic into Kamailio to determine if the Relay application was advertised in the Capabilities Exchange Request / Answer of the Diameter Peer.

I added the logic to do this and exposed this so you can see if the peer supports Diameter relay when you run “cdp.list_peers”.

With that out of the way, next step was to update the routing logic to not just reject the candidate peer if the Application / Vendor ID for the required application was missing, but to evaluate if the peer supports Diameter Relay, and if it does, keep it in the game.

I added this functionality, and now I’m able to use CDP Peers in Kamailio to allow my P-CSCF, S-CSCF and I-CSCF to route their traffic via a Diameter Routing Agent.

I’ve got a branch with the changes here and will submit a PR to get it hopefully merged into mainline soon.

NB-IoT NIDD Basics

NB-IoT introduces support for NIDD – Non-IP Data Delivery (NIDD) which is one of the cool features of NB-IoT that’s gaining more widespread adoption.

Let’s take a deep dive into NIDD.

The case against IP for IoT

In the over 40 years since IP was standardized, we’ve shoehorned many things onto IP, but IP was never designed or optimized for low power, low throughput applications.

For the battery life of an IoT device to be measured in years, it has to be very selective about what power hungry operations it does. Transmitting data over the air is one of the most power-intensive operations an IoT device can perform, so we need to do everything we can to limit how much data is sent, and how frequently.

Use Case – NB-IoT Tap

Let’s imagine we’re launching an IoT tap that transmits information about water used, as part of our revolutionary new “Water as a Service” model (WaaS) which removes the capex for residents building their own water treatment plant in their homes, and instead allows dynamic scaling of waterloads as they move to our new opex model.

If I turn on the tap and use 12L of water, when I turn off the tap, our IoT tap encodes the usage onto a single byte and sends the usage information to our rain-cloud service provider.

So we’re not constantly changing the batteries in our taps, we need to send this one byte of data as efficiently as possible, so as to maximize the battery life.

If we were to transport our data on TCP, well we’d need a 3 way handshake and several messages just to transmit the data we want to send.

Let’s see how our one byte of data would look if we transported it on TCP.

That sliver of blue in the diagram is our usage component, the rest is overhead used to get it there. Seems wasteful huh?

Sure, TCP isn’t great for this you say, you should use UDP! But even if we moved away from TCP to UDP, we’ve still got the IPv4 header and the UDP header wasting 28 bytes.

For efficiency’s sake (To keep our batteries lasting as long as possible) we want to send as few messages as possible, and where we do have to send messages, keep them very short, so IP is not a great fit here.

Enter NIDD – Non-IP Data Delivery.

Through NIDD we can just send the single hex byte, only be charged for the single hex byte, and only stay transmitting long enough to send this single byte of hex (Plus the NBIoT overheads / headers).

Compared to UDP transport, NIDD provides us a reduction of 28 bytes of overhead for each message, or a 96% reduction in message size, which translates to real power savings for our IoT device.

In summary – the more sending your device has to do, the more battery it consumes.
So in a scenario where you’re trying to maximize power efficiency to keep your batter powered device running as long as possible, needing to transmit 28 bytes of wasted data to transport 1 byte of usable data, is a real waste.

Delivering the Payload

NIDD traffic is transported as raw hex data end to end, this means for our 1 byte of water usage data, the device would just send the hex value to be transferred and it’d pop out the other end.

To support this we introduce a new network element called the SCEFService Capability Exposure Function.

From a developer’s perspective, the SCEF is the gateway to our IoT devices. Through the RESTful API on the SCEF (T8 API), we can send and receive raw hex data to any of our IoT devices.

When one of our Water-as-a-Service Taps sends usage data as a hex byte, it’s the software talking on the T8 API to the SCEF that receives this data.

Data of course needs to be addressed, so we know where it’s coming from / going to, and T8 handles this, as well as message reliability, etc, etc.

This is a telco blog, so we should probably cover the MME connection, the MME talks via Diameter to the SCEF. In our next post we’ll go into these signaling flows in more detail.

If you’re wondering what the status of Open Source SCEF implementations are, then you may have already guessed I’m working on one!

Hopefully by now you’ve got a bit of an idea of how NIDD works in NB-IoT, and in our next posts we’ll dig deeper into the flows and look at some PCAPs together.

Diameter Routing Agents – Part 5 – AVP Transformations

Having a central pair of Diameter routing agents allows us to drastically simplify our network, but what if we want to perform some translations on AVPs?

For starters, what is an AVP transformation? Well it’s simply rewriting the value of an AVP as the Diameter Request/Response passes through the DRA. A request may come into the DRA with IMSI xxxxxx and leave with IMSI yyyyyy if a translation is applied.

So why would we want to do this?

Well, what if we purchased another operator who used Realm X, and we use Realm Y, and we want to link the two networks, then we’d need to rewrite Realm Y to Realm X, and Realm X to Realm Y when they communicate, AVP transformations allow for this.

If we’re an MVNO with hosted IMSIs from an MNO, but want to keep just the one IMSI in our HSS/OCS, we can translate from the MNO hosted IMSI to our internal IMSI, using AVP transformations.

If our OCS supports only one rating group, and we want to rewrite all rating groups to that one value, AVP transformations cover this too.

There are lots of uses for this, and if you’ve worked with a bit of signaling before you’ll know that quite often these sorts of use-cases come up.

So how do we do this with freeDiameter?

To handle this I developed a module for passing each AVP to a Python function, which can then apply any transformation to a text based value, using every tool available to you in Python.

In the next post I’ll introduce rt_pyform and how we can use it with Python to translate Diameter AVPs.

Diameter Routing Agents (Why you need them, and how to build them) – Part 2 – Routing

What I typically refer to as Diameter interfaces / reference points, such as S6a, Sh, Sx, Sy, Gx, Gy, Zh, etc, etc, are also known as Applications.

Diameter Application Support

If you look inside the Capabilities Exchange Request / Answer dialog, what you’ll see is each side advertising the Applications (interfaces) that they support, each one being identified by an Application ID.

CER showing support for the 3GPP Zh Application-ID (Interface)

If two peers share a common Application-Id, then they can communicate using that Application / Interface.

For example, the above screenshot shows a peer with support for the Zh Interface (Spoiler alert, XCAP Gateway / BSF coming soon!). If two Diameter peers both have support for the Zh interface, then they can use that to send requests / responses to each other.

This is the basis of Diameter Routing.

Diameter Routing Tables

Like any router, our DRA needs to have logic to select which peer to route each message to.

For each Diameter connection to our DRA, it will build up a Diameter Routing table, with information on each peer, including the realm and applications it advertises support for.

Then, based on the logic defined in the DRA to select which Diameter peer to route each request to.

In its simplest form, Diameter routing is based on a few things:

  1. Look at the DestinationRealm, and see if we have any peers at that realm
  2. If we do then look at the DestinationHost, if that’s set, and the host is connected, and if it supports the specified Application-Id, then route it to that host
  3. If no DestinationHost is specified, look at the peers we have available and find the one that supports the specified Application-Id, then route it to that host
Simplified Diameter Routing Table used by DRAs

With this in mind, we can go back to looking at how our DRA may route a request from a connected MME towards an HSS.

Let’s look at some examples of this at play.

The request from MME02 is for DestinationRealm mnc001.mcc001.3gppnetwork.org, which our DRA knows it has 4 connected peers in (3 if we exclude the source of the request, as we don’t want to route it back to itself of course).

So we have 3 contenders still for who could get the request, but wait! We have a DestinationHost specified, so the DRA confirms the host is available, and that it supports the requested ApplicationId and routes it to HSS02.

So just because we are going through a DRA does not mean we can’t specific which destination host we need, just like we would if we had a direct link between each Diameter peer.

Conversely, if we sent another S6a request from MME01 but with no DestinationHost set, let’s see how that would look.

Again, the request is from MME02 is for DestinationRealm mnc001.mcc001.3gppnetwork.org, which our DRA knows it has 3 other peers it could route this to. But only two of those peers support the S6a Application, so the request would be split between the two peers evenly.

Clever Routing with DRAs

So with our DRA in place we can simplify the network, we don’t need to build peer links between every Diameter device to every other, but let’s look at some other ways DRAs can help us.

Load Control

We may want to always send requests to HSS01 and only use HSS02 if HSS01 is not available, we can do this with a DRA.

Or we may want to split load 75% on one HSS and 25% on the other.

Both are great use cases for a DRA.

Routing based on Username

We may want to route requests in the DRA based on other factors, such as the IMSI.

Our IMSIs may start with 001010001xxx, but if we introduced an MVNO with IMSIs starting with 001010002xxx, we’d need to know to route all traffic where the IMSI belongs to the home network to the home network HSS, and all the MVNO IMSI traffic to the MVNO’s HSS, and DRAs handle this.

Inter-Realm Routing

One of the main use cases you’ll see for DRAs is in Roaming scenarios.

For example, if we have a roaming agreement with a subscriber who’s IMSIs start with 90170, we can route all the traffic for their subs towards their HSS.

But wait, their Realm will be mnc901.mcc070.3gppnetwork.org, so in that scenario we’ll need to add a rule to route the request to a different realm.

DRAs handle this also.

In our next post we’ll start actually setting up a DRA with a default route table, and then look at some more advanced options for Diameter routing like we’ve just discussed.

One slight caveat, is that mutual support does not always mean what you may expect.
For example an MME and an HSS both support S6a, which is identified by Auth-Application-Id 16777251 (Vendor ID 10415), but one is a client and one is a server.
Keep this in mind!

Diameter Routing Agents (Why you need them, and how to build them) – Part 1

Answer Question 1: Because they make things simpler and more flexible for your Diameter traffic.
Answer Question 2: With free software of course!

All about DRAs

But let’s dive a little deeper. Let’s look at the connection between an MME and an HSS (the S6a interface).

Direct Diameter link between two Diameter Peers

We configure the Diameter peers on MME1 and HSS01 so they know about each other and how to communicate, the link comes up and presto, away we go.

But we’re building networks here! N+1 redundancy and all that, so now we have two HSSes and two MMEs.

Direct Diameter link between 4 Diameter peers

Okay, bit messy, but that’s okay…

But then our network grows to 10 MMEs, and 3 HSSes and you can probably see where this is going, but let’s drive the point home.

Direct Diameter connections for a network with 10x MME and 3x HSS

Now imagine once you’ve set all this up you need to do some maintenance work on HSS03, so need to shut down the Diameter peer on 10 different MMEs in order to isolate it and deisolate it.

The problem here is pretty evident, all those links are messy, cumbersome and they just don’t scale.

If you’re someone with a bit of networking experience (and let’s face it, you’re here after all), then you’re probably thinking “What if we just had a central system to route all the Diameter messages?”

An Agent that could Route Diameter, a Diameter Routing Agent perhaps…

By introducing a DRA we build Diameter peer links between each of our Diameter devices (MME / HSS, etc) and the DRA, rather than directly between each peer.

Then from the DRA we can route Diameter requests and responses between them.

Let’s go back to our 10x MME and 3x HSS network and see how it looks with a DRA instead.

So much cleaner!

Not only does this look better, but it makes our life operating the network a whole lot easier.

Each MME sends their S6a traffic to the DRA, which finds a healthy HSS from the 3 and sends the requests to it, and relays the responses as well.

We can do clever load balancing now as well.

Plus if a peer goes down, the DRA detects the failure and just routes to one of the others.

If we were to introduce a new HSS, we wouldn’t need to configure anything on the MMEs, just add HSS04 to the DRA and it’ll start getting traffic.

Plus from an operations standpoint, now if we want to to take an HSS offline for maintenance, we just shut down the link on the HSS and all HSS traffic will get routed to the other two HSS instances.

In our next post we’ll talk about the Routing part of the DRA, how the decisions are made and all the nuances, and then in the following post we’ll actually build a DRA and start routing some traffic around!

MMS Deep Dive – MM1 – Mobile Terminated MMS

In our last post we talked about sending an Multimedia Message, and in this post, we’re going to cover the process of receiving a Multimedia Message.

Carl Sagan once famously said “If you wish to make an apple pie from scratch, you must first invent the universe”, we don’t need to go that far back, but if you want to deliver an MMS to a subscriber, first you must deliver an SMS.

Wait, but we’re talking about MMS right? So why are we talking SMS?

Modern MMS transport relies on HTTP, which is client-server based, the phone / UE is the client, and the MMSc is the Server.

The problem with this client-server relationship, is the client requests things from the server, but the server can’t request things from the client.

This presents a problem when it comes to delivering the MMS – The phone / UE will need to request the MMSc provide it the message to be received, but needs to know there is a message to request in the first place.

So this is where SMS comes in. When the MMSc has a message destined for a Subscriber, it sends the phone/UE an SMS, informing that there is an MMS waiting, and providing the URL the MMS can be retrieved from.

This is typically done by MAP or SMPP, to link the MMSc to the SMSc to allow it to send these messages.

This SMS contains the URL to retrieve the MMS at, once the UE receives this SMS, it knows where to retrieve the MMS.

It can then send an HTTP GET to the URL to retrieve the MMS, and lastly sends an HTTP POST to confirm to the MMSc it retrieved it all OK.

MMS Mobile Terminated message flow

So that’s the basics, let’s look at each part of the dialog in some more detail, starting with this magic SMS to tell the UE where to retrieve the MMS from.

WAP PUSH from MMSc sent via SMS

So some things to notice, the user data, which would usually carry the body of our SMS instead contains another protocol, “Wireless Session Protocol” (WSP), and this is the method “Push”.

That in turn is followed by MMS Message Encapsulation, again inside the SMS message body, this time with the MMS specific data.

The From: header contains the sender of the MMS, this is how you can see who the MMS is from, while it’s still downloading.

The expiry indicates to the handset, it it doesn’t download the MMS within the specified time period, it shouldn’t bother, as the message will have expired.

And lastly, and perhaps most importantly, we have the X-MMS-Content-Location header, which tells our subscriber where to download the MMS from.

After this, the UE sends an HTTP GET to the URL in the X-MMS-Content-Location header (typically on the “mms” APN), to retrieve the MMS from the MMSc.

HTTP GET from the UE to the MMSc

The HTTP GET is pretty normal, there’s the usual MMS headers we talked about in the last post, and we just GET the path provided by the MMSc in the WAP PUSH.

The response from the MMSc contains the actual MMS itself, which is almost a mirror of the sending process (the Data component is unchanged from when the sender sent it).

Response to HTTP GET for message retrieval

At this stage our subscriber has retrieve the MMS, but may not have retrieved it fully, or may have had an issue retrieving it.

Instead the UE sends an HTTP POST with the MMS-Message-Type m-notifyresp-ind with the transaction ID, to indicate that it has successfully retrieved the MMS, and at this point the MMS can notify the sender if delivery receipts are enabled, and delete the message from the cache.

And finally the MMSc sends back a 200 OK with no body to confirm it got that too.

Some notes on MMS Security

Reading about unauthenticated GET requests, you may be left wondering what security does MMS have, and what stops you from just going through and sending HTTP GET requests to all the possible URL paths to vacuum up all the MMS?

In the standard, nothing!

Typically the MMSc has some layer of security added by the implementer, to ensure the user retrieving the MMS, is the user the MMS is destined for.
Because MMS has no security in the standard, this is typically achieved through Header Enrichment, whereby the P-GW adds a HTTP header with the MSISDN or IMSI of the subscriber, and then the MMSc can evaluate if this subscriber should be able to retrieve that URL.

Another attack vector I played with was sending a SMS based MMS-Notify with a different URL, which if retrieved, would leak the subscriber’s IP, as it would cause the UE to try and get data from that URL.

Kamailio I-CSCF – SRV Lookup Behaviour

Recently I had a strange issue I thought I’d share.

Using Kamailio as an Interrogating-CSCF, Kamailio was getting the S-CSCF details from the User-Authorization-Answer’s “Server-Name” (602) AVP.

The value was set to:

sip:scscf.mnc001.mcc001.3gppnetwork.org:5060

But the I-CSCF was only looking up A-Records for scscf.mnc001.mcc001.3gppnetwork.org, not using DNS-SRV.

The problem? The Server-Name I had configured as a full SIP URI in PyHSS including the port, meant that Kamailio only looks up the A-Record, and did not do a DNS-SRV lookup for the domain.

Dropping the port number saw all those delicious SRV records being queried.

Something to keep in mind if you use S-CSCF pooling with a Kamailio based I-CSCF, if you want to use SRV records for load balancing / traffic sharing, don’t include the port, and if instead you want it to go to the specified host found by an A-record, include the port.

MMS Deep Dive – MM1 – Mobile Originated MMS

So you want to send a Multimedia Message (Aka MMS or MM)?

Let’s do it – We’ll use the MM1 interface from the UE towards the MMSc (MMS Service Center) to send our Mobile Originated MMS.

Transport & Creation

Out of the box, our UE doesn’t get told by the network anything about where to send MMS messages (Unless set via something like Android’s Carrier Settings).
Instead, this is typically configured by the user in the APN settings, by setting the MMSc address (Typically an FQDN), port (Typically 80) and often a Proxy (Which will actually handle the traffic).
Lastly under the bearer type, if we’re sending the MMS on the default bearer (the one used for general Internet) then the bearer type will need to change from “default” to “default,mms”. Alternately, if you’re using a dedicated APN for MMS, you’ll need to set the bearer type to “mms”.

With the connectivity side setup, we’ll need to actually generate an MMS to send, something that is encapsulated in an MMS – so a picture is a good start.

We compose a message with this photo, put an address in the message and hit send on the UE.

The UE encapsulates the photo and metadata, such as the To address, into an HTTP POST is sent to the IP & Port of your MMSc (Or proxy if you have that set). The body of this HTTP POST contains the MMS Message Body (In this case our picture).

Our MMSc receives this POST, extracts the headers of interest, and the multimedia message body itself (in our case the photo) ready to be forwarded onto their destination.

PCAP Extract showing MMS m-send-req from UE

Header Enrichment / Charging / Authentication

One thing to note is that the From header is empty.

Often times a UE doesn’t know it’s own MSISDN. While there is an MSISDN EF on the SIM file system, often this is not updated with the correct MSISDN, as a customer may have ported over their number from a different carrier, or had a replacement SIM reissued. There’s also some problems in just trusting the From address set by a UE, without verifying it as anyone could change this.

The MMS standards evolved in parallel to the 3GPP specifications, but were historically specified by the Open Mobile Alliance. Because it is at arms length with 3GPP, SIM based authentication was not used on the MM1 interface from a UE to the MMSc.

In fact, there is no authentication on an MMS specified in the standard, meaning in theory, anyone could send one. To counter this, the P-GW or GGSN handling the subscriber traffic often enables “Header Enrichment” which when it detects traffic on the MMS APN, will add a Header to the Mobile Originated request with the IMSI or MSISDN of the subscriber sending it, which the MMSc can use to bill the customer.

m-send-req Request

Let’s take a closer look into the HTTP POST sent by the UE containing the message.

Firstly we have what looks like a pretty bulk-standard HTTP POST header, albeit with some custom headers prefixed with “X-” and the Content-Type is application/vnd.wap.mms-message.

But immediately after the HTTP header in the HTTP message body, we have the “MMS Message Encapsulation” header:

MMS Message Encapsulation Header from MO MMS

This header contains the Destination we set in the MMS when sending it, the request type (m-send-req) as well as the actual content itself (inside the Data section).

So why the double header? Why not just encapsulate the whole thing in the HTTP Post? When MMS was introduced, most phones didn’t have a HTTP stack baked into them like everything does now. Instead traffic would be going through a WAP Gateway.

When usage of WAP fell away, the standard moved to transport the same payload that was transfered over WAP, to instead be transferred over HTTP.

Inside the Data section we can see the MIME Type of the attachments themselves, in this case, it’s a photo of my desk:

With all this information, the MMSc analyses the headers and stores the message body ready for forwarding onto the recipient(/s).

m-send-conf Response

To confirm successful receipt, the MMSc sends back a 200 OK with a matching Transaction ID, so the UE knows the message was accepted.

I’ve attached the PCAP here to view / analyse.

ITU International Point Code Structure

I’ve recently been writing a lot about SS7 / Sigtran, and couldn’t fit this in anywhere, but figured it may be of use to someone…

In our 3-8-3 formated ITU International Point code, each of the parts have a unique meaning.

The 3 bits in the first section are called the Zone section. Being only 3 bits long it means we can only encode the numbers 0-7 on them, but ITU have broken the planet up into different “zones”, so the first part of our ITU International Point Code denotes which Zone the Point Code is in (as allocated by ITU).

The next 8 bits in the second section (Area section) are used to define the “Signaling Area Network Code” (SANC), which denotes which country a point code is located in. Values can range from 0-255 and many countries span multiple SANC zones, for example the USA has 58 SANC Zones.

Lastly we have the last 3 bits that make up the ID section, denoting a single unique point code, typically a carrier’s international gateway. It’s unique within a Zone & SANC, so combined with the Zone-SANC-ID makes it a unique address on the SS7 network. Being only 3 bits long means that we’ve only got 8 possible values, hence so many SANCs being used.

2Europe
3Greenland, North America, the Caribbean, and Mexico
4Middle East and Asia
5South Asia, Australia, and New Zealand
6Africa
7South America
ITU Point Code Zone World Map

Demystifying SS7 & Sigtran – Part 4 – Routing with Point Codes

This is part of a series of posts looking into SS7 and Sigtran networks. We cover some basic theory and then get into the weeds with GNS3 based labs where we will build real SS7/Sigtran based networks and use them to carry traffic.

Having a direct Linkset from every Point Code to every other Point Code in an SS7 network isn’t practical, we need to rely on routing, so in this post we’ll cover routing between Point Codes on our STPs.

Let’s start in the IP world, imagine a router with a routing table that looks something like this:

Simple IP Routing Table
192.168.0.0/24 out 192.168.0.1 (Directly Attached)
172.16.8.0/22 via 192.168.0.3 - Static Route - (Priority 100)
172.16.0.0/16 via 192.168.0.2 - Static Route - (Priority 50)
10.98.22.1/32 via 192.168.0.3 - Static Route - (Priority 50)

We have an implicit route for the network we’re directly attached to (192.168.0.0/24), and then a series of static routes we configure.
We’ve also got two routes to the 172.16.8.0/22 subnet, one is more specific with a higher priority (172.16.8.0/22 – Priority 100), while the other is less specific with a lower priority (172.16.0.0/16 – Priority 50). The higher priority route will take precedence.

This should look pretty familiar to you, but now we’re going to take a look at routing in SS7, and for that we’re going to be talking Variable Length Subnet Masking in detail you haven’t needed to think about since doing your CCNA years ago…

Why Masking is Important

A route to a single Point Code is called a “/14”, this is akin to a single IPv4 address being called a “/32”.

We could setup all our routing tables with static routes to each point code (/14), but with about 4,000 international point codes, this might be a challenge.

Instead, by using Masks, we can group together ranges of Point Codes and route those ranges through a particular STP.

This opens up the ability to achieve things like “Route all traffic to Point Codes to this Default Gateway STP”, or to say “Route all traffic to this region through this STP”.

Individually routing to a point code works well for small scale networking, but there’s power, flexibility and simplification that comes from grouping together ranges of point codes.

Information Overload about Point Codes

So far we’ve talked about point codes in the X.YYY.Z format, in our lab we setup point codes like 1.2.3.

This is not the only option however…

Variants of SS7 Point Codes

IPv4 addresses look the same regardless of where you are. From Algeria to Zimbabwe, IPv4 addresses look the same and route the same.

Standards
XKCD 927: Standards

In SS7 networks that’s not the case – There are a lot of variants that define how a point code is structured, how long it is, etc. Common variants are ANSI, ITU-T (International & National variants), ETSI, Japan NTT, TTC & China.

The SS7 variant used must match on both ends of a link; this means an SS7 node speaking ETSI flavoured Point Codes can’t exchange messages with an ANSI flavoured Point Code.

Well, you can kinda translate from one variant to another, but requires some rewriting not unlike how NAT does it.

ITU International Variant

For the start of this series, we’ll be working with the ITU International variant / flavour of Point Code.

ITU International point codes are 14 bits long, and format is described as 3-8-3.
The 3-8-3 form of Point code just means the 14 bit long point code is broken up into three sections, the first section is made up of the first 3 bits, the second section is made up of the next 8 bits then the remaining 3 bits in the last section, for a total of 14 bits.

So our 14 bit 3-8-3 Point Code looks like this in binary form:

000-00000000-000 (Binary) == 0-0-0 (Point Code)

So a point code of 1-2-3 would look like:

001-00000010-011 (Binary) == 1-2-3 (Point Code) [001 = 1, 00000010 = 2, 011 = 3]

This gives us the following maximum values for each part:

111-11111111-111 (Binary) == 7-255-7 (Point Code)

This is not the only way to represent point codes, if we were to take our binary string for 1-2-3 and remove the hyphens, we get 00100000010011. If you convert this binary string into an Integer/Decimal value, you’ll get 2067.

If you’re dealing with multiple vendors or products,you’ll see some SS7 Point Codes represented as decimal (2067), some showing as 1-2-3 codes and sometimes just raw binary.
Fun hey?

Handy point code formatting tool

Why we need to know about Binary and Point Codes

So why does the binary part matter? Well the answer is for masks.

To loop back to the start of this post, we talked about IP routing using a network address and netmask, to represent a range of IP addresses. We can do the same for SS7 Point Codes, but that requires a teeny bit of working out.

As an example let’s imagine we need to setup a route to all point codes from 3-4-0 through to 3-6-7, without specifying all the individual point codes between them.

Firstly let’s look at our start and end point codes in binary:

100-00000100-000 = 3-004-0 (Start Point Code)
100-00000110-111 = 3-006-7 (End Point Code)

Looking at the above example let’s look at how many bits are common between the two,

100-00000100-000 = 3-004-0 (Start Point Code)
100-00000110-111 = 3-006-7 (End Point Code)

The first 9 bits are common, it’s only the last 5 bits that change, so we can group all these together by saying we have a /9 mask.

When it comes time to add a route, we can add a route to 3-4-0/9 and that tells our STP to match everything from point code 3-4-0 through to point code 3-6-7.

The STP doing the routing it only needs to match on the first 9 bits in the point code, to match this route.

SS7 Routing Tables

Now we have covered Masking for roues, we can start putting some routes into our network.

In order to get a message from one point code to another point code, where there isn’t a direct linkset between the two, we need to rely on routing, which is performed by our STPs.

This is where all that point code mask stuff we just covered comes in.

Let’s look at a diagram below,

Let’s look at the routing to get a message from Exchange A (SSP) on the bottom left of the picture to Exchange E (SSP) with Point Code 4.5.3 in the bottom right of the picture.

Exchange A (SSP) on the bottom left of the picture has point code 1.2.3 assigned to it and a Linkset to STP-A.
It has the implicit route to STP-A as it’s got that linkset, but it’s also got a route configured on it to reach any other point code via the Linkset to STP-A via the 0.0.0/0 route which is the SS7 equivalent of a default route. This means any traffic to any point code will go to STP-A.

From STP-A we have a linkset to STP-B. In order to route to the point codes behind STP-B, STP-A has a route to match any Point Code starting with 4.5.X, which is 4.5.0/11.
This means that STP-A will route any Point Code between 4.5.1 and 4.5.7 down the Linkset to STP-B.

STP-B has got a direct connection to Exchange B and Exchange E, so has implicit routes to reach each of them.

So with that routing table, Exchange A should be able to route a message to Exchange E.

But…

Return Routing

Just like in IP routing, we need return routing. while Exchange A (SSP) at 1.2.3 has a route to everywhere in the network, the other parts of the network don’t have a route to get to it. This means a request from 1.2.3 can get anywhere in the network, but it can’t get a response back to 1.2.3.

So to get traffic back to Exchange A (SSP) at 1.2.3, our two Exchanges on the right (Exchange B & C with point codes 4.5.6 and 4.5.3) will need routes added to them. We’ll also need to add routes to STP-B, and once we’ve done that, we should be able to get from Exchange A to any point code in this network.

There is a route missing here, see if you can pick up what it is!

So we’ve added a default route via STP-B on Exchange B & Exchange E, and added a route on STP-B to send anything to 1.2.3/14 via STP-A, and with that we should be able to route from any exchange to any other exchange.

One last point on terminology – when we specify a route we don’t talk in terms of the next hop Point Code, but the Linkset to route it down. For example the default route on Exchange A is 0.0.0/0 via STP-A linkset (The linkset from Exchange A to STP-A), we don’t specify the point code of STP-A, but just the name of the Linkset between them.

Back into the Lab

So back to the lab, where we left it was with linksets between each point code, so each Country could talk to it’s neighbor.

Let’s confirm this is the case before we go setting up routes, then together, we’ll get a route from Country A to Country C (and back).

So let’s check the status of the link from Country B to its two neighbors – Country A and Country C. All going well it should look like this, and if it doesn’t, then stop by my last post and check you’ve got everything setup.

CountryB#show cs7 linkset 
lsn=ToCountryA          apc=1.2.3         state=avail     avail/links=1/1
  SLC  Interface                    Service   PeerState         Inhib
  00   10.0.5.1 1024 1024           avail     InService         -----

lsn=ToCountryC          apc=7.7.1         state=avail     avail/links=1/1
  SLC  Interface                    Service   PeerState         Inhib
  00   10.0.6.2 1025 1025           avail     InService         -----


So let’s add some routing so Country A can reach Country C via Country B. On Country A STP we’ll need to add a static route. For this example we’ll add a route to 7.7.1/14 (Just Country C).

That means Country A knows how to get to Country C. But with no return routing, Country C doesn’t know how to get to Country A. So let’s fix that.

We’ll add a static route to Country C to send everything via Country B.

CountryC#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
CountryC(config)#cs7 route-table system
CountryC(config)#update route 0.0.0/0 linkset ToCountryB
*Jan 01 05:37:28.879: %CS7MTP3-5-DESTSTATUS: Destination 0.0.0 is accessible

So now from Country C, let’s see if we can ping Country A (Ok, it’s not a “real” ICMP ping, it’s a link state check message, but the result is essentially the same).

By running:

CountryC# ping cs7 1.2.3
*Jan 01 06:28:53.699: %CS7PING-6-RTT: Test Q.755 1.2.3: MTP Traffic test rtt 48/48/48
*Jan 01 06:28:53.699: %CS7PING-6-STAT: Test Q.755 1.2.3: MTP Traffic test 100% successful packets(1/1)
*Jan 01 06:28:53.699: %CS7PING-6-RATES: Test Q.755 1.2.3: Receive rate(pps:kbps) 1:0 Sent rate(pps:kbps) 1:0
*Jan 01 06:28:53.699: %CS7PING-6-TERM: Test Q.755 1.2.3: MTP Traffic test terminated.

We can confirm now that Country C can reach Country A, we can do the same from Country A to confirm we can reach Country B.

But what about Country D? The route we added on Country A won’t cover Country D, and to get to Country D, again we go through Country B.

This means we could group Country C and Country D into one route entry on Country A that matches anything starting with 7-X-X,

For this we’d add a route on Country A, and then remove the original route;

CountryA(config)# cs7 route-table system
CountryA(config-cs7-rt)#update route 7.0.0/3 linkset ToCountryB
CountryA(config-cs7-rt)#no update route 7.7.1/14 linkset ToCountryB

Of course, you may have already picked up, we’ll need to add a return route to Country D, so that it has a default route pointing all traffic to STP-B. Once we’ve done that from Country A we should be able to reach all the other countries:

CountryA#show cs7 route 
Dynamic Routes 0 of 1000

Routing table = system Destinations = 3 Routes = 3

Destination            Prio Linkset Name        Route
---------------------- ---- ------------------- -------        
4.5.6/14         acces   1  ToCountryB          avail          
7.0.0/3          acces   5  ToCountryB          avail          


CountryA#ping cs7 7.8.1
*Jan 01 07:28:19.503: %CS7PING-6-RTT: Test Q.755 7.8.1: MTP Traffic test rtt 84/84/84
*Jan 01 07:28:19.503: %CS7PING-6-STAT: Test Q.755 7.8.1: MTP Traffic test 100% successful packets(1/1)
*Jan 01 07:28:19.503: %CS7PING-6-RATES: Test Q.755 7.8.1: Receive rate(pps:kbps) 1:0  Sent rate(pps:kbps) 1:0 
*Jan 01 07:28:19.507: %CS7PING-6-TERM: Test Q.755 7.8.1: MTP Traffic test terminated.
CountryA#ping cs7 7.7.1
*Jan 01 07:28:26.839: %CS7PING-6-RTT: Test Q.755 7.7.1: MTP Traffic test rtt 60/60/60
*Jan 01 07:28:26.839: %CS7PING-6-STAT: Test Q.755 7.7.1: MTP Traffic test 100% successful packets(1/1)
*Jan 01 07:28:26.839: %CS7PING-6-RATES: Test Q.755 7.7.1: Receive rate(pps:kbps) 1:0  Sent rate(pps:kbps) 1:0 
*Jan 01 07:28:26.843: %CS7PING-6-TERM: Test Q.755 7.7.1: MTP Traffic test terminated.

So where to from here?

Well, we now have a a functional SS7 network made up of STPs, with routing between them, but if we go back to our SS7 network overview diagram from before, you’ll notice there’s something missing from our lab network…

So far our network is made up only of STPs, that’s like building a network only out of routers!

In our next lab, we’ll start adding some SSPs to actually generate some SS7 traffic on the network, rather than just OAM traffic.

Demystifying SS7 & Sigtran – Part 3 – SS7 Lab in GNS3

This is part of a series of posts looking into SS7 and Sigtran networks. We cover some basic theory and then get into the weeds with GNS3 based labs where we will build real SS7/Sigtran based networks and use them to carry traffic.

So we’ve made it through the first two parts of this series talking about how it all works, but now dear reader, we build an SS7 Lab!

At one point, and SS7 Signaling Transfer Point would be made up of at least 3 full size racks, and cost $5M USD.
We can run a dozen of them inside GNS3!


This post won’t cover usage of GNS3 itself, there’s plenty of good documentation on using GNS3 if you need to get acquainted with it before we start.

Cisco’s “IP Transfer Point” (ITP) software adds SS7 STP functionality to some models of Cisco Router, like the 2651XM and C7200 series hardware.

Luckily for us, these hardware platforms can be emulated in GNS3, so that’s how we’ll be setting up our instances of Cisco’s ITP product to use as STPs in our network.

For the rest of this post series, I’ll refer to Cisco’s IP Transfer Point as the “Cisco STP”.

Not open source you say! Osmocom have OsmoSTP, which we’ll introduce in a future post, and elaborate on why later…

From inside GNS3, we’ll create a new template as per the Gif below.

You will need a copy of the software image to load in. If you’ve got software entitlements you should be able to download it, the filename of the image I’m using for the 7200 series is c7200-itpk9-mz.124-15.SW.bin and if you go searching, you should find it.

Now we can start building networks with our Cisco STPs!

What we’re going to achieve

In this lab we’re going to introduce the basics of setting up STPs using Sigtran (SS7 over IP).

If you follow along, by the end of this post you should have two STPs talking Sigtran based SS7 to each other, and be able to see the SS7 packets in Wireshark.

As we touched on in the last post, there’s a lot of different flavours and ways to implement SS7 over IP. For this post, we’re going to use M2PA (MTP2 Peer Adaptation Layer) to carry the MTP2 signaling, while MTP3 and higher will look the same as if it were on a TDM link. In a future post we’ll better detail the options here, the strengths and weaknesses of each method of transporting SS7 over IP, but that’s future us’ problem.

IP Connectivity

As we don’t have any TDM links, we’re going to do everything on IP, this means we have to setup the IP layer, before we can add any SS7/Sigtran stuff on top, so we’re going to need to get basic IP connectivity going between our Cisco STPs.

So for this we’ll need to set an IP Address on an interface, unshut it, link the two STPs. Once we’ve confirmed that we’ve got IP connectivity running between the two, we can get started on the Sigtran / SS7 side of things.

Let’s face it, if you’re reading this, I’m going to bet that you are probably aware of how to configure a router interface.

I’ve put a simple template down in the background to make a little more sense, which I’ve attached here if you want to follow along with the same addressing, etc.

So we’ll configure all the routers in each country with an IP – we don’t need to configure IP routing. This means adjacent countries with a direct connection between them should be able to ping each other, but separated countries shouldn’t be able to.

So now we’ve got IP connectivity between two countries, let’s get Sigtran / SS7 setup!

First we’ll need to define the basics, from configure-terminal in each of the Cisco STPs. We’ll need to set the SS7 variant (We’ll use ITU variant as we’re simulating international links), the network-indicator (This is an International network, so we’ll use that) and the point code for this STP (From the background image).

CountryA(config)#cs7 variant itu 
CountryA(config)#cs7 network-indicator international 
CountryA(config)#cs7 point-code 1.2.3

Repeat this step on Country A and Country B.

Next we’ll define a local peer on the STP. This is an instance of the Sigtran stack along with the port we’ll be listening on. Our remote peer will need to know this value to bring up the connection, the number specified is the port, and the IP is the IP it will bind on.

CountryA(config)#cs7 local-peer 1024
CountryA(config-cs7-lp)#local-ip 10.0.5.1

If we had multiple layer 3 IP Interfaces connecting Country A & Country B, we could list all the IP Addresses here for SCTP Multihoming.

Lastly on Country A we’ll need to define our Linkset to connect to our peer.

CountryA(config)#cs7 linkset ToCountryB 4.5.6
CountryA(config-cs7-ls)#link 0 sctp 10.0.5.2 1024 1024

Where the first 1024 is the local-peer port we configured earlier, and the second 1024 is the remote peer port we’re about to configure on Country B.

If we stop at this point and sniff the traffic from Country A to Country B, we’ll see SCTP INITs from Country A to Country B, as it tries to bring up the SCTP connection for our SS7 traffic, and the SCTP connection gets rejected by Country B.

This is of course, because we’ve only configured Country A at this stage, so let’s fix this by configuring Country B.

On CountryB, again we’ll set the basic parameters, our local-peer settings and the Linkset to bring up,

CountryB(config)#cs7 variant itu 
CountryB(config)#cs7 network-indicator international 
CountryB(config)#cs7 point-code 4.5.6
CountryB(config)#cs7 local-peer 1024
CountryB(config-cs7-lp)#local-ip 10.0.5.2
CountryB(config-cs7-lp)#exit
CountryB(config)#cs7 linkset ToCountryA 1.2.3
CountryB(config-cs7-ls)#link 0 sctp 10.0.5.1 1024 1024

If you’re still sniffing the traffic between Country A and Country B, you should see our SS7 connection come up.

Wireshark trace of the connection coming up

The conneciton will come up layer-by-layer, firstly you’ll see the transport layer (SCTP) bring up an SCTP association, then MTP2 Peer Adaptation Layer (M2PA) will negotiate up to confirm both ends are working, then finally you’ll see MTP3 messaging.

If we open up an MTP3 packet you can see our Originating and Destination Point Codes.

Notice in Wireshark the Point Codes don’t show up as 1-2-3, but rather 2067? That’s because they’re formatted as Decimal rather than 14 bit, this handy converter will translate them for you, or you can just change your preference in Wireshark’s decoders to use the matching ITU POint Code Structure.

From the CLI on one of the two country STPs we can run some basic commands to view the status of all SS7 components and Linksets.

And there you have it! Basic SS7 connectivity!

There is so much more to learn, and so much more to do!
By bringing up the link we’ve barely scratched the surface here.

Some homework before the next post, link all the other countries shown together, with Country D having a link to Country C and Country B. That’s where we’ll start in the lab – Tip: You’ll find you’ll need to configure a new cs7 local-peer for each interface, as each has its own IP.

Demystifying SS7 & Sigtran (With Labs!) – Part 2 – Ingredients Needed

This is part of a series of posts looking into SS7 and Sigtran networks. We cover some basic theory and then get into the weeds with GNS3 based labs where we will build real SS7/Sigtran based networks and use them to carry traffic.

So one more step before we actually start bringing up SS7 / Sigtran networks, and that’s to get a bit of a closer look at what components make up SS7 networks.

Recap: What is SS7?

SS7 is the name given to the protocol stack used almost exclusively in the telecommunications space. SS7 isn’t just one protocol, instead it is a suite of protocols.
In the same way when someone talks about IP networking, they’re typically not just talking about the IP layer, but the whole stack from transport to application, when we talk about an SS7 network, we’re talking about the whole stack used to carry messages over SS7.

And what is SIGTRAN?

Sigtran is “Signaling Transport”. Historically SS7 was carried over TDM links (Like E1 lines).

As the internet took hold, the “Signaling Transport” working group was formed to put together the standards for carrying SS7 over IP, and the name stuck.

I’ve always thought if I were to become a Mexican Wrestler (which is quite unlikely), my stage name would be DSLAM, but SIGTRAN comes a close second.

Today when people talk about SIGTRAN, they mean “SS7 over IP”.

What is in an SS7 Network?

SS7 Networks only have 3 types of network elements:

  • Service Switching Points (SSP)
  • Service Transfer Points (STP)
  • Service Control Points (SCP)

Service Switching Points (SSP)

Service Switching Points (SSPs) are endpoints in the network.
They’re the users of the connectivity, they use it to create and send meaningful messages over the SS7 network, and receive and process messages over the SS7 network.

Like a PC or server are IP endpoints on an IP Network, which send and receive messages over the network, an SSP uses the SS7 network to send and receive messages.

In a PSTN context, your local telephone exchange is most likely an SS7 Service Switching Point (SSP) as it creates traffic on the SS7 network and receives traffic from it.

A call from a user on one exchange to a user on another exchange could go from the SSP in Exchange A, to the SSP in Exchange B, in the same way you could send data between two computers by connecting directly between them with an Ethernet crossover cable.

Messages between our two exchanges are addressed using Point Codes, which can be thought of a lot like IP Addresses, except shorter.

In the MTP3 header of each SS7 message is the Destination Point Code, and the Origin Point Code.

When Telephone Exchange A wants to send a message over SS7 to Telephone Exchange B, the MTP header would look like:

MTP3 Header:
Origin Point Code:      1.2.3
Destination Point Code: 4.5.6

Service Transfer Points (STP)

Linking each SSP to each other SSP has a pretty obvious problem as our network grows.

What happens if we’ve got hundreds of SSPs? If we want a full-mesh topology connecting every SSP to every other SSP directly, we’d have a rats nest of links!

A “full-mesh” approach for connecting SSPs does not work at scale, so STPs are introduced

So to keep things clean and scalable, we’ve got Signalling Transfer Points (STPs).

STPs can be thought of like Routers but in an SS7 network.

When our SSP generates an SS7 message, it’s typically handed to an STP which looks at the Destination Point Code, it’s own routing table and routes it off to where it needs to go.

STP acting as a central router to connect lots of SSPs

This means every SSP doesn’t require a connection to every other SSP. Instead by using STPs we can cut down on the complexity of our network.

When Telephone Exchange A wants to send a message over SS7 to Telephone Exchange B, the MTP header would look the same, but the routing table on Telephone Exchange A would be setup to send the requests out the link towards the STP.

MTP3 Header:
Origin Point Code:      1.2.3
Destination Point Code: 4.5.6

Linksets

Between SS7 Nodes we have Linksets. Think of Linksets as like LACP or Etherchannel, but for SS7.

You want to have multiple links on every connection, for sharing out the load or for redundancy, and a Linkset is a group of connections from one SS7 node to another, that are logically treated as one link.

Link between an SSP and STP with 3 linksets

Each of the links in a Linkset is identified by a number, and specified in in the MTP3 header’s “Signaling Link Selector” field, so we know what link each message used.

MTP3 Header:
Origin Point Code:       1.2.3
Destination Point Code:  4.5.6
Signaling Link Selector: 2

Service Control Point (SCP)

Somewhere between a Rolodex an relational database, is the Service Control Point (SCP).

For an exchange (SSP) to route a call to another exchange, it has to know the point code of the destination Exchange to send the call to.
When fixed line networks were first deployed this was fairly straight forward, each exchange had a list of telephone number prefixes and the point code that served each prefix, simple.

But then services like number porting came along when a number could be moved anywhere.
Then 1800/0800 numbers where a number had to be translated back to a standard phone number entered the picture.

To deal with this we need a database, somewhere an SSP can go to query some information in a database and get a response back.

This is where we use the Service Control Point (SCP).

Keep in mind that SS7 long predates APIs to easily lookup data from a service, so there was no RESTful option available in the 1980s.

When a caller on a local exchange calls a toll free (1800 or 0800 number depending on where you are) number, the exchange is setup with the Point Code of an SCP to query with the toll free number, and the SCP responds back with the local number to route the call to.

While SCPs are fading away in favor of technology like DNS/ENUM for Local Number Portability or Routing Databases, but they are still widely used in some networks.

Getting to know the Signalling Transfer Point (STP)

As we saw earlier, instead of a one-to-one connection between each SS7 device to every other SS7 device, Signaling Transfer Points (STP) are used, which act like routers for our SS7 traffic.

The STP has an internal routing table made up of the Point Codes it has connections to and some logic to know how to get to each of them.

Like a router, STPs don’t really create SS7 traffic, or consume traffic, they just receive SS7 messages and route them on towards their destination.

Ok, they do create some traffic for checking links are up, etc, but like a router, their main job is getting traffic where it needs to go.

When an STP receives an SS7 message, the STP looks at the MTP3 header. Specifically the Destination Point Code, and finds if it has a path to that Point Code. If it has a route, it forwards the SS7 message on to the next hop.

Like a router, an STP doesn’t really concern itself with anything higher than the MTP3 layer – As point codes are set in the MTP3 layer that’s the only layer the STP looks at and the upper layers aren’t really “any of its business”.

STPs don’t require a direct connection (Linkset) from the Originating Point Code straight to the Destination Point Code. Just like every IP router doesn’t need a direct connection to ever other network.
By setting up a routing table of Point Codes and Linksets as the “next-hop”, we can reach Destination Point Codes we don’t have a direct Linkset to by routing between STPs to reach the final Destination Point Code.

Let’s work through an example:

And let’s look at the routing table setup on STP-A:

STP A Routing Table:
1.2.3 - Directly attached (Telephone Exchange A)
1.2.4 - Directly attached (Telephone Exchange C)
1.2.5 - Directly attached (Telephone Exchange D)
4.5.1 - Directly attached (STP-B)
4.5.3 - Via STP-B
4.5.6 - Via STP-B

So what happens when Telephone Exchange A (Point Code 1.2.3) wants to send a message to Telephone Exchange E (Point Code 4.5.3)?
Firstly Telephone Exchange A puts it’s message on an MTP3 payload, and the MTP3 header will look something like this:

MTP3 Header:
Origin Point Code:       1.2.3
Destination Point Code:  4.5.3
Signaling Link Selector: 1

Telephone Exchange A sends the SS7 message to STP A, which looks at the MTP3 header’s Destination Point Code (4.5.3), and then in it’s routing table for a route to the destination point. We can see from our example routing table that STP A has a route to Destination Point Code 4.5.3 via STP-B, so sends it onto STP-B.

For STP-B it has a direct connection (linkset) to Telephone Exchange E (Point Code 4.5.3), so sends it straight on

Like IP, Point Codes have their own form of Variable-Length-Subnet-Routing which means each STP doesn’t need full routing info for every Destination Point Code, but instead can have routes based on part of the point code and a subnet mask.

But unlike IP, there is no BGP or OSPF on SS7 networks. Instead, all routes have to be manually specified.

For STP A to know it can get messages to destinations starting with 4.5.x via STP B, it needs to have this information manually added to it’s route table, and the same for the return routing.

Sigtran & SS7 Over IP

As the world moved towards IP enabled everything, TDM based Sigtran Networks became increasingly expensive to maintain and operate, so a IETF taskforce called SIGTRAN (Signaling Transport) was created to look at ways to move SS7 traffic to IP.

When moving SS7 onto IP, the first layer of SS7 (MTP1) was dropped, as it primarily concerned the physical side of the network. MTP2 didn’t really fit onto an IP model, so a two options were introduced for transport of the MTP2 data, M2PA (Message Transfer Part 2 User Peer-to-Peer Adaptation Layer) and M2UA (MTP2 User Adaptation Layer) were introduced, which rides on top of SCTP.
This means if you wanted an MTP2 layer over IP, you could use M2UA or M2TP.

SCTP is neither TCP or UDP. I’ve touched upon SCTP on this blog before, it’s as if you took the best bits of TCP without the issues like head of line blocking and added multi-homing of connections.

So if you thought all the layers above MTP2 are just transferred, unchanged on top of our M2PA layer, that’s one way of doing it, however it’s not the only way of doing it.

There are quite a few ways to map SS7 onto IP Networks, which we’ll start to look into it more detail, but to keep it simple, for the next few posts we’ll be assuming that everything above MTP2/M2PA remain unchanged.

In the next post, we’ll get some actual SS7 traffic flowing!

Demystifying SS7 & Sigtran Networks (With Labs!) – Part 1 – Intro

This is part of a series of posts looking into SS7 and Sigtran networks. We cover some basic theory and then get into the weeds with GNS3 based labs where we will build real SS7/Sigtran based networks and use them to carry traffic.

If you use a mobile phone, a VoIP system or a copper POTS line, there’s a high chance that somewhere in the background, SS7 based signaling is being used.

The signaling for GSM, UMTS and WCDMA mobile networks all rely on SS7 based signaling, and even today the backbone of most PSTN traffic relies SS7 networks. To many this is mysterious carrier tech, and as such doesn’t get much attention, but throughout this series of posts we’ll take a hands-on approach to putting together an SS7 network using GNS3 based labs and connect devices through SS7 and make some stuff happen.

Overview of SS7

Signaling System No. 7 (SS7/C7) is the name for a family of protocols originally designed for signaling between telephone switches. In plain English, this means it was used to setup and teardown large volumes of calls, between exchanges or carriers.

When carrier A and Carrier B want to send calls between each other, there’s a good chance they’re doing it over an SS7 Network.

But wait! SIP exists and is very popular, why doesn’t everyone just use SIP?
Good question, imaginary asker. The answer is that when SS7 came along, SIP was still almost 25 years away from being defined.
Yes. It’s pretty old.

SS7 isn’t one protocol, but a family of protocols that all work together – A “protocol stack”.
The SS7 specs define the lower layers and a choice of upper layer / application protocols that can be carried by them.

The layered architecture means that the application layer at the top can be changed, while the underlying layers are essentially the same.

This means while SS7’s original use was for setting up and tearing down phone calls, this is only one application for SS7 based networks. Today SS7 is used heavily in 2G/3G mobile networks for connectivity between core network elements in the circuit-switched domain, for international roaming between carriers and services like Local Number Portability and Toll Free numbers.

Here’s the layers of SS7 loosely mapped onto the OSI model (SS7 predates the OSI model as well):

OSI Model (Left) and SS7 Protocol Stack (Right)

We do have a few layers to play with here, and we’ll get into them all in depth as we go along, but a brief introduction to the underlying layers:

MTP 1 – Message Transfer Part 1

This is our physical layer. In this past this was commonly E1/T1 lines.

It’s responsible for getting our 1s and 0s from one place to another.

MTP 2 – Message Transfer Part 2

MTP2 is responsible for the data link layer, handling reliable transfer of data, in sequence.

MTP 3 – Message Transfer Part 3

The MTP3 header contains an Originating and a Destination Point Code.

These point codes can be thought of as like an IP Address; they’re used to address the source and destination of a message. A “Point Code” is the unique address of a SS7 Network element.

MTP3 header showing the Destination Point Code (DPC) and Origin Point Code (OPC) on a National Network, carrying ISUP traffic

Every message sent over an SS7 network will contain an Origin Point Code that identifies the sender, and a Destination Point Code that identifies the intended recipient.

This is where we’ll bash around at the start of this course, setting up Linksets to allow different devices talking to each other and addressing each other via Point Codes.

The MTP3 header also has a Service Indicator flag that indicates what the upper layer protocol it is carrying is, like the Protocol indicator in IPv4/IPv6 headers.

A Signaling Link Selector indicates which link it was transported over (did I mention we can join multiple links together?), and a Network Indicator for determining if this is signaling is at the National or International level.

TUP/MAP/SCCP/ISUP

These are the “higher-layer” protocols. Like FTP sits on top of TCP/IP, a SS7 network can transport these protocols from their source to their destination, as identified by the Origin Point Code (OPC), to the Destination Point Code (DPC), as specified in the MTP3 header.

We’ll touch on these protocols more as we go on. SCCP has it’s own addressing on top of the OPC/DPC (Like IP has IP Addressing, but TCP has port numbers on top to further differentiate).

Why learn SS7 today?

SS7 and SIGTRAN are still widely in use in the telco world, some of it directly, other parts derived / evolved from it.

So stick around, things are about to get interesting!

Some thoughts on NRF Security in 5G Core

So I’ve been waxing lyrical about how cool in the NRF is, but what about how it’s secured?

A matchmaking service for service-consuming NFs to find service-producing NFs makes integration between them a doddle, but also opens up all sorts of attack vectors.

Theoretical Nasty Attacks (PoC or GTFO)

Sniffing Signaling Traffic:
A malicious actor could register a fake UDR service with a higher priority with the NRF. This would mean UDR service consumers (Like the AUSF or UDM) would send everything to our fake UDR, which could then proxy all the requests to the real UDR which has a lower priority, all while sniffing all the traffic.

Stealing SIM Credentials:
Brute forcing the SUPI/IMSI range on a UDR would allow the SIM Card Crypto values (K/OP/Private Keys) to be extracted.

Sniffing User Traffic:
A dodgy SMF could select an attacker-controlled / run UPF to sniff all the user traffic that flows through it.

Obviously there’s a lot more scope for attack by putting nefarious data into the NRF, or querying it for data gathering, and I’ll see if I can put together some examples in the future, but you get the idea of the mischief that could be managed through the NRF.

This means it’s pretty important to secure it.

OAuth2

3GPP selected to use common industry standards for HTTP Auth, including OAuth2 (Clearly lessons were learned from COMP128 all those years ago), however OAuth2 is optional, and not integrated as you might expect. There’s a little bit to it, but you can expect to see a post on the topic in the next few weeks.

3GPP Security Recommendations

So how do we secure the NRF from bad actors?

Well, there’s 3 options according to 3GPP:

Option 1 – Mutual TLS

Where the Client (NF) and the Server (NRF) share the same TLS info to communicate.

This is a pretty standard mechanism to use for securing communications, but the reliance on issuing certificates and distributing them is often done poorly and there is no way to ensure the person with the certificate, is the person the certificate was issued to.

3GPP have not specified a mechanism for issuing and securely distributing certificates to NFs.

Option 2 – Network Domain Security (NDS)

Split the network traffic on a logical level (VLANs / VRFs, etc) so only NFs can access the NRF.

Essentially it’s logical network segregation.

Option 3 – Physical Security

Split the network like in NDS but a physical layer, so the physical cables essentially run point-to-point from NF to NRF.

Thoughts?

What’s interesting is these are presented as 3 options, rather than the layered approach.

OAuth2 is used, but

Summary


NRF and NF shall authenticate each other during discovery, registration, and access token request. If the PLMN uses
protection at the transport layer as described in clause 13.1, authentication provided by the transport layer protection
solution shall be used for mutual authentication of the NRF and NF.
If the PLMN does not use protection at the transport layer, mutual authentication of NRF and NF may be implicit by
NDS/IP or physical security (see clause 13.1).
When NRF receives message from unauthenticated NF, NRF shall support error handling, and may send back an error
message. The same procedure shall be applied vice versa.
After successful authentication between NRF and NF, the NRF shall decide whether the NF is authorized to perform
discovery and registration.
In the non-roaming scenario, the NRF authorizes the Nnrf_NFDiscovery_Request based on the profile of the expected
NF/NF service and the type of the NF service consumer, as described in clause 4.17.4 of TS23.502 [8].In the roaming
scenario, the NRF of the NF Service Provider shall authorize the Nnrf_NFDiscovery_Request based on the profile of
the expected NF/NF Service, the type of the NF service consumer and the serving network ID.
If the NRF finds NF service consumer is not allowed to discover the expected NF instances(s) as described in clause
4.17.4 of TS 23.502[8], NRF shall support error handling, and may send back an error message.
NOTE 1: When a NF accesses any services (i.e. register, discover or request access token) provided by the NRF ,
the OAuth 2.0 access token for authorization between the NF and the NRF is not needed.

TS 133 501 – 13.3.1 Authentication and authorization between network functions and the NRF

If you like Pina Coladas, and service the control plane – Intro to NRF in 5GC

The Network Repository Function plays matchmaker to all the elements in our 5G Core.

For our 5G Service-Based-Architecture (SBA) we use Service Based Interfaces (SBIs) to communicate between Network Functions. Sometimes a Network Function acts as a server for these interfaces (aka “Service Producer”) and sometimes it acts as a client on these interfaces (aka “Service Consumer”).

For service consumers to be able to find service producers (Clients to be able to find servers), we need a directory mechanism for clients to be able to find the servers to serve their needs, this is the role of the NRF.

With every Service Producer registering to the NRF, the NRF has knowledge of all the available Service Producers in the network, so when a Service Consumer NF comes along (Like an AMF looking for UDM), it just queries the NRF to get the details of who can serve it.

Basic Process – NRF Registration

In order to be found, a service producer NF has to register with the NRF, so the NRF has enough info on the service-producer to be able to recommend it to service-consumers.

This is all the basic info, the Service Based Interfaces (SBIs) that this NF serves, the PLMN, and the type of NF.

The NRF then stores this information in a database, ready to be found by SBI Service Consumers.

This is achieved by the Service Producing NF sending a HTTP2 PUT to the NRF, with the message body containing all the particulars about the services it offers.

Simplified example of an SMSc registering with the NRF in a 5G Core

Basic Process – NRF Discovery

With an NRF that has a few SBI Service Producers registered in it, we can now start querying it from SBI Service Consumers, to find SBI Service Producers.

The SBI Service Consumer looking for a SBI Service Producer, queries the NRF with a little information about itself, and the SBI Service Producer it’s looking for.

For example a SMF looking for a UDM, sends a request like:

http://[::1]:7777/nnrf-disc/v1/nf-instances?requester-nf-type=SMF&target-nf-type=UDM

To the NRF, and the NRF responds with SBI Service Producing NFs that match in JSON body of the response.

SMSF being found by the AMF using the NRF

More Info

I’ve written in a more technical detail on the NRF in this post, you can learn about setting up Open5Gs NRF in this post, and keep tuned for a lot more content on 5GC!

5G Online Charging with the Nchf_ConvergedCharging SBI

There’s no such thing as a free lunch, and 5G is the same – services running through a 5G Standalone core need to be billed.

In 5G Core Networks, the SMF (Session Management Function) reaches out to the CHF (Charging Function) to perform online charging, via the Nchf_ConvergedCharging Service Based Interface (aka reference point).

Like in other generations of core mobile networks, Credit Control in 5G networks is based on 3 functions:
Requesting a quota for a subscriber from an online charging service, which if granted permits the subscriber to use a certain number of units (in this case data transferred in/out).
Just before those units are exhausted sending an update to request more units from the online charging service to allow the service to continue.
When the session has ended or or subscriber has disconnected, a termination to inform the online charging service to stop billing and refund any unused credit / units (data).

Initial Service Creation (ConvergedCharging_Create)

When the SMF needs to setup a session, (For example when the AMF sends the SMF a Nsmf_PDU_SessionCreate request), the CTF (Charging Trigger Function) built into the SMF sends a Nchf_ ConvergedCharging_Create (Initial, Quota Requested) to the Charging Function (CHF).

Because the Nchf_ConvergedCharging interface is a Service Based Interface this is carried over HTTP, in practice, this means the SMF sends a HTTP post to http://yourchargingfunction/Nchf_ConvergedCharging/v1/chargingdata/

Obviously there’s some additional information to be shared rather than just a HTTP post, so the HTTP post includes the ChargingDataRequest as the Request Body. If you’ve dealt with Diameter Credit Control you may be expecting the ChargingDataRequest information to be a huge jumble of nested AVPs, but it’s actually a fairly short list:

  • The subscriberIdentifier (SUPI) is included to identify the subscriber so the CHF knows which subscriber to charge
  • The nfConsumerIdentification identifies the SMF generating the request (The SBI Consumer)
  • The invocationTimeStamp and invocationSequenceNumber are both pretty self explanatory; the time the request is sent and the sequence number from the SBI consumer
  • The notifyUri identifies which URI should receive subsequent notifications from the CHF (For example if the CHF wants to terminate the session, the SMF to send that to)
  • The multipleUnitUsage defines the service-specific parameters for the quota being requested.
  • The triggers identifies the events that trigger the request

Of those each of the fields should be pretty self explanatory as to their purpose.
The multipleUnitUsage data is used like the Service Information AVP in Diameter based Credit Control, in that it defines the specifics of the service we’re requesting a quota for. Inside it contains a mandatory ratingGroup specifying which rating group the CHF should use, and optionally requestedUnit which can define either the amount of service units being requested (For us this is data in/out), or to tell the CHF units are needed. Typically this is used to define the amount of units to be requested.

On the amount of units requested we have a bit of a chicken-and-egg scenario; we don’t know how many units (In our case the units is transferred data in/out) to request, if we request too much we’ll take up all the customer’s credit, potentially prohibiting them from accessing other services, and not enough requested and we’ll constantly slam the CHF with requests for more credit.
In practice this value is somewhere between the two, and will vary quite a bit.

Based on the service details the SMF has put in the Nchf_ ConvergedCharging_Create request, the Charging Function (CHF) takes into account the subscriber’s current balance, credit control policies, etc, and uses this to determine if the Subscriber has the required balances to be granted a service, and if so, sends back a 201 CREATED response back to the Nchf_ConvergedCharging_Create request sent by the CTF inside the SMF.

This 201 CREATED response is again fairly clean and simple, the key information is in the multipleQuotaInformation which is nested within the ChargingDataResponse, which contains the finalUnitIndication defining the maximum units to be granted for the session, and the triggers to define when to check in with CHF again, for time, volume and quota thresholds.

And with that, the service is granted, the SMF can instruct the UPF to start allowing traffic through.

Update (ConvergedCharging_Update)

Once the granted units / quota has been exhausted, the Update (ConvergedCharging_Update) request is used for requesting subsequent usage / quota units. For example our Subscriber has used up all the data initially allocated but is still consuming data, so the SMF sends a Nchf_ConvergedCharging_Update request to request more units, via another HTTP post, to the CHF, with the requested service unit in the request body in the form of ChargingDataRequest as we saw in the initial ConvergedCharging_Create.

If the subscriber still has credit and the CHF is OK to allow their service to continue, the CHF returns a 200 OK with the ChargingDataResponse, again, detailing the units to be granted.

This procedure repeats over and over as the subscriber uses their allocated units.

Release (ConvergedCharging_Release)

Eventually when our subscriber disconnects, the SMF will generate a Nchf_ConvergedCharging_Release request, detailing the data the subscriber used in the ChargingDataRequest in the body, to the CHF, so it can refund any unused credits.

The CHF sends back a 204 No Content response, and the procedure is completed.

More Info

If you’ve had experience in Diameter credit control, this simple procedure will be a breath of fresh air, it’s clean and easy to comprehend,
If you’d like to learn more the 3GPP specification docs on the topic are clear and comprehensible, I’d suggest:

  • TS 132 290 – Short overview of charging mechanisms
  • TS 132 291 – Specifics of the Nchf_ConvergedCharging interface
  • The common 3GPP charging architecture is specified in TS 32.240
  • TS 132 291 – Overview of components and SBIs inc Operations
Credit Control Request / Answer call flow in IMS Charging

Basics of EPC/LTE Online Charging (OCS)

Early on as subscriber trunk dialing and automated time-based charging was introduced to phone networks, engineers were faced with a problem from Payphones.

Previously a call had been a fixed price, once the caller put in their coins, if they put in enough coins, they could dial and stay on the line as long as they wanted.

But as the length of calls began to be metered, it means if I put $3 of coins into the payphone, and make a call to a destination that costs $1 per minute, then I should only be allowed to have a 3 minute long phone call, and the call should be cutoff before the 4th minute, as I would have used all my available credit.

Conversely if I put $3 into the Payphone and only call a $1 per minute destination for 2 minutes, I should get $1 refunded at the end of my call.

We see the exact same problem with prepaid subscribers on IMS Networks, and it’s solved in much the same way.

In LTE/EPC Networks, Diameter is used for all our credit control, with all online charging based on the Ro interface. So let’s take a look at how this works and what goes on.

Generic 3GPP Online Charging Architecture

3GPP defines a generic 3GPP Online charging architecture, that’s used by IMS for Credit Control of prepaid subscribers, but also for prepaid metering of data usage, other volume based flows, as well as event-based charging like SMS and MMS.

Network functions that handle chargeable services (like the data transferred through a P-GW or calls through a S-CSCF) contain a Charging Trigger Function (CTF) (While reading the specifications, you may be left thinking that the Charging Trigger Function is a separate entity, but more often than not, the CTF is built into the network element as an interface).

The CTF is a Diameter application that generates requests to the Online Charging Function (OCF) to be granted resources for the session / call / data flow, the subscriber wants to use, prior to granting them the service.

So network elements that need to charge for services in realtime contain a Charging Trigger Function (CTF) which in turn talks to an Online Charging Function (OCF) which typically is part of an Online Charging System (AKA OCS).

For example when a subscriber turns on their phone and a GTP session is setup on the P-GW/PCEF, but before data is allowed to flow through it, a Diameter “Credit Control Request” is generated by the Charging Trigger Function (CTF) in the P-GW/PCEF, which is sent to our Online Charging Server (OCS).

The “Credit Control Answer” back from the OCS indicates the subscriber has the balance needed to use data services, and specifies how much data up and down the subscriber has been granted to use.

The P-GW/PCEF grants service to the subscriber for the specified amount of units, and the subscriber can start using data.

This is a simplified example – Decentralized vs Centralized Rating and Unit Determination enter into this, session reservation, etc.

The interface between our Charging Trigger Functions (CTF) and the Online Charging Functions (OCF), is the Ro interface, which is a Diameter based interface, and is common not just for online charging for data usage, IMS Credit Control, MMS, value added services, etc.

3GPP define a reference online-charging interface, the Ro interface, and all the application-specific interfaces, like the Gy for billing data usage, build on top of the Ro interface spec.

Basic Credit Control Request / Credit Control Answer Process

This example will look at a VoLTE call over IMS.

When a subscriber sends an INVITE, the Charging Trigger Function baked in our S-CSCF sends a Diameter “Credit Control Request” (CCR) to our Online Charging Function, with the type INITIAL, meaning this is the first CCR for this session.

The CCR contains the Service Information AVP. It’s this little AVP that is where the majority of the magic happens, as it defines what the service the subscriber is requesting. The main difference between the multitude of online charging interfaces in EPC networks, is just what the service the customer is requesting, and the specifics of that service.

For this example it’s a voice call, so this Service Information AVP contains a “IMS-Information” AVP. This AVP defines all the parameters for a IMS phone call to be online charged, for a voice call, this is the called-party, calling party, SDP (for differentiating between voice / video, etc.).

It’s the contents of this Service Information AVP the OCS uses to make decision on if service should be granted or not, and how many service units to be granted. (If Centralized Rating and Unit Determination is used, we’ll cover that in another post)
The actual logic, relating to this decision is typically based on the the rating and tariffing, credit control profiles, etc, and is outside the scope of the interface, but in short, the OCS will make a yes/no decision about if the subscriber should be granted access to the particular service, and if yes, then how many minutes / Bytes / Events should be granted.

In the received Credit Control Answer is received back from our OCS, and the Granted-Service-Unit AVP is analysed by the S-CSCF.
For a voice call, the service units will be time. This tells the S-CSCF how long the call can go on before the S-CSCF will need to send another Credit Control Request, for the purposes of this example we’ll imagine the returned value is 600 seconds / 10 minutes.

The S-CSCF will then grant service, the subscriber can start their voice call, and start the countdown of the time granted by the OCS.

As our chatty subscriber stays on their call, the S-CSCF approaches the limit of the Granted Service units from the OCS (Say 500 seconds used of the 600 seconds granted).
Before this limit is reached the S-CSCF’s CTF function sends another Credit Control Request with the type UPDATE_REQUEST. This allows the OCS to analyse the remaining balance of the subscriber and policies to tell the S-CSCF how long the call can continue to proceed for in the form of granted service units returned in the Credit Control Answer, which for our example can be 300 seconds.

Eventually, and before the second lot of granted units runs out, our subscriber ends the call, for a total talk time of 700 seconds.

But wait, the subscriber been granted 600 seconds for our INITIAL request, and a further 300 seconds in our UPDATE_REQUEST, for a total of 900 seconds, but the subscriber only used 700 seconds?

The S-CSCF sends a final Credit Control Request, this time with type TERMINATION_REQUEST and lets the OCS know via the Used-Service-Unit AVP, how many units the subscriber actually used (700 seconds), meaning the OCS will refund the balance for the gap of 200 seconds the subscriber didn’t use.

If this were the interface for online charging of data, we’d have the PS-Information AVP, or for online charging of SMS we’d have the SMS-Information, and so on.

The architecture and framework for how the charging works doesn’t change between a voice call, data traffic or messaging, just the particulars for the type of service we need to bill, as defined in the Service Information AVP, and the OCS making a decision on that based on if the subscriber should be granted service, and if yes, how many units of whatever type.

Diameter – Insert Subscriber Data Request / Response

While we’ve covered the Update Location Request / Response, where an MME is able to request subscriber data from the HSS, what about updating a subscriber’s profile when they’re already attached? If we’re just relying on the Update Location Request / Response dialog, the update to the subscriber’s profile would only happen when they re-attach.

We need a mechanism where the HSS can send the Request and the MME can send the response.

This is what the Insert Subscriber Data Request/Response is used for.

Let's imagine we want to allow a subscriber to access an additional APN, or change an AMBR values of an existing APN;

We'd send an Insert Subscriber Data Request from the HSS, to the MME, with the Subscription Data AVP populated with the additional APN the subscriber can now access.

Beyond just updating the Subscription Data, the Insert Subscriber Data Request/Response has a few other funky uses.

Through it the HSS can request the EPS Location information of a Subscriber, down to the TAC / eNB ID serving that subscriber. It’s not the same thing as the GMLC interfaces used for locating subscribers, but will wake Idle UEs to get their current serving eNB, if the Current Location Request is set in the IDR Flags.

But the most common use for the Insert-Subscriber-Data request is to modify the Subscription Profile, contained in the Subscription-Data AVP,

If the All-APN-Configurations-Included-Indicator is set in the AVP info, then all the existing AVPs will be replaced, if it’s not then everything specified is just updated.

The Insert Subscriber Data Request/Response is a bit novel compared to other S6a requests, in this case it’s initiated by the HSS to the MME (Like the Cancel Location Request), and used to update an existing value.

Docker & BIND as an ENUM Playground

In the last we covered what ENUM is and how it works, so to take this into a more practical example, I thought I’d share the details of the ENUM server I’ve setup in my lab, and the Docker container I’ve bundled it into.

Inside the Docker container we’ll be running Bind – this post won’t teach you much about Bind, there’s already lots of good information on it elsewhere, but we will cover the parameters involved in setting up ENUM records (NAPTR) for E.164 addresses.

Getting the Environment up and Running

First we’ll need to setup our environment, I’ve published the images for the container to Dockerhub, but we’ll build it from the Dockerfile so you can edit the files and rebuild as you play around:

git clone https://github.com/nickvsnetworking/ENUM_Playground
cd ENUM_Playground
docker build --pull --rm -f "Dockerfile" -t enum:latest "."

systemd-resolve on Ubuntu binds to port 53 by default, which can lead to some headaches, so we’ll create a new network in Docker for this to run in, so it doesn’t conflict with anything else you may be running:

sudo docker network create --subnet=172.30.0.0/26 enum_playground

And now we’ll run the ENUM container in the enum_playground network and with the IP 172.30.0.2,

docker run -d --rm --name=enum --net=enum_playground --ip=172.30.0.2 enum

Ok, that’s the environment setup, let’s run some queries!

E.164 to SIP URI Resolution with ENUM

In our last post we covered the basics of formatting an E.164 number and querying a DNS server to get it’s call routing information.

Again we’re going to use Dig to query this information. In reality ENUM queries would be run by an endpoint, or software like FreeSWITCH or Kamailio (Spoiler alert, posts on ENUM handling in those coming later), but as we’re just playing Dig will work fine.

So let’s start by querying a single E.164 address, +61355500911

First we’ll reverse it and put full stops / periods between the numbers, to get 1.1.9.0.0.5.5.5.3.1.6

Next we’ll add the e164.arpa prefix, which is the global prefix for ENUM addresses, and presto, that’s what we’ll query – 1.1.9.0.0.5.5.5.3.1.6.e164.arpa

Lastly we’ll feed this into a Dig query against the IP of our container and of type NAPTR,

dig @172.30.0.2 -t naptr 1.1.9.0.0.5.5.5.3.1.6.e164.arpa

So what did you get back?

Well, if everything is working your output should look something like the output I’ve got below,

NAPTR results for queried ENUM Address

So how do we interpret this? Well let’s break it down,

The first part is the domain we queried, simple enough in this case,

1.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

Next up is the TTL or expiry, in this case it’s 3600 seconds (1 hour), shorter periods allow for changes to propagate / be reflected more quickly but at the expense of more load as results can’t be cached for as long. The class (IN) represents Internet, which is the only class commonly used, even on internal systems.

1.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

Then we have the type of record returned, in our case it’s a NAPTR record,

1.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

After that is the Order, this defines the order in which the rules are to be parsed. Lower numbers are processed first, if no matches then the next lowest, and so on until the highest number is reached, we’ll touch on this in more detail later in this post,

1.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

The Pref is the processing preference. This is very handy for load balancing, as we can split traffic between hosts with different preferences. We’ll cover this later in this post too.

1.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

The Flags represent the type of record we’re going to get, for most ENUM traffic this is going to be set to U, to denote a SIP URI with Regex, while the Service value we’ll be looking for will be “E2U+sip” service to identify SIP URIs to route calls to, but could be other values like Email addresses, IM Addresses or PSTN numbers, to be parsed by other applications.

1.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

Lastly we’ve got the Regex part. Again not going to cover Regex as a whole, just the DNS particulars.

Everything between the first and second ! denotes what we’re searching for, while everything from the second ! to the last ! denotes what we replace it with.

In the below example that means we’re matching ^.* which means starting with (^) any character (.) zero or more times (*), which gets replaced with sip:[email protected],

1.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

How should this be treated?

For the first example, a call to the E.164 address of 61355500912 will be first formatted into a domain as per the ENUM requirements (1.1.9.0.0.5.5.5.3.1.6.e164.arpa) and then queried as a NAPTR record against the DNS server,

1.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

Only a single record has been returned so we don’t need to worry about the Order or Preference, and the Regex matches anything and replaces it with the resulting SIP URI of sip:[email protected], which is where we’ll send our INVITE.

Under the Hood

Inside the Repo we cloned earlier, if you open the e164.arpa.db file, things will look somewhat familiar,

The record we just queried is the first example in the Bind config file,

; E.164 Address +61355500911 - Simple no replacement (Resolves all traffic to sip:[email protected])
1.1.9.0.0.5.5.5.3.1.6 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

The config file is just the domain, class, type, order, preference, flags, service and regex.

Astute readers may have noticed the trailing . which where we can put a replacement domain if Regex is not used, but it cannot be used in conjunction with Regex, so for all our work it’ll just be a single trailing . on each line.

You can (and probably should) change the values in the e164.arpa.db file as we go along to try everything out, you’ll just need to rebuild the container and restart it each time you make a change.

This post is going to focus on Bind, but the majority of modern DNS servers support NAPTR records, so you can use them for ENUM as well, for example I manage the DNS for this site thorough Cloudflare, and I’ve put a screenshot below of an example private ENUM address I’ve added into it.

Setting up a NAPTR record in Cloudflare DNS

Preference to Split Traffic between Servers

So with a firm understanding of a single record being returned, let’s look at how we can use ENUM to cleverly route traffic to multiple hosts.

If we have a pool of servers we may wish to evenly distribute all traffic across them, so that’s how E.164 address +61355500912 is setup – to route traffic evenly (50/50) across two servers.

Querying it with Dig provides the following result:

dig @172.30.0.2 -t naptr 2.1.9.0.0.5.5.5.3.1.6.e164.arpa
;; ANSWER SECTION:
2.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR  10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" . 2.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR  10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

So as the order value (10) is the same for both records, we can ignore it – there isn’t one value lower than the other.

We can see both records have a preference of 100, in practice, this means they each get 50% of the traffic. The formula for traffic distribution is pretty simple, each server gets the value of it’s preference, divided by the total of all the preferences,

So for server1 it’s preference is 100 and the total of all the preferences combined is 200, so it gets 100/200, which is equivalent to one half aka 50%.

We might have a scenario where we have 3 servers, but one is significantly more powerful than the others, so let’s look at giving more traffic to one server and less to others, this example gets a little more complex but should cement your understanding of how the preference works;

dig @172.30.0.2 -t naptr 3.1.9.0.0.5.5.5.3.1.6.e164.arpa
3.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR  10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" . 3.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR  10 200 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .
3.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR  10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

So now 3 servers, again none have a lower order than the other, it’s set to 10 for them all so we can ignore the order,

Next we can see the total of all the priority values is 400,

Server 2 has a priority of 100 so it gets 100/400 total priority, or a quarter of all traffic. Server 1 has the same value, so also gets a quarter of all traffic,

Server 3 however has a priority of 200 so it gets 200/400, or to simplify half of all traffic.

The Bind config for this is:

; E.164 Address +61355500913 - More complex load balance between 3 hosts (25% server1, 25% server2, 50% server3)
3.1.9.0.0.5.5.5.3.1.6 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" . 3.1.9.0.0.5.5.5.3.1.6 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .
3.1.9.0.0.5.5.5.3.1.6 IN NAPTR 10 200 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

Order for Failover

Primarily the purpose of the order is to enable wildcard routes (as we’ll see later) to be overwritten by more specific routes, but a secondary use in some implementations use Order as a way to list the preferences of the SIP URIs to route to. For example we could have two servers, one a primary and the other a standby, with the standby only to be used only if the primary SIP URI was not responding.

E.164 number +61355500914 is setup to return two SIP URIs,

dig @172.30.0.2 -t naptr 4.1.9.0.0.5.5.5.3.1.6.e164.arpa
;; ANSWER SECTION:
4.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR  10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" . 4.1.9.0.0.5.5.5.3.1.6.e164.arpa. 3600 IN NAPTR  20 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

Our DNS client will first use the SIP URI sip:[email protected] as it has the lower order value (10), and if that fails, can try the entry with the next lowest order-value (20) which would be sip:[email protected].

The Bind config for this is:

; E.164 Address +61355500914 - Order example returning multiple SIP URIs to try for failover
4.1.9.0.0.5.5.5.3.1.6 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" . 4.1.9.0.0.5.5.5.3.1.6 IN NAPTR 20 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

Wildcards

If we have a 1,000 number block, having to add 1000 individual records can be very tedious. Instead we can use wildcard matching (thanks to the fact we’ve reversed the E.164 address) to match ranges. For example if we have E.164 numbers from +61255501000 to +61255501999 we can add a wildcard entry to match the +61255501x prefix,

I’ve set this up already so let’s lookup the E.164 number +6125501234,

dig @172.30.0.2 -t naptr 4.3.2.1.0.5.5.5.2.1.6.e164.arpa
;; ANSWER SECTION:
4.3.2.1.0.5.5.5.2.1.6.e164.arpa. 3600 IN NAPTR  50 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

If you look up any other number starting with +6125501 you’ll get the same result, and here’s the Bind config for it:

; Wildcard E.164 Address +61255501* - Wildcard example for all destinations starting with E.164 prefix +61255501x to single destination (sip:[email protected])
; For example E.164 number +6125501234 will resolve to sip:[email protected]
*.1.0.5.5.5.2.1.6 IN NAPTR 100 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

The catch with this is they’re all pointing at the same SIP URI, so we can’t treat the calls differently based on the called number – This is where the Regex magic comes in.

We can use group matching to match a group and fill it in the dialed number into the SIP Request URI, for example:

!(^.*$)!sip:+1\[email protected]!

Will match the E.164 number requested and put it inside sip:[email protected]

The +61255502xxx prefix is setup for this, so if we query +61255502000 (or any other number between +61255502000 and +61255502999) we’ll get the regex query in the resulting record.

Keep in mind DNS doesn’t actually apply the Regex transformation, just shares it, and the client applies the transformation.

dig @172.30.0.2 -t naptr 0.0.0.2.0.5.5.5.2.1.6.e164.arpa
;; ANSWER SECTION:
0.0.0.2.0.5.5.5.2.1.6.e164.arpa. 3600 IN NAPTR  100 100 "u" "E2U+sip" "!(^.*$)!sip:+1\[email protected]!" .

And the corresponding Bind config:

; Wildcard example for all destinations starting with E.164 prefix +61255502x to regex filled destination
; For example a request to 61255502000 will return sip:[email protected])
*.2.0.5.5.5.2.1.6 IN NAPTR 100 100 "u" "E2U+sip" "!(^.*$)!sip:+1\\[email protected]!" .

One last thing to keep in mind, is that Wildcard priorities are of any length.
This means +612555021 would match as well as +6125550299999999999999. Typically terminating switches drop any superfluous digits, and NU those that are too short, but keep this in mind, that length is not taken into account.

Wildcard Priorities

So with our wildcards in place, what if we wanted to add an exception, for example one number in our 61255502xxx block of numbers gets ported to another carrier and needs to be routed elsewhere?

Easy, we just add another entry for that number being more specific and with a lower order than the wildcard, which is what’s setup for E.164 number +61255502345,

dig @172.30.0.2 -t naptr 5.4.3.2.0.5.5.5.2.1.6.e164.arpa
;; ANSWER SECTION:
5.4.3.2.0.5.5.5.2.1.6.e164.arpa. 3600 IN NAPTR  50 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

Which does not return the same result as the others that match the wildcard,

Bind config:

; Wildcard example for all destinations starting with E.164 prefix +61255502x to regex filled destination
; For example a request to +61255502000 will return sip:[email protected])
*.2.0.5.5.5.2.1.6 IN NAPTR 100 100 "u" "E2U+sip" "!(^.*$)!sip:+1\\[email protected]!" .

; More specific example with lower order than +6125550x wildcard for E.164 address +61255502345 will return sip:[email protected]
5.4.3.2.0.5.5.5.2.1.6 IN NAPTR 50 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

We can combine all of the tricks we’ve covered here, from statically defined entries, wildcards, regex replacement, multiple entries with multiple orders and preferences, to create really complex routing, using only DNS.

Summary & Next Steps

So by now hopefully you’ve got a fair understanding of how NAPTR and DNS work together to translate E.164 addresses into SIP URIs,

Of course being able to do this manually with Dig and comprehend how it’ll route is only one part of the picture, in the next posts we’ll cover using Kamailio and FreeSWITCH to query ENUM routing information and route traffic to it,

ENUM – DNS based Call Routing

DNS is commonly used for resolving domain names to IP Addresses, and is often described as being like “the phone book of the Internet”.

So what’s the phone book of phone books?

The answer, is (kind of) DNS. With the aid of E.164 number to URI mapping (ENUM), DNS can be used to resolve phone numbers into SIP URIs to route the traffic to.

So what is ENUM?

ENUM allows us to bypass the need for a central switch for routing calls to numbers, and instead, through a DNS lookup, resolve a phone number into a reachable SIP URI that is the ultimate destination for the traffic.

Imagine you want to call a company, you dial the phone number for that company, your phone does a DNS query against the phone number, which returns the SIP URI of the company’s PBX, and your phone sends the SIP INVITE directly to the company’s PBX, with no intermediary party carrying the call.

3GPP have specified ENUM as the prefered mechanism for resolving phone numbers into SIP addresses, and while it’s widespread adoption on the public Internet is still in its early days (See my post on The Sad story of ENUM in Australia) it is increasingly common in IMS networks and inside operator networks.

IETF defined RFC 6116 for “The E.164 to Uniform Resource Identifiers (URI) Dynamic Delegation Discovery System (DDDS) Application (ENUM)”, which defines how the system works.

So how does ENUM actually work?

ENUM allow us to lookup a phone number on a DNS server and find the SIP URI a server that will handle traffic for the phone number, but it’s a bit more complicated than the A or AAAA records you’d use to resolve a website, ENUM relies on NAPTR records.

Let’s look at the steps involved in taking an E.164 number and knowing where to send it.

Step 1 – Reverse the Numbers

We read phone numbers from left to right.

This is because historically the switch needs to get all the long-distance routing sorted first. The switch has to route your call to the exchange that serves that subscriber, which is what all the area codes and prefixes assigned to areas are all about (Throwback to SZU for any old Telco buffs).

For an E.164 number you’ve got a Country Code, Area Code and then the Subscriber Number. The number gets more specific as it goes along.

But getting more specific as you go along is the opposite how how DNS works, millions of domains share the .com suffix, and the unique / specific part is the bits before that.

So the first step in the ENUM process is to reverse the phone number, so let’s take phone number (03) 5550 0912, which in E.164 is +61 3 5550 0912.

As the spaces in the phone numbers are there for the humans, we’ll drop all of them and reverse the number, as DNS is more specific right-to-left, so we end up with

2.1.9.0.0.5.5.5.3.1.6

Step 2 – Add the Suffix

The ITU ENUM specifies the suffix e164.arpa be assigned for public ENUM entries. Private ENUM deployments may use their own suffix, but to make life simple I’m going to use e164.arpa as if it were public.

So we’ll append the e164.arpa domain onto our reversed and formatted E.164 phone number:

2.1.9.0.0.5.5.5.3.1.6.e164.arpa

Step 3 – Query it

Next we’ll run a Naming Authority Pointer (NAPTR) query against the domain, to get back a list of records for that number.

DNS is a big topic, and NAPTR and SRV takes up a good chunk of it, but what you need to know is that by using NAPTR we’re not limited to just a single response, we could have a weighted pool of servers handling traffic for this phone number, and be able to control load through the use of NAPTR, amongst other things.

Of course, if our phone can query the public NAPTR records, then so can anyone else, so we can just use a tool like Dig to query the record ourselves,

dig @10.0.1.252 -t naptr 2.1.9.0.0.5.5.5.3.1.6.e164.arpa

In the answers section I’ve setup this DNS server to only return a single response, with the regex SIP URI to use, in my case that’s sip:[email protected]

You’ll obviously need to replace the DNS server with your DNS server, and the query with the reversed and formatted version of the E.164 number you wish to query.

Step 4 – Send SIP traffic

After looking at the NAPTR records returned and using the weight and priority to determine which server/s to send to first, our phone forwards an INVITE to the URI returned in the NAPTR record.

How to interpret the returned results?

The first thing to keep in mind when working with ENUM is multiple records being returned is supported, and even encouraged.

NAPTR results return 7 fields, which define how it should be handled.

The host part is fairly obvious, and defines the host / DNS entry we’re talking about.

The Service defines what type of service this is. ENUM can be expanded beyond just voice, for example you may want to also return an email address or IM address as well as a SIP Address on an ENUM query, which you can do. By default voice uses the “E2U+sip” service to identify SIP URIs to route calls to, so in this context that’s what we’re interested in, but keep in mind there are other types out there,

Example ENUM query against a phone number showing other types of services (Email & Web)

The Order simply defines the order in which the rules are to be parsed. Lower numbers are processed first, if no matches then the next lowest, and so on until the highest number is reached.

The Pref is the processing preference. For load balancing 50/50 between two sites say a Melbourne and Sydney site, we’d return two results, with the same Order, and the same Pref, would see traffic split 50/50 between the two sites.
We could split this further, a Pref value of 10 for Melbourne, 10 for Sydney, 5 for Brisbane and 5 for Perth would see 33% of calls route to Melbourne, 33% of calls route to Sydney, 16.5% of calls route to Brisbane and 16.5% of calls route to Perth.
This is because we’d have a total preference value of 30, and the individual preference for each entry would work out as the fraction of the total (ie Pref 10 out of 30 = 10/30 or 33.3%).

The Flags denote the type of record we’re going to get, for most ENUM traffic this is going to be set to U, to denote a SIP URI with Regex.

The regexp field contains our SIP URI in the form of a Regular expression, which can include pattern matching and replacement. This is most commonly used to fill in the phone number into the SIP URI, for example instead of hardcoding the phone number into the response, we could use a Regular expression to fill in the requested number into the SIP URI.

!(^.*$)!sip:+1\[email protected]!

ENUM sounds great, how do I get it?

Here’s the tricky part.

If you’re looking to implement ENUM for an internal network, great, I’ll have some more posts here over the next few weeks covering off configuration of a DNS server to support ENUM lookups, and using Kamailio to lookup ENUM routes.

In terms of public ENUM, while many carriers are using ENUM inside their networks, public adoption of ENUM in most markets has been slow, for a number of reasons.

Many incumbent operators have been reluctant to embrace public ENUM as their role as an operator would be relegated to that of a Domain registrar.
Additionally, there’s real security risks involved in moving to ENUM – opening your phone system up to the world to accept inbound calls from anywhere. This could lead to DOS-style attacks of flooding phone numbers with automatically generated traffic, privacy risks and even less validation in terms of caller ID trust.

RIPE maintains the EnumData.org website listing the status of ENUM for each country / region.