Category Archives: LTE

3GPP Long Term Evolution (4G)

FreeDiameter – Generating Certificates

Even if you’re not using TLS in your FreeDiameter instance, you’ll still need a certificate in order to start the stack.

Luckily, creating a self-signed certificate is pretty simple,

Firstly we generate your a private key and public certificate for our required domain – in the below example I’m using, but you’ll need to replace that with the domain name of your freeDiameter instance.

openssl req -new -batch -x509 -days 3650 -nodes     \
   -newkey rsa:1024 -out /etc/freeDiameter/cert.pem -keyout /etc/freeDiameter/privkey.pem \
   -subj /

Next we generate a new set of Diffie-Hellman parameter set using OpenSSL.

openssl dhparam -out /etc/freeDiameter/dh.pem 1024 

Lastly we’ll put all this config into the freeDiameter config file:

TLS_Cred = "/etc/freeDiameter/cert.pem", "/etc/freeDiameter/privkey.pem";
TLS_CA = "/etc/freeDiameter/cert.pem";
TLS_DH_File = "/etc/freeDiameter/dh.pem";

If you’re using freeDiameter as part of another software stack (Such as Open5Gs) the below filenames will contain the config for that particular freeDiameter components of the stack:

  • freeDiameter.conf – Vanilla freeDiameter
  • mme.conf – Open5Gs MME
  • pcrf.conf – Open5Gs PCRF
  • smf.conf – Open5Gs SMF / P-GW-C
  • hss.conf – Open5Gs HSS

Testing Mobile Networks with Remote Test Phones

I build phone networks, and unfortunately, I’m not able to be everywhere at once.

This means sometimes I have to test things in networks I may not be within the coverage of.

To get around this, I’ve setup something pretty simple, but also pretty powerful – Remote test phones.

Using a Raspberry Pi, Intel NUC, or any old computer, I’m able to remotely control Android handsets out in the field, in the coverage footprint of whatever network I need.

This means I can make test calls, run speed testing, signal strength measurements, on real phones out in the network, without leaving my office.

Base OS

Because of some particularities with Wayland and X11, for this I’d steer clear of Ubuntu distributions, and suggest using Debian if you’re using x86 hardware, and Raspbian if you’re using a Pi.

Setup Android Debug Bridge (adb)

The base of this whole system is ADB, the Android Debug Bridge, which exposes the ability to remotely control an Android phone over USB.

You can also do this over WiFi, but I find for device testing, wired allows me to airplane mode a device or disable data, which I can’t do if the device is connected to ADB via WiFi.

There’s lot of info online about setting Android Debug Bridge up on your device, unlocking the Developer Mode settings, etc, if you’ve not done this before I’ll just refer you to the official docs.

Before we plug in the phones we’ll need to setup the software on our remote testing machine, which is simple enough:

[email protected]:~$ sudo apt install android-tools-adb
sudo apt install android-tools-fastboot

Now we can plug in each of the remote phones we want to use for testing and run the command “adb devices” which should list the phones with connected to the machine with ADB enabled:

[email protected]:~$ adb devices
List of devices attached
ABCDEFGHIJK	unauthenticated
LMNOPQRSTUV	unauthenticated

You’ll get a popup on each device asking if you want to allow USB debugging – If this is going to be a set-and-forget deployment, make sure you tick “Always allow from this Computer” so you don’t have to drive out and repeat this step, and away you go.

How to Access Developer Options and Enable USB Debugging on Android

Lastly we can run adb devices again to confirm everything is in the connected state


scrcpy an open-source remote screen mirror / controller that allows us to control Android devices from a computer.

In our case we’re going to install with Snap (if you hate snaps as many folks do, you can also compile from source):

[email protected]:~$ snap install scrcpy

Remote Access

If you’re a regular Linux user, the last bit is the easiest.

We’re just going to use SSH to access the Linux machine, but with X11 forwarding.

If you’ve not come across X11 fowarding before, from a Linux machine just add the -X option to your SSH command, for example from my laptop I run:

nick@oldfaithful:~$ ssh [email protected] -X

Where is the remote tester device.

After SSHing into the box, we can just run scrcpy and boom, there’s the window we can interact with.

If you’ve got multiple devices connected to the same device, you’ll need to specify the ADB device ID, and of course, you can have multiple sessions open at the same time.

scrcpy -s 61771fe5

That’s it, as simple as that.


A few settings you may need to set:

I like to enable the “Show taps” option so I can see where my mouse is on the touchscreen and see what I’ve done, it makes it a lot easier when recording from the screen as well for the person watching to follow along.

You’ll probably also want to disable the lock screen and keep the screen awake

Some OEMs have an additonal tick box if you want to be able to interact with the device (rather than just view the screen), which often requires signing into an account, if you see this toggle, you’ll need to turn it on:

Ansible Playbook

I’ve had to build a few of these, so I’ve put an Ansible Playbook on Github so you can create your own.

You can grab it from here.

CGrates in Baby Steps – Part 1

So you have a VoIP service and you want to rate the calls to charge your customers?

You’re running a mobile network and you need to meter data used by subscribers?

Need to do least-cost routing?

You want to offer prepaid mobile services?

Want to integrate with Asterisk, Kamailio, FreeSWITCH, Radius, Diameter, Packet Core, IMS, you name it!

Well friends, step right up, because today, we’re talking CGrates!

So before we get started, this isn’t going to be a 5 minute tutorial, I’ve a feeling this may end up a big multipart series like some of the others I’ve done.
There is a learning curve here, and we’ll climb it together – but it is a climb.


Let’s start with a Debian based OS, installation is a doddle:

sudo wget -O - | sudo apt-key add -
echo "deb nightly main" | sudo tee /etc/apt/sources.list.d/cgrates.list
sudo apt-get update
sudo apt-get install cgrates -y
apt-get install mysql-server redis-server git -y

We’re going to use Redis for the DataDB and MariaDB as the StorDB (More on these concepts later), you should know that other backend options are available, but for keeping things simple we’ll just use these two.

Next we’ll get the database and config setup,

cd /usr/share/cgrates/storage/mysql/
./ root localhost
cgr-migrator -exec=*set_versions

Lastly we’ll clone the config files from the GitHub repo:

Rating Concepts

So let’s talk rating.

In its simplest form, rating is taking a service being provided and calculating the cost for it.

The start of this series will focus on voice calls (With SMS, MMS, Data to come), where the calling party (The person making the call) pays, so let’s imagine calling a Mobile number (Starting with 614) costs $0.22 per minute.

To perform rating we need to determine the Destination, the Rate to be applied, and the time to charge for.

For our example earlier, a call to a mobile (Any number starting with 614) should be charged at $0.22 per minute. So a 1 minute call will cost $0.22 and a 2 minute long call will cost $0.44, and so on.

We’ll also charge calls to fixed numbers (Prefix 612, 613, 617 and 617) at a flat $0.20 regardless of how long the call goes for.

So let’s start putting this whole thing together.

Introduction to RALs

RALs is the component in CGrates that takes care of Rating and Accounting Logic, and in this post, we’ll be looking at Rating.

The rates have hierarchical structure, which we’ll go into throughout this post. I took my notepad doodle of how everything fits together and digitized it below:


Destinations are fairly simple, we’ll set them up in our Destinations.csv file, and it will look something like this:


Each entry has an ID (referred to higher up as the Destination ID), and a prefix.

Also notice that some Prefixes share an ID, for example 612, 613, 617 & 618 are under the Destination ID named “DST_AUS_Fixed”, so a call to any of those prefixes would match DST_AUS_Fixed.


Rates define the price we charge for a service and are defined by our Rates.csv file.


Let’s look at the fields we have:

  • ID (Rate ID)
  • ConnectFee – This is the amount charged when the call is answered / connected
  • The Rate is how much we will charge, it’s loosely cents, but could be any currency. By default CGrates looks down to 4 decimal places.
  • RateUnit is how often this rate is applied in seconds
  • RateIncriment is how often this is evaluated in seconds
  • GroupIntervalStart – Activates an event when triggered

So let’s look at how this could be done, and the gotchas that exist.

So let’s look at some different use cases and how we’d handle them.

Per Minute Billing

This would charge a rate per minute, at the start of the call, the first 60 seconds will cost the caller $0.25.

At the 61 second mark, they will be charged another $0.25.

60 seconds after that they will be charged another $0.25 and so on.


This is nice and clean, a 1 second call costs $0.25, a 60 second call costs $0.25, and a 61 second call costs $0.50, and so on.

This is the standard billing mechanism for residential services, but it does not pro-rata the call – For example a 1 second call is the same cost as a 59 second call ($0.25), and only if you tick over to 61 seconds does it get charged again (Total of $0.50).

Per Second Billing

If you’re doing a high volume of calls, paying for a 3 second long call where someone’s voicemail answers the call and was hung up, may seem a bit steep to pay the same for that as you would pay for 59 seconds of talk time.

Instead Per Second Billing is more common for high volume customers or carrier-interconnects.

This means the rate still be set at $0.25 per minute, but calculated per second.

So the cost of 60 seconds of call is $0.25, but the cost of 30 second call (half a minute) should cost half of that, so a 30 second call would cost $0.125.


How often we asses the charging is defined by the RateIncrement parameter in the Rate Table.

We could achieve the same outcome another way, by setting the RateIncriment to 1 second, and the dividing the rate per minute by 60, we would get the same outcome, but would be more messy and harder to maintain, but you could think of this as $0.25 per minute, or $0.004166667 per second ($0.25/60 seconds).

Flat Rate Billing

Another option that’s commonly used is to charge a flat rate for the call, so when the call is answered, you’re charged that rate, regardless of the length of the call.

Regardless if the call is for 1 second or 10 hours, the charge is the same.


For this we just set the ConnectFee, leaving the Rate at 0, so the cost will be applied on connection, with no costs applied per time period.

This means a 1 second call will cost $0.25, while a 3600 second call will still cost $0.25.

We charge a connect fee, but no rate.

Linking Destinations to the Rates to Charge

Now we’ve defined our Destinations and our Rates, we can link the two, defining what Destinations get charged what Rates.

This is defined in DestinationRates.csv


Let’s look at the Fields,

  • ID (Destination Rate ID)
  • DestinationID – Refers to the DestinationID defined in the Destinations.csv file
  • RatesTag – Referes to the Rate ID we defined in Rates.csv
  • RoundingMethod – Defines if we round up or down
  • RoundingDecimals – Defines how many decimal places to consider before rounding
  • MaxCost – The maximum cost this can go up to
  • MaxCostStrategy – What to do if the Maximum Cost is reached – Either make the rest of the call Free or Disconnect the call

So for each entry we’ll define an ID, reference the Destination and the Rate to be applied, the other parts we’ll leave as boilerplate for now, and presto. We have linked our Destinations to Rates.

Rating Plans

We may want to offer different plans for different customers, with different rates.

That’s what we define in our Rating Plans.

  • ID (RatingPlanID)
  • DestinationRatesId (As defined in DestinationRates.csv)
  • TimingTag – References a time profile if used
  • Weight – Used to determine what precedence to use if multiple matches

So as you may imagine we need to link the DestinationRateIDs we just defined together into a Rating Plan, so that’s what I’ve done in the example above.

Rating Profiles

The last step in our chain is to link Customers / Subscribers to the profiles we’ve just defined.

How you allocate a customer to a particular Rating Plan is up to you, there’s numerous ways to approach it, but for this example we’re going to use one Rating Profile for all callers coming from the “” tenant:


Let’s go through the fields here,

  • Tenant is a grouping of Customers
  • Category is used to define the type of service we’re charging for, in this case it’s a call, but could also be an SMS, Data usage, or a custom definition.
  • Subject is typically the calling party, we could set this to be the Caller ID, but in this case I’ve used a wildcard “*any”
  • ActivationTime allows us to define a start time for the Rating Profile, for example if all our rates go up on the 1st of each month, we can update the Plans and add a new entry in the Rating Profile with the new Plans with the start time set
  • RatingPlanID sets the Rating Plan that is used as we defined in RatingPlans.csv

Loading the Rates into CGrates

At the start we’ll be dealing with CGrates through CSV files we import, this is just one way to interface with CGrates, there’s others we’ll cover in due time.

CGRates has a clever realtime architecture that we won’t go into in any great depth, but in order to load data in from a CSV file there’s a simple handy tool to run the process,

root@cgrateswitch:/home/nick# cgr-loader -verbose -path=/home/nick/tutorial/ -flush_stordb

Obviously you’ll need to replace with the folder you cloned from GitHub.

Trying it Out

In order for CGrates to work with Kamailio, FreeSWITCH, Asterisk, Diameter, Radius, and a stack of custom options, for rating calls, it has to have common mechanisms for retrieving this data.

CGrates provides an API for rating calls, that’s used by these platforms, and there’s a tool we can use to emulate the signaling for call being charged, without needing to pickup the phone or integrate a platform into it.

root@cgrateswitch:/home/nick# cgr-console 'cost Category="call" Tenant="" Subject="3005" Destination="614" AnswerTime="2014-08-04T13:00:00Z" Usage="60s"'

The tenant will need to match those defined in the RatingProfiles.csv, the Subject is the Calling Party identity, in our case we’re using a wildcard match so it doesn’t matter really what it’s set to, the Destination is the destination of the call, AnswerTime is time of the call being answered (pretty self explanatory) and the usage defines how many seconds the call has progressed for.

The output is a JSON string, containing a stack of useful information for us, including the Cost of the call, but also the rates that go into the decision making process so we can see the logic that went into the price.

 "AccountSummary": null,
 "Accounting": {},
 "CGRID": "",
 "Charges": [
   "CompressFactor": 1,
   "Increments": [
     "AccountingID": "",
     "CompressFactor": 1,
     "Cost": 0,
     "Usage": "0s"
     "AccountingID": "",
     "CompressFactor": 1,
     "Cost": 25,
     "Usage": "1m0s"
   "RatingID": "febb614"
 "Cost": 25,
 "Rates": {
  "7d4a755": [
    "GroupIntervalStart": "0s",
    "RateIncrement": "1m0s",
    "RateUnit": "1m0s",
    "Value": 25
 "Rating": {
  "febb614": {
   "ConnectFee": 0,
   "MaxCost": 0.12,
   "MaxCostStrategy": "*disconnect",
   "RatesID": "7d4a755",
   "RatingFiltersID": "7e42edc",
   "RoundingDecimals": 4,
   "RoundingMethod": "*up",
   "TimingID": "c15a254"
 "RatingFilters": {
  "7e42edc": {
   "DestinationID": "DST_AUS_Mobile",
   "DestinationPrefix": "614",
   "RatingPlanID": "RP_AUS",
   "Subject": "*"
 "RunID": "",
 "StartTime": "2014-08-04T13:00:00Z",
 "Timings": {
  "c15a254": {
   "MonthDays": [],
   "Months": [],
   "StartTime": "00:00:00",
   "WeekDays": [],
   "Years": []
 "Usage": "1m0s"

So have a play with setting up more Destinations, Rates, DestinationRates and RatingPlans, in these CSV files, and in our next post we’ll dig a little deeper… And throw away the CSVs all together!

Evolved Packet Core – Analysis Challenge

This post is one of a series of packet capture analysis challenges designed to test your ability to understand what is going on in a network from packet captures.
Download the Packet Capture and see how many of the questions you can answer from the attached packet capture.

The answers are at the bottom of this page, along with how we got to the answers.

This challenge focuses on the Evolved Packet Core, specifically the S1 and Diameter interfaces.

Why is the Subscriber failing to attach?

And what is the behavior we should be expecting to see?

What is the Cell ID of this eNodeB?

What is the Tracking Area?

That the subscriber is trying to attach in.

Does the device attaching to the network support VoLTE?

What type of IP is the subscriber requesting for this PDN session?

Is the device requesting an IPv4 address, IPv6 address or both?

What is the Diameter Application ID for S6a?

You should be able to ascertain this from information from the PCAP, without needing to refer to the standards.

What is the Crytpo RES returned by the HSS, and what is the RES returned by the SIM/UE?

Does this mean the subscriber was authenticated successfully?


Answer: Why is the Subscriber failing to attach?

The Diameter Update Location Request in frame 10 does not get answered by the HSS. After 5 seconds the MME gives up and rejects the connection.

Instead what should have happened is the HSS should have responded to the Update Location Request with an Update Location Answer, as we covered in the attach procedure.

Answer: What is the Cell ID of this eNodeB?

In Uplink messages from the eNodeB the EUTRAN-GCI field contains the Cell-ID of the eNodeB.

In this case the Cell-ID is 1.

Answer: What is the Tracking Area?

The tracking area is 123.

This information is available in the TAI field in the Uplink S1 messages.

Answer: Does the device attaching to the network support VoLTE?

No, the device does not support VoLTE.

There are a few ways we can get to this answer, and VoLTE support in the phone does not mean VoLTE will be enabled, but we can see the Voice Domain preference is set to CS Voice Only, meaning GSM/UMTS for voice calling.

This is common on cheaper handsets that do not support VoLTE.

Answer: What type of IP is the subscriber requesting for this PDN session? (IPv4/IPv6/Both)?

The subscriber is requesting an IPv4 address only.

We can see this in the ESM Message Container for the PDN Connectivity Request, the PDN type is “IPv4”.

Answer: What is the Diameter Application ID for S6a?

Answer: 16777251

This is shown for the Vendor-Specific-Application-Id AVP on an S6a message.

Answer: What is the Crytpo RES returned by the HSS, and what is the RES returned by the SIM/UE?

The RES (Response) and X-RES (Expected Response) Both are “dba298fe58effb09“, they do match, which means this subscriber was authenticated successfully.

You can learn more about what these values do in this post.

The Surprisingly Complicated World of SMS: Apple iPhone MT SMS

In iOS 15, Apple added support for iPhones to support SMS over IMS networks – SMSoIP. Previously iPhone users have been relying on CSFB / SMSoNAS (Using the SGs interface) to send SMS on 4G networks.

Getting this working recently led me to some issues that took me longer than I’d like to admit to work out the root cause of…

I was finding that when sending a Mobile Termianted SMS to an iPhone as a SIP MESSAGE, the iPhone would send back the 200 OK to confirm delivery, but it never showed up on the screen to the user.

The GSM A-I/F headers in an SMS PDU are used primarily for indicating the sender of an SMS (Some carriers are configured to get this from the SIP From header, but the SMS PDU is most common).

The RP-Destination Address is used to indicate the destination for the SMS, and on all the models of handset I’ve been testing with, this is set to the MSISDN of the Subscriber.

But some devices are really finicky about it’s contents. Case in point, Apple iPhones.

If you send a Mobile Terminated SMS to an iPhone, like the one below, the iPhone will accept and send back a 200 OK to this request.

The problem is it will never be displayed to the user… The message is marked as delivered, the phone has accepted it it just hasn’t shown it…

SMS reports as delivered by the iPhone (200 OK back) but never gets displayed to the user of the phone as the RP-Destination Address header is populated

The fix is simple enough, if you set the RP-Destination Address header to 0, the message will be displayed to the user, but still took me a shamefully long time to work out the problem.

RP-Destination Address set to 0 sent to the iPhone, this time it’ll get displayed to the user.
Huawei BBU 3900 Architecture

Huawei Baseband Cheat Sheet

Baseband Units (UBBP)

CardMax LTE Cells
UBBPd33×20 MHz 2T2R
UBBPd43×20 MHz 4T4R
UBBPd56×20 MHz 2T2R OR 3×20 MHz 4T4R
UBBPd66×20 MHz 4T4R
UBBPe13×20 MHz 2T2R
UBBPe23×20 MHz 4T4R
UBBPe36×20 MHz 2T2R OR 3×20 MHz 4T4R
UBBPe46×20 MHz 4T4R OR 3×20 MHz 8T8R
Max Cells in LTE FDD

Main Processing and Transmission (LMPT/UMPT)

In some instances two boards can be used together to double the max cells or max throughput values.

CardMax CellsMax Throughput
(at MAC Layer)
Max UEs
(In RRC Connected)
LMPT18 Cells (4T4R)Uplink 300Mbps
Downlink 450Mbps
UMPTa36 Cells (4T4R)Aggregate 1.5Gbps10800
UMPTb136 Cells (4T4R)Aggregate 1.5Gbps10800
UMPTb236 Cells (4T4R)Aggregate 1.5Gbps10800
UMPTb336 Cells (4T4R)Aggregate 2Gbps10800
UMPTb936 Cells (4T4R)Aggregate 2Gbps10800
UMPTe72 Cells (4T4R)Aggregate 10Gbps14400

Lifecycle of a Dedicated Bearer – From Flow-Description AVP to Traffic Flow Templates

To support Dedicated Bearers we first have to have a way of profiling the traffic, to classify the traffic as being the type we want to provide the Dedicated Bearer for.

The first step involves a request from an Application Function (AF) to the PCRF via the Rx interface.

The most common type of AF would be a P-CSCF. When a VoLTE call gets setup the P-CSCF requests that a dedicated bearer be setup for the IP Address and Ports involved in the VoLTE call, to ensure users get the best possible call quality.

But Application Functions aren’t limited to just VoLTE – You could also embed an Application Function into the server for an online game to enable a dedicated bearer for users playing that game, or a sports streaming app that detects when a user starts streaming sports and creates a dedicated bearer for that user to send the traffic down.

The request to setup a dedicated bearer comes in the form of a Diameter request message from the AF, using the Rx reference point, typically from the P-CSCF to the PCRF in the network in an “AA-Request”.

Of main interest in the AA-Request is the Media Component AVP, that contains all the details needed to identify the traffic flow.

Now our PCRF is in charge of policy, and know which P-GW is serving the required subscriber. So the PCRF takes this information and sends a Gx Re-Auth Request to the PCEF in the P-GW serving the subscriber, with a Charging Rule the PCEF in the P-GW needs to install, to profile and apply QoS to the bearer.

So within the Gx Re-Auth Request is the Charging-Rule Definition, made up of Flow-Description AVP which I’ve written about here, that is used to identify and profile traffic flows and QoS parameters to apply to matching traffic.

Charging Rule Definition’s Flow-Information AVPs showing the information needed to profile the traffic

The QoS Description AVP defines which QoS parameters (QCI / ARP / Guaranteed & Maximum Bandwidth) should be applied to the traffic that matches the rules we just defined.

QoS information AVP
QoS Information AVP showing requested QoS Parameters

The P-GW sends back a Gx Re-Auth Answer, and gets to work actually setting up these bearers.

With the rule installed on the PCEF, it’s time to get this new bearer set up on the UE / eNodeB.

The P-GW sends a GTPv2 “Create Bearer Request” to the S-GW which forwards it onto the MME, to setup / define the Dedicated Bearer to be setup on the eNodeB.

GTPv2 “Create Bearer Request” sent by the P-Gw to the S-GW forwarded from the S-GW to the MME

The MME translates this into an S1 “E-RAB Setup Request” which it sends to the eNodeB to setup,

S1 E-RAB Setup request showing the E-RAB to be setup

Assuming the eNodeB has the resources to setup this bearer, it provides the details to the UE and sets up the bearer, sending confirmation back to the MME in the S1 “E-RAB Setup Response” message, which the MME translates back into GTPv2 for a “Create Bearer Response”

All this effort to keep your VoLTE calls sounding great!

Backing up and Restoring Open5GS

You may find you need to move your Open5GS deployments from one server to another, or split them between servers.
This post covers the basics of migrating Open5GS config and data between servers by backing up and restoring it elsewhere.

The Database

Open5GS uses MongoDB as the database for the HSS and PCRF. This database contains all our SDM data, like our SIM Keys, Subscriber profiles, PCC Rules, etc.

Backup Database

To backup the MongoDB database run the below command (It doesn’t need sudo / root to run):

mongodump -o Open5Gs_"`date +"%d-%m-%Y"`"

You should get a directory called Open5Gs_todaysdate, the files in that directory are the output of the MongoDB database.

Restore Database

If you copy the backup we just took (the directory named Open5Gs_todaysdate) to the new server, you can restore the complete database by running:

mongorestore Open5Gs_todaysdate

This restores everything in the database, including profiles and user accounts for the WebUI,

You may instead just restore the Subscribers table, leaving the Profiles and Accounts unchanged with:

mongorestore Open5Gs_todaysdate/open5gs/subscribers.bson -c subscribers -d open5gs

The database schema used by Open5GS changed earlier this year, meaning you cannot migrate directly from an old database to a new one without first making a few changes.

To see if your database is affected run:

mongo open5gs --eval 'db.subscribers.find({"__v" : 0}).toArray()' | grep "imsi" | wc -l

Which will let you know how many subscribers are using the old database type. If it’s anything other than 0 running this Python script will update the database as required.

Once you have installed Open5GS onto the new server you’ll need to backup the data from the old one, and restore it onto the new one.

The Config Files

The text based config files define how Open5Gs will behave, everything from IP Addresses to bind on, to the interfaces and PLMN.

Again, you’ll need to copy them from the old server to the new, and update any IP Addresses that may change between the two.

On the old server run:

cp -r /etc/open5gs /tmp/

Then copy the “open5gs” folder to the new server into the /etc/ directory.

If you’re also changing the IP Address you’re binding on, you’ll need to update that in the YAML files.

Bringing Everything Online

Finally you’ll need to restart all the services,

sudo systemctl start open5gs-*

Run a basic health check to ensure the services are running,

ps aux | grep open5gs-

Should list all the running Open5Gs services,

And then check the logs to ensure everything is working as expected,

tail -f /var/log/open5gs/*.log

Jaffa Cakes explain the nuances between Centralized vs Decentralized Online Charging in 3GPP Networks

While reading through the 3GPP docs regarding Online Charging, there’s a concept that can be a tad confusing, and that’s the difference between Centralized and Non-Centralized Charging architectures.

The overall purpose of online charging is to answer that deceptively simple question of “does the user have enough credit for this action?”.

In order to answer that question, we need to perform rating and unit determination.


Rating is just converting connectivity credit units into monetary units.

If you go to the supermarket and they have boxes of Jaffa Cakes at $2.50 each, they have rated a box of Jaffa Cakes at $2.50.

1 Box of Jaffa Cakes rated at $2.50 per box

In a non-snack-cake context, such as 3GPP Online Charging, then we might be talking about data services, for example $1 per GB is a rate for data.
Or for a voice calls a cost per minute to call a destination, such as is $0.20 per minute for a local call.

Rating is just working out the cost per connectivity unit (Data or Minutes) into a monetary cost, based on the tariff to be applied to that subscriber.

Unit Determination

The other key piece of information we need is the unit determination which is the calculation of the number of non-monetary units the OCS will offer prior to starting a service, or during a service.

This is done after rating so we can take the amount of credit available to the subscriber and calculate the number of non-monetary units to be offered.

Converting Hard-Currency into Soft-Snacks

In our rating example we rated a box of Jaffa Cakes at $2.50 per box. If I have $10 I can go to the shops and buy 4x boxes of Jaffa cakes at $2.50 per box. The cashier will perform unit determination and determine that at $2.50 per box and my $10, I can have 4 boxes of Jaffa cakes.

Again, steering away from the metaphor of the hungry author, Unit Determination in a 3GPP context could be determining how many minutes of talk time to be granted.
Question: At $0.20 per minute to a destination, for a subscriber with a current credit of $20, how many minutes of talk time should they be granted?
Answer: 100 minutes ($20 divided by $0.20 per minute is 100 minutes).

Or to put this in a data perspective,
Question: Subscriber has $10 in Credit and data is rated at $1 per GB. How many GB of data should the subscriber be allowed to use?
Answer: 10GB.

Putting this Together

So now we understand rating (working out the conversion of connectivity units into monetary units) and unit determination (determining the number of non-monetary units to be granted for a given resource), let’s look at the the Centralized and Decentralized Online Charging.

Centralized Rating

In Centralized Rating the CTF (Our P-GW or S-CSCF) only talk about non-monetary units.
There’s no talk of money, just of the connectivity units used.

The CTFs don’t know the rating information, they have no idea how much 1GB of data costs to transfer in terms of $$$.

For the CTF in the P-GW/PCEF this means it talks to the OCS in terms of data units (data In/out), not money.

For the CTF in the S-CSCF this means it only ever talks to the OCS in voice units (minutes of talk time), not money.

This means our rates only need to exist in the OCS, not in the CTF in the other network elements. They just talk about units they need.

De-Centralized Rating

In De-Centralized Rating the CTF performs the unit conversion from money into connectivity units.
This means the OCS and CTF talk about Money, with the CTF determining from that amount of money granted, what the subscriber can do with that money.

This means the CTF in the S-CSCF needs to have a rating table for all the destinations to determine the cost per minute for a call to a destination.

And the CTF in the P-GW/PCEF has to know the cost per octet transferred across the network for the subscriber.

In previous generations of mobile networks it may have been desirable to perform decentralized rating, as you can spread the load of calculating our the pricing, however today Centralized is the most common way to approach this, as ensuring the correct rates are in each network element is a headache.

Centralized Unit Determination

In Centralized Unit Determination the CTF tells the OCS the type of service in the Credit Control Request (Requested Service Units), and the OCS determines the number of non-monetary units of a certain service the subscriber can consume.

The CTF doesn’t request a value, just tells the OCS the service being requested and subscriber, and the OCS works out the values.

For example, the S-CSCF specifies in the Credit Control Request the destination the caller wishes to reach, and the OCS replies with the amount of talk time it will grant.

Or for a subscriber wishing to use data, the P-GW/PCEF sends a Credit Control Request specifying the service is data, and the OCS responds with how much data the subscriber is entitled to use.

De-Centralized Unit Determination

In De-Centralized Unit Determination, the CTF determines how many units are required to start the service, and requests these units from the OCS in the Credit Control Request.

For a data service,the CTF in the P-GW would determine how many data units it is requesting for a subscriber, and then request that many units from the OCS.

For a voice call a S-CSCF may request an initial call duration, of say 5 minutes, from the OCS. So it provides the information about the destination and the request for 300 seconds of talk time.

Session Charging with Unit Reservation (SCUR)

Arguably the most common online charging scenario is Session Charging with Unit Reservation (SCUR).

SCUR relies on reserving an amount of funds from the subscriber’s balance, so no other services can those funds and translating that into connectivity units (minutes of talk time or data in/out based on the Requested Session Unit) at the start of the session, and then subsequent requests to debit the reserved amount and reserve a new amount, until all the credit is used.

This uses centralized Unit Determination and centralized Rating.

Let’s take a look at how this would look for the CTF in a P-GW/PCEF performing online charging for a subscriber wishing to use data:

  1. Session Request: The subscriber has attached to the network and is requesting service.
  2. The CTF built into the P-GW/PCEF sends a Credit Control Request: Initial Request (As this subscriber has just attached) to the OCS, with Requested Service Units (RSU) of data in/out to the OCS.
  3. The OCS performs rating and unit determination, and according to it’s credit risk policies, and a whole lot of other factors, comes back with an amount of data the subscriber can use, and reserves the amount from the account.
    (It’s worth noting at this point that this is not necessarily all of the subscriber’s credit in the form of data, just an amount the OCS is willing to allocate. More data can be requested once this allocated data is used up.)
  4. The OCS sends a Credit Control Answer back to our P-GW/PCEF. This contains the Granted Service Unit (GSU), in our case the GSU is data so defines much data up/down the user can transfer. It also may include a Validity Time (VT), which is the number of seconds the Credit Control Answer is valid for, after it’s expired another Credit Control Request must be sent by the CTF.
  5. Our P-GW/PCEF processes this, starts measuring the data used by the subscriber for reporting later, and sets a timer for the Validity Time to send another CCR at that point.
    At this stage, our subscriber is able to start using data.
  1. Some time later, either when all the data allocated in the Granted Service Units has been consumed, or when the Validity Time has expired, the CTF in the P-GW/PCEF sends another Credit Control Request: Update, and again includes the RSU (Requested Service Units) as data in/out, and also a USU (Used Service Units) specifying how much data the subscriber has used since the first Credit Control Answer.
  2. The OCS receives this information. It compares the Used Session Units to the Granted Session Units from earlier, and with this is able to determine how much data the subscriber has actually used, and therefore how much credit that equates to, and debit that amount from the account.
    With this information the OCS can reserve more funds and allocate another GSU (Granted Session Unit) if the subscriber has the required balance. If the subscriber only has a small amount of credit left the FUI (Final Unit Indication AVP) is set to determine this is all the subscriber has left in credit, and if this is exhausted to end the session, rather than sending another Credit Control Request.
  3. The Credit Control Answer with new GSU and the FUI is sent back to the P-GW/PCEF
  4. The P-GW/PCEF allows the session to continue, again monitoring used traffic against the GSU (Granted Session Units).
  1. Once the subscriber has used all the data in the Granted Session Units, and as the last CCA included the Final Unit Indicator, the CTF in the P-GW/PCEF knows it can’t just request more credit in the form of a CCR Update, so cuts of the subscribers’s session.
  2. The P-GW/PCEF then sends a Credit Control Request: Termination Request with the final Used Service Units to the OCS.
  3. The OCS debits the used service units from the subscriber’s balance, and refunds any unused credit reservation.
  4. The OCS sends back a Credit Control Answer which may include the CI value for Credit Information, to denote the cost information which may be passed to the subscriber if required.
Credit Control Request / Answer call flow in IMS Charging

Basics of EPC/LTE Online Charging (OCS)

Early on as subscriber trunk dialing and automated time-based charging was introduced to phone networks, engineers were faced with a problem from Payphones.

Previously a call had been a fixed price, once the caller put in their coins, if they put in enough coins, they could dial and stay on the line as long as they wanted.

But as the length of calls began to be metered, it means if I put $3 of coins into the payphone, and make a call to a destination that costs $1 per minute, then I should only be allowed to have a 3 minute long phone call, and the call should be cutoff before the 4th minute, as I would have used all my available credit.

Conversely if I put $3 into the Payphone and only call a $1 per minute destination for 2 minutes, I should get $1 refunded at the end of my call.

We see the exact same problem with prepaid subscribers on IMS Networks, and it’s solved in much the same way.

In LTE/EPC Networks, Diameter is used for all our credit control, with all online charging based on the Ro interface. So let’s take a look at how this works and what goes on.

Generic 3GPP Online Charging Architecture

3GPP defines a generic 3GPP Online charging architecture, that’s used by IMS for Credit Control of prepaid subscribers, but also for prepaid metering of data usage, other volume based flows, as well as event-based charging like SMS and MMS.

Network functions that handle chargeable services (like the data transferred through a P-GW or calls through a S-CSCF) contain a Charging Trigger Function (CTF) (While reading the specifications, you may be left thinking that the Charging Trigger Function is a separate entity, but more often than not, the CTF is built into the network element as an interface).

The CTF is a Diameter application that generates requests to the Online Charging Function (OCF) to be granted resources for the session / call / data flow, the subscriber wants to use, prior to granting them the service.

So network elements that need to charge for services in realtime contain a Charging Trigger Function (CTF) which in turn talks to an Online Charging Function (OCF) which typically is part of an Online Charging System (AKA OCS).

For example when a subscriber turns on their phone and a GTP session is setup on the P-GW/PCEF, but before data is allowed to flow through it, a Diameter “Credit Control Request” is generated by the Charging Trigger Function (CTF) in the P-GW/PCEF, which is sent to our Online Charging Server (OCS).

The “Credit Control Answer” back from the OCS indicates the subscriber has the balance needed to use data services, and specifies how much data up and down the subscriber has been granted to use.

The P-GW/PCEF grants service to the subscriber for the specified amount of units, and the subscriber can start using data.

This is a simplified example – Decentralized vs Centralized Rating and Unit Determination enter into this, session reservation, etc.

The interface between our Charging Trigger Functions (CTF) and the Online Charging Functions (OCF), is the Ro interface, which is a Diameter based interface, and is common not just for online charging for data usage, IMS Credit Control, MMS, value added services, etc.

3GPP define a reference online-charging interface, the Ro interface, and all the application-specific interfaces, like the Gy for billing data usage, build on top of the Ro interface spec.

Basic Credit Control Request / Credit Control Answer Process

This example will look at a VoLTE call over IMS.

When a subscriber sends an INVITE, the Charging Trigger Function baked in our S-CSCF sends a Diameter “Credit Control Request” (CCR) to our Online Charging Function, with the type INITIAL, meaning this is the first CCR for this session.

The CCR contains the Service Information AVP. It’s this little AVP that is where the majority of the magic happens, as it defines what the service the subscriber is requesting. The main difference between the multitude of online charging interfaces in EPC networks, is just what the service the customer is requesting, and the specifics of that service.

For this example it’s a voice call, so this Service Information AVP contains a “IMS-Information” AVP. This AVP defines all the parameters for a IMS phone call to be online charged, for a voice call, this is the called-party, calling party, SDP (for differentiating between voice / video, etc.).

It’s the contents of this Service Information AVP the OCS uses to make decision on if service should be granted or not, and how many service units to be granted. (If Centralized Rating and Unit Determination is used, we’ll cover that in another post)
The actual logic, relating to this decision is typically based on the the rating and tariffing, credit control profiles, etc, and is outside the scope of the interface, but in short, the OCS will make a yes/no decision about if the subscriber should be granted access to the particular service, and if yes, then how many minutes / Bytes / Events should be granted.

In the received Credit Control Answer is received back from our OCS, and the Granted-Service-Unit AVP is analysed by the S-CSCF.
For a voice call, the service units will be time. This tells the S-CSCF how long the call can go on before the S-CSCF will need to send another Credit Control Request, for the purposes of this example we’ll imagine the returned value is 600 seconds / 10 minutes.

The S-CSCF will then grant service, the subscriber can start their voice call, and start the countdown of the time granted by the OCS.

As our chatty subscriber stays on their call, the S-CSCF approaches the limit of the Granted Service units from the OCS (Say 500 seconds used of the 600 seconds granted).
Before this limit is reached the S-CSCF’s CTF function sends another Credit Control Request with the type UPDATE_REQUEST. This allows the OCS to analyse the remaining balance of the subscriber and policies to tell the S-CSCF how long the call can continue to proceed for in the form of granted service units returned in the Credit Control Answer, which for our example can be 300 seconds.

Eventually, and before the second lot of granted units runs out, our subscriber ends the call, for a total talk time of 700 seconds.

But wait, the subscriber been granted 600 seconds for our INITIAL request, and a further 300 seconds in our UPDATE_REQUEST, for a total of 900 seconds, but the subscriber only used 700 seconds?

The S-CSCF sends a final Credit Control Request, this time with type TERMINATION_REQUEST and lets the OCS know via the Used-Service-Unit AVP, how many units the subscriber actually used (700 seconds), meaning the OCS will refund the balance for the gap of 200 seconds the subscriber didn’t use.

If this were the interface for online charging of data, we’d have the PS-Information AVP, or for online charging of SMS we’d have the SMS-Information, and so on.

The architecture and framework for how the charging works doesn’t change between a voice call, data traffic or messaging, just the particulars for the type of service we need to bill, as defined in the Service Information AVP, and the OCS making a decision on that based on if the subscriber should be granted service, and if yes, how many units of whatever type.

Open5GS without NAT

While most users of Open5GS EPC will use NAT on the UPF / P-GW-U but you don’t have to.

While you can do NAT on the machine that hosts the PGW-U / UPF, you may find you want to do the NAT somewhere else in the network, like on a router, or something specifically for CG-NAT, or you may want to provide public addresses to your UEs, either way the default config assumes you want NAT, and in this post, we’ll cover setting up Open5GS EPC / 5GC without NAT on the P-GW-U / UPF.

Before we get started on that, let’s keep in mind what’s going to happen if we don’t have NAT in place,

Traffic originating from users on our network (UEs / Subscribers) will have the from IP Address set to that of the UE IP Pool set on the SMF / P-GW-C, or statically in our HSS.

This will be the IP address that’s sent as the IP Source for all traffic from the UE if we don’t have NAT enabled in our Core, so all external networks will see that as the IP Address for our UEs / Subscribers.

The above example shows the flow of a packet from UE with IP Address sending something to

This is all well and good for traffic originating from our 4G/5G network, but what about traffic destined to our 4G/5G core?

Well, the traffic path is backwards. This means that our router, and external networks, need to know how to reach the subnet containing our UEs. This means we’ve got to add static routes to point to the IP Address of the UPF / P-GW-U, so it can encapsulate the traffic and get the GTP encapsulated traffic to the UE / Subscriber.

For our example packet destined for, as that is a globally routable IP (Not an internal IP) the router will need to perform NAT Translation, but for internal traffic within the network (On the router) the static route on the router should be able to route traffic to the UE Subnets to the UPF / P-GW-U’s IP Address, so it can encapsulate the traffic and get the GTP encapsulated traffic to the UE / Subscriber.

Setting up static routes on your router is going to be different on what you use, in my case I’m using a Mikrotik in my lab, so here’s a screenshot from that showing the static route point at my UPF/P-GW-U. I’ve got BGP setup to share routes around, so all the neighboring routers will also have this information about how to reach the subscriber.

Next up we’ve got to setup IPtables on the server itself running our UPF/P-GW-U, to route traffic addressed to the UE and encapsulate it.

sudo ip route add dev ogstun
sudo echo 1 > /proc/sys/net/ipv4/ip_forward
sudo iptables -A FORWARD -i ogstun -o osgtun -s -d -j ACCEPT

And that’s it, now traffic coming from UEs on our UPF/P-GW will leave the NIC with their source address set to the UE Address, and so long as your router is happily configured with those static routes, you’ll be set.

If you want access to the Internet, it then just becomes a matter of configuring traffic from that subnet on the router to be NATed out your external interface on the router, rather than performing the NAT on the machine.

In an upcoming post we’ll look at doing this with OSPF and BGP, so you don’t need to statically assign routes in your routers.

Diameter – Insert Subscriber Data Request / Response

While we’ve covered the Update Location Request / Response, where an MME is able to request subscriber data from the HSS, what about updating a subscriber’s profile when they’re already attached? If we’re just relying on the Update Location Request / Response dialog, the update to the subscriber’s profile would only happen when they re-attach.

We need a mechanism where the HSS can send the Request and the MME can send the response.

This is what the Insert Subscriber Data Request/Response is used for.

Let's imagine we want to allow a subscriber to access an additional APN, or change an AMBR values of an existing APN;

We'd send an Insert Subscriber Data Request from the HSS, to the MME, with the Subscription Data AVP populated with the additional APN the subscriber can now access.

Beyond just updating the Subscription Data, the Insert Subscriber Data Request/Response has a few other funky uses.

Through it the HSS can request the EPS Location information of a Subscriber, down to the TAC / eNB ID serving that subscriber. It’s not the same thing as the GMLC interfaces used for locating subscribers, but will wake Idle UEs to get their current serving eNB, if the Current Location Request is set in the IDR Flags.

But the most common use for the Insert-Subscriber-Data request is to modify the Subscription Profile, contained in the Subscription-Data AVP,

If the All-APN-Configurations-Included-Indicator is set in the AVP info, then all the existing AVPs will be replaced, if it’s not then everything specified is just updated.

The Insert Subscriber Data Request/Response is a bit novel compared to other S6a requests, in this case it’s initiated by the HSS to the MME (Like the Cancel Location Request), and used to update an existing value.

PS Data Off

Imagine a not-too distant future, one without flying cars – just one where 2G and 3G networks have been switched off.

And the imagine a teenage phone user, who has almost run out of their prepaid mobile data allocation, and so has switched mobile data off, or a roaming scenario where the user doesn’t want to get stung by an unexpectedly large bill.

In 2G/3G networks the Circuit Switched (Voice & SMS) traffic was separate to the Packet Switched (Mobile Data).

This allowed users to turn of mobile data (GPRS/HSDPA), etc, but still be able to receive phone calls and send SMS, etc.

With LTE, everything is packet switched, so turning off Mobile Data would cut off VoLTE connectivity, meaning users wouldn’t be able to make/recieve calls or SMS.

In 3GPP Release 14 (2017) 3GPP introduced the PS Data Off feature.

This feature is primarily implemented on the UE side, and simply blocks uplink user traffic from the UE, while leaving other background IP services, such as IMS/VoLTE and MMS, to continue working, even if mobile data is switched off.

The UE can signal to the core it is turning off PS Data, but it’s not required to, so as such from a core perspective you may not know if your subscriber has PS Data off or not – The default APN is still active and in the implementations I’ve tried, it still responds to ICMP Pings.

IMS Registration stays in place, SMS and MMS still work, just the UE just drops the requests from the applications on the device (In this case I’m testing with an Android device).

What’s interesting about this is that a user may still find themselves consuming data, even if data services are turned off. A good example of this would be push notifications, which are sent to the phone (Downlink data). The push notification will make it to the UE (or at least the TCP SYN), after all downlink services are not blocked, however the response (for example the SYN-ACK for TCP) will not be sent. Most TCP stacks when ignored, try again, so you’ll find that even if you have PS Data off, you may still use some of your downlink data allowance, although not much.

The SIM EF 3GPPPSDATAOFF defines the services allowed to continue flowing when PS Data is off, and the 3GPPPSDATAOFFservicelist EF lists which IMS services are allowed when PS Data is off.

Usually at this point, I’d include a packet capture and break down the flow of how this all looks in signaling, but when I run this in my lab, I can’t differentiate between a PS Data Off on the UE and just a regular bearer idle timeout… So have an irritating blinking screenshot instead…

Docker & BIND as an ENUM Playground

In the last we covered what ENUM is and how it works, so to take this into a more practical example, I thought I’d share the details of the ENUM server I’ve setup in my lab, and the Docker container I’ve bundled it into.

Inside the Docker container we’ll be running Bind – this post won’t teach you much about Bind, there’s already lots of good information on it elsewhere, but we will cover the parameters involved in setting up ENUM records (NAPTR) for E.164 addresses.

Getting the Environment up and Running

First we’ll need to setup our environment, I’ve published the images for the container to Dockerhub, but we’ll build it from the Dockerfile so you can edit the files and rebuild as you play around:

git clone
cd ENUM_Playground
docker build --pull --rm -f "Dockerfile" -t enum:latest "."

systemd-resolve on Ubuntu binds to port 53 by default, which can lead to some headaches, so we’ll create a new network in Docker for this to run in, so it doesn’t conflict with anything else you may be running:

sudo docker network create --subnet= enum_playground

And now we’ll run the ENUM container in the enum_playground network and with the IP,

docker run -d --rm --name=enum --net=enum_playground --ip= enum

Ok, that’s the environment setup, let’s run some queries!

E.164 to SIP URI Resolution with ENUM

In our last post we covered the basics of formatting an E.164 number and querying a DNS server to get it’s call routing information.

Again we’re going to use Dig to query this information. In reality ENUM queries would be run by an endpoint, or software like FreeSWITCH or Kamailio (Spoiler alert, posts on ENUM handling in those coming later), but as we’re just playing Dig will work fine.

So let’s start by querying a single E.164 address, +61355500911

First we’ll reverse it and put full stops / periods between the numbers, to get

Next we’ll add the prefix, which is the global prefix for ENUM addresses, and presto, that’s what we’ll query –

Lastly we’ll feed this into a Dig query against the IP of our container and of type NAPTR,

dig @ -t naptr

So what did you get back?

Well, if everything is working your output should look something like the output I’ve got below,

NAPTR results for queried ENUM Address

So how do we interpret this? Well let’s break it down,

The first part is the domain we queried, simple enough in this case, 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

Next up is the TTL or expiry, in this case it’s 3600 seconds (1 hour), shorter periods allow for changes to propagate / be reflected more quickly but at the expense of more load as results can’t be cached for as long. The class (IN) represents Internet, which is the only class commonly used, even on internal systems. 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

Then we have the type of record returned, in our case it’s a NAPTR record, 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

After that is the Order, this defines the order in which the rules are to be parsed. Lower numbers are processed first, if no matches then the next lowest, and so on until the highest number is reached, we’ll touch on this in more detail later in this post, 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

The Pref is the processing preference. This is very handy for load balancing, as we can split traffic between hosts with different preferences. We’ll cover this later in this post too. 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

The Flags represent the type of record we’re going to get, for most ENUM traffic this is going to be set to U, to denote a SIP URI with Regex, while the Service value we’ll be looking for will be “E2U+sip” service to identify SIP URIs to route calls to, but could be other values like Email addresses, IM Addresses or PSTN numbers, to be parsed by other applications. 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

Lastly we’ve got the Regex part. Again not going to cover Regex as a whole, just the DNS particulars.

Everything between the first and second ! denotes what we’re searching for, while everything from the second ! to the last ! denotes what we replace it with.

In the below example that means we’re matching ^.* which means starting with (^) any character (.) zero or more times (*), which gets replaced with sip:[email protected], 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

How should this be treated?

For the first example, a call to the E.164 address of 61355500912 will be first formatted into a domain as per the ENUM requirements ( and then queried as a NAPTR record against the DNS server, 3600 IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

Only a single record has been returned so we don’t need to worry about the Order or Preference, and the Regex matches anything and replaces it with the resulting SIP URI of sip:61[email protected], which is where we’ll send our INVITE.

Under the Hood

Inside the Repo we cloned earlier, if you open the file, things will look somewhat familiar,

The record we just queried is the first example in the Bind config file,

; E.164 Address +61355500911 - Simple no replacement (Resolves all traffic to sip:[email protected]) IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

The config file is just the domain, class, type, order, preference, flags, service and regex.

Astute readers may have noticed the trailing . which where we can put a replacement domain if Regex is not used, but it cannot be used in conjunction with Regex, so for all our work it’ll just be a single trailing . on each line.

You can (and probably should) change the values in the file as we go along to try everything out, you’ll just need to rebuild the container and restart it each time you make a change.

This post is going to focus on Bind, but the majority of modern DNS servers support NAPTR records, so you can use them for ENUM as well, for example I manage the DNS for this site thorough Cloudflare, and I’ve put a screenshot below of an example private ENUM address I’ve added into it.

Setting up a NAPTR record in Cloudflare DNS

Preference to Split Traffic between Servers

So with a firm understanding of a single record being returned, let’s look at how we can use ENUM to cleverly route traffic to multiple hosts.

If we have a pool of servers we may wish to evenly distribute all traffic across them, so that’s how E.164 address +61355500912 is setup – to route traffic evenly (50/50) across two servers.

Querying it with Dig provides the following result:

dig @ -t naptr
;; ANSWER SECTION: 3600 IN NAPTR  10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" . 3600 IN NAPTR  10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

So as the order value (10) is the same for both records, we can ignore it – there isn’t one value lower than the other.

We can see both records have a preference of 100, in practice, this means they each get 50% of the traffic. The formula for traffic distribution is pretty simple, each server gets the value of it’s preference, divided by the total of all the preferences,

So for server1 it’s preference is 100 and the total of all the preferences combined is 200, so it gets 100/200, which is equivalent to one half aka 50%.

We might have a scenario where we have 3 servers, but one is significantly more powerful than the others, so let’s look at giving more traffic to one server and less to others, this example gets a little more complex but should cement your understanding of how the preference works;

dig @ -t naptr 3600 IN NAPTR  10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" . 3600 IN NAPTR  10 200 "u" "E2U+sip" "!^.*$!sip:[email protected]!" . 3600 IN NAPTR  10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

So now 3 servers, again none have a lower order than the other, it’s set to 10 for them all so we can ignore the order,

Next we can see the total of all the priority values is 400,

Server 2 has a priority of 100 so it gets 100/400 total priority, or a quarter of all traffic. Server 1 has the same value, so also gets a quarter of all traffic,

Server 3 however has a priority of 200 so it gets 200/400, or to simplify half of all traffic.

The Bind config for this is:

; E.164 Address +61355500913 - More complex load balance between 3 hosts (25% server1, 25% server2, 50% server3) IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" . IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" . IN NAPTR 10 200 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

Order for Failover

Primarily the purpose of the order is to enable wildcard routes (as we’ll see later) to be overwritten by more specific routes, but a secondary use in some implementations use Order as a way to list the preferences of the SIP URIs to route to. For example we could have two servers, one a primary and the other a standby, with the standby only to be used only if the primary SIP URI was not responding.

E.164 number +61355500914 is setup to return two SIP URIs,

dig @ -t naptr
;; ANSWER SECTION: 3600 IN NAPTR  10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" . 3600 IN NAPTR  20 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

Our DNS client will first use the SIP URI sip:[email protected] as it has the lower order value (10), and if that fails, can try the entry with the next lowest order-value (20) which would be sip:[email protected].

The Bind config for this is:

; E.164 Address +61355500914 - Order example returning multiple SIP URIs to try for failover IN NAPTR 10 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" . IN NAPTR 20 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .


If we have a 1,000 number block, having to add 1000 individual records can be very tedious. Instead we can use wildcard matching (thanks to the fact we’ve reversed the E.164 address) to match ranges. For example if we have E.164 numbers from +61255501000 to +61255501999 we can add a wildcard entry to match the +61255501x prefix,

I’ve set this up already so let’s lookup the E.164 number +6125501234,

dig @ -t naptr
;; ANSWER SECTION: 3600 IN NAPTR  50 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

If you look up any other number starting with +6125501 you’ll get the same result, and here’s the Bind config for it:

; Wildcard E.164 Address +61255501* - Wildcard example for all destinations starting with E.164 prefix +61255501x to single destination (sip:[email protected])
; For example E.164 number +6125501234 will resolve to sip:[email protected]
*. IN NAPTR 100 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

The catch with this is they’re all pointing at the same SIP URI, so we can’t treat the calls differently based on the called number – This is where the Regex magic comes in.

We can use group matching to match a group and fill it in the dialed number into the SIP Request URI, for example:

!(^.*$)!sip:+1\[email protected]!

Will match the E.164 number requested and put it inside sip:[email protected]

The +61255502xxx prefix is setup for this, so if we query +61255502000 (or any other number between +61255502000 and +61255502999) we’ll get the regex query in the resulting record.

Keep in mind DNS doesn’t actually apply the Regex transformation, just shares it, and the client applies the transformation.

dig @ -t naptr
;; ANSWER SECTION: 3600 IN NAPTR  100 100 "u" "E2U+sip" "!(^.*$)!sip:+1\[email protected]!" .

And the corresponding Bind config:

; Wildcard example for all destinations starting with E.164 prefix +61255502x to regex filled destination
; For example a request to 61255502000 will return sip:[email protected])
*. IN NAPTR 100 100 "u" "E2U+sip" "!(^.*$)!sip:+1\\[email protected]!" .

One last thing to keep in mind, is that Wildcard priorities are of any length.
This means +612555021 would match as well as +6125550299999999999999. Typically terminating switches drop any superfluous digits, and NU those that are too short, but keep this in mind, that length is not taken into account.

Wildcard Priorities

So with our wildcards in place, what if we wanted to add an exception, for example one number in our 61255502xxx block of numbers gets ported to another carrier and needs to be routed elsewhere?

Easy, we just add another entry for that number being more specific and with a lower order than the wildcard, which is what’s setup for E.164 number +61255502345,

dig @ -t naptr
;; ANSWER SECTION: 3600 IN NAPTR  50 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

Which does not return the same result as the others that match the wildcard,

Bind config:

; Wildcard example for all destinations starting with E.164 prefix +61255502x to regex filled destination
; For example a request to +61255502000 will return sip:[email protected])
*. IN NAPTR 100 100 "u" "E2U+sip" "!(^.*$)!sip:+1\\[email protected]!" .

; More specific example with lower order than +6125550x wildcard for E.164 address +61255502345 will return sip:[email protected] IN NAPTR 50 100 "u" "E2U+sip" "!^.*$!sip:[email protected]!" .

We can combine all of the tricks we’ve covered here, from statically defined entries, wildcards, regex replacement, multiple entries with multiple orders and preferences, to create really complex routing, using only DNS.

Summary & Next Steps

So by now hopefully you’ve got a fair understanding of how NAPTR and DNS work together to translate E.164 addresses into SIP URIs,

Of course being able to do this manually with Dig and comprehend how it’ll route is only one part of the picture, in the next posts we’ll cover using Kamailio and FreeSWITCH to query ENUM routing information and route traffic to it,

Looking inside the MMS Exchange (With call flow and PCAP)

So you want to send an MMS?

We’ve covered SMS in the past, but MMS is a different kettle of fish.

Let’s look at how the call flow goes, when Bob wants to send a picture to Alice.

Before Bob sends the MMS, his phone will have to be setup with the correct settings to send MMS.
Sometimes this is done manually, for others it’s done through the Carrier provisioning SMS that preloads the settings, and for others it’s baked in based on the Android Carrier settings XML,

APN settings for Telstra in Australia for MMS

It’s made up of the APN to send MMS traffic over, the MMSC address (Multimedia Message Switching Center) and often an MMS proxy and port combination for where the traffic will actually go.

Message Flow – Bob to MMSC (Mobile Originated MMS)

Bob opens his phone, creates a new message to Alice, selects the picture (or other multimedia filetype) to send to her and hits the send button.

For starters, MMS has a file size limit, like MTU it’s not advertised, so you don’t know if you’ve hit it, so rather like MTU is a “lowest has the highest success of getting through” rule. So Bob’s phone will most likely scale the image down to fit inside 300K.

Next Bob’s phone knows it has an MMS to send, for this is opens up a new bearer on the MMS APN, typically called MMS, but configured in the phone by Bob.

Why use a separate APN for sending 300K of MMS traffic?
Once upon a time mobile data was expensive.
By having a separate APN just for MMS traffic (An APN that could do nothing except send / receive MMS) allowed easier billing / tariffing of data, as MMS traffic was sent over a APN which was unmetered.

After the bearer is setup on the MMS APN, Bob’s phone begins crafting a HTTP 1.1 Post to be sent to the MMSC.
The content type of this request will be application/vnd.wap.mms-message and the body of the HTTP post will be made up of MMS Message Encapsulation, with the body containing the picture he wants to send to Alice.

Note: Historically Wireless Session Protocol (WSP) was used in lieu of HTTP. These clients would now need a WAP gateway to translate into HTTP.

This HTTP Post is then sent to the MMSC Address, or, if present, the MMSC Proxy address.
This traffic is sent over the MMS APN that we just brought up.

HTTP POST Headers for the MO MMS Message

MMS Message Encapsulation from MO MMS Message

The MMSC receives this information, and then, if all was successful, responds with a 200 OK,

200 OK response to MO MMS Message

So now the MMSC has the information from Bob, let’s flip over to Alice.

Message Flow – MMSC to Alice (Mobile Terminated MMS)

For the purposes of simplicity, we’re going to rule out the MMSC from doing clever things like converting the media, accepting email (SMPP) as MMS, etc, etc. Instead we’re going to assume Alice and Bob are on the same Network, and our MMSC is just doing store-and-forward.

The MMSC will look at the To address in the MMS Message Encapsulation of the request Bob sent, to determine that this message is destined for Alice.

The MMSC will load the media content (photo) sent by Bob destined for Alice and serve it via HTTP. The MMSC generates a random URL to serve it this particular file on, with each MMS the MMSC handles being assigned a random URL containing the media content.

Next the MMSC will need to tell Alice’s phone, that she has an MMS waiting for her. This is done by generating an SMS to send to Alice’s phone,

The user-data of this SMS is the Wireless Session Protocol with the method PUSH – Aka WAP Push.

SMS alerting the user of an MMS waiting for delivery

This specially encoded SMS is parsed by the Alice’s phone, which tells the her there is an MMS message waiting for her.

On some operating systems this is pulled automatically, on others, users need to select “Download” to actually get the file.

The UE then just runs an HTTP get to the address in the X-Mms-Content-Location: Header to pull the multimedia content that Bob sent.

HTTP GET from Alice’s Phone / UE to retrieve MMS sent by Bob (MT-MMS)

All going well the URL is valid and Alice’s phone retrieves the message, getting a 200 OK back from the server with the message content.

HTTP Response (200 OK) for MT-MMS, sent by the MMSC to Alice’s phone with the MMS Body

So now Alice’s phone has the MMS content and renders it on the screen, Alice can see the Photo Bob sent her.

Lastly Alice’s phone sends a HTTP POST again to the MMSC, this time indicating the message status is “Retrieved”,

And to close everything off the MMSC confirms receipt of the Retrieved status with a 200 OK, and we are done.

What didn’t we cover?

So that’s a basic MMS message flow, but there’s a few parts we didn’t cover.

The overall architecture beyond just the store-and forward behaviour, charging and authentication we didn’t cover. So let’s look at each of these points.

Overall Architecture

What we just covered what what’s defined as the MM1 interface.

There’s obviously a stack of other interfaces, such as for charging, messaging between MMSC/Carriers, subscriber locating / user database, etc.


MMSCs would typically have a connection to trigger charging events / credit-control events prior to processing the message.

For online charging the Ro interface can be used, as you would for IMS charging events.

3GPP 3GPP TS 32.270 covers the charging architecture for online/offline charging for MMS.


Unfortunately authentication was a bit of an afterthought for the MMS standard, and can be done several different ways.

The most common is to correlate the IP Address on the MMS APN against a subscriber.

Telephony binary-coded decimal (TBCD) in Python with Examples

Chances are if you’re reading this, you’re trying to work out what Telephony Binary-Coded Decimal encoding is. I got you.

Again I found myself staring at encoding trying to guess how it worked, reading references that looped into other references, in this case I was encoding MSISDN AVPs in Diameter.

How to Encode a number using Telephony Binary-Coded Decimal encoding?

First, Group all the numbers into pairs, and reverse each pair.

So a phone number of 123456, becomes:


Because 1 & 2 are swapped to become 21, 3 & 4 are swapped to become 34, 5 & 6 become 65, that’s how we get that result.

TBCD Encoding of numbers with an Odd Length?

If we’ve got an odd-number of digits, we add an F on the end and still flip the digits,

For example 789, we add the F to the end to pad it to an even length, and then flip each pair of digits, so it becomes:


That’s the abbreviated version of it. If you’re only encoding numbers that’s all you’ll need to know.

Detail Overload

Because the numbers 0-9 can be encoded using only 4 bits, the need for a whole 8 bit byte to store this information is considered excessive.

For example 1 represented as a binary 8-bit byte would be 00000001, while 9 would be 00001001, so even with our largest number, the first 4 bits would always going to be 0000 – we’d only use half the available space.

So TBCD encoding stores two numbers in each Byte (1 number in the first 4 bits, one number in the second 4 bits).

To go back to our previous example, 1 represented as a binary 4-bit word would be 0001, while 9 would be 1001. These are then swapped and concatenated, so the number 19 becomes 1001 0001 which is hex 0x91.

Let’s do another example, 82, so 8 represented as a 4-bit word is 1000 and 2 as a 4-bit word is 0010. We then swap the order and concatenate to get 00101000 which is hex 0x28 from our inputted 82.

Final example will be a 3 digit number, 123. As we saw earlier we’ll add an F to the end for padding, and then encode as we would any other number,

F is encoded as 1111.

1 becomes 0001, 2 becomes 0010, 3 becomes 0011 and F becomes 1111. Reverse each pair and concatenate 00100001 11110011 or hex 0x21 0xF3.

Special Symbols (#, * and friends)

Because TBCD Encoding was designed for use in Telephony networks, the # and * symbols are also present, as they are on a telephone keypad.

Astute readers may have noticed that so far we’ve covered 0-9 and F, which still doesn’t use all the available space in the 4 bit area.

The extended DTMF keys of A, B & C are also valid in TBCD (The D key was sacrificed to get the F in).

Symbol4 Bit Word
*1 0 1 0
#1 0 1 1
a1 1 0 0
b1 1 0 1
c1 1 1 0

So let’s run through some more examples,

*21 is an odd length, so we’ll slap an F on the end (*21F), and then encoded each pair of values into bytes, so * becomes 1010, 2 becomes 0010. Swap them and concatenate for our first byte of 00101010 (Hex 0x2A). F our second byte 1F, 1 becomes 0001 and F becomes 1111. Swap and concatenate to get 11110001 (Hex 0xF1). So *21 becomes 0x2A 0xF1.

And as promised, some Python code from PyHSS that does it for you:

    def TBCD_special_chars(self, input):
        if input == "*":
            return "1010"
        elif input == "#":
            return "1011"
        elif input == "a":
            return "1100"
        elif input == "b":
            return "1101"
        elif input == "c":
            return "1100"      
            print("input " + str(input) + " is not a special char, converting to bin ")
            return ("{:04b}".format(int(input)))

    def TBCD_encode(self, input):
        print("TBCD_encode input value is " + str(input))
        offset = 0
        output = ''
        matches = ['*', '#', 'a', 'b', 'c']
        while offset < len(input):
            if len(input[offset:offset+2]) == 2:
                bit = input[offset:offset+2]    #Get two digits at a time
                bit = bit[::-1]                 #Reverse them
                #Check if *, #, a, b or c
                if any(x in bit for x in matches):
                    new_bit = ''
                    new_bit = new_bit + str(TBCD_special_chars(bit[0]))
                    new_bit = new_bit + str(TBCD_special_chars(bit[1]))    
                    bit = str(int(new_bit, 2))
                output = output + bit
                offset = offset + 2
                bit = "f" + str(input[offset:offset+2])
                output = output + bit
                print("TBCD_encode output value is " + str(output))
                return output

    def TBCD_decode(self, input):
        print("TBCD_decode Input value is " + str(input))
        offset = 0
        output = ''
        while offset < len(input):
            if "f" not in input[offset:offset+2]:
                bit = input[offset:offset+2]    #Get two digits at a time
                bit = bit[::-1]                 #Reverse them
                output = output + bit
                offset = offset + 2
            else:   #If f in bit strip it
                bit = input[offset:offset+2]
                output = output + bit[1]
                print("TBCD_decode output value is " + str(output))
                return output

The PLMN Problem for Private LTE / 5G

So it’s the not to distant future and the pundits vision of private LTE and 5G Networks was proved correct, and private networks are plentiful.

But what PLMN do they use?

The PLMN (Public Land Mobile Network) ID is made up of a Mobile Country Code + Mobile Network Code. MCCs are 3 digits and MNCs are 2-3 digits. It’s how your phone knows to connect to a tower belonging to your carrier, and not one of their competitors.

For example in Australia (Mobile Country Code 505) the three operators each have their own MCC. Telstra as the first licenced Mobile Network were assigned 505/01, Optus got 505/02 and VHA / TPG got 505/03.

Each carrier was assigned a PLMN when they started operating their network. But the problem is, there’s not much space in this range.

The PLMN can be thought of as the SSID in WiFi terms, but with a restriction as to the size of the pool available for PLMNs, we’re facing an IPv4 exhaustion problem from the start if we’re facing an explosion of growth in the space.

Let’s look at some ways this could be approached.

Everyone gets a PLMN

If every private network were to be assigned a PLMN, we’d very quickly run out of space in the range. Best case you’ve got 3 digits, so only space for 1,000 networks.

In certain countries this might work, but in other areas these PLMNs may get gobbled up fast, and when they do, there’s no more. New operators will be locked out of the market.

Loaner PLMNs

Carriers already have their own PLMNs, they’ve been using for years, some kit vendors have been assigned their own as well.

If you’re buying a private network from an existing carrier, they may permit you to use their PLMN,

Or if you’re buying kit from an existing vendor you may be able to use their PLMN too.

But what happens then if you want to move to a different kit vendor or another service provider? Do you have to rebuild your towers, reconfigure your SIMs?

Are you contractually allowed to continue using the PLMN of a third party like a hardware vendor, even if you’re no longer purchasing hardware from them? What happens if they change their mind and no longer want others to use their PLMN?

Everyone uses 999 / 99

The ITU have tried to preempt this problem by reallocating 999/99 for use in Private Networks.

The problem here is if you’ve got multiple private networks in close proximity, especially if you’re using CBRS or in close proximity to other networks, you may find your devices attempting to attach to another network with the same PLMN but that isn’t part of your network,

Mobile Country or Geographical Area Codes
Note from TSB
Following the agreement on the Appendix to Recommendation ITU-T E.212 on “shared E.212 MCC 999 for internal use within a private network” at the closing plenary of ITU-T SG2 meeting of 4 to 13 July 2018, upon the advice of ITU-T Study Group 2, the Director of TSB has assigned the Mobile Country Code (MCC) “999” for internal use within a private network. 

Mobile Network Codes (MNCs) under this MCC are not subject to assignment and therefore may not be globally unique. No interaction with ITU is required for using a MNC value under this MCC for internal use within a private network. Any MNC value under this MCC used in a network has
significance only within that network. 

The MNCs under this MCC are not routable between networks. The MNCs under this MCC shall not be used for roaming. For purposes of testing and examples using this MCC, it is encouraged to use MNC value 99 or 999. MNCs under this MCC cannot be used outside of the network for which they apply. MNCs under this MCC may be 2- or 3-digit.

(Recommendation ITU-T E.212 (09/2016))

The Crystal Ball?

My bet is we’ll see the ITU allocate an MCC – or a range of MCCs – for private networks, allowing for a pool of PLMNs to use.

When deploying networks, Private network operators can try and pick something that’s not in use at the area from a pool of a few thousand options.

The major problem here is that there still won’t be an easy way to identify the operator of a particular network; the SPN is local only to the SIM and the Network Name is only present in the NAS messaging on an attach, and only after authentication.

If you’ve got a problem network, there’s no easy way to identify who’s operating it.

But as eSIMs become more prevalent and BIP / RFM on SIMs will hopefully allow operators to shift PLMNs without too much headache.

How UEs get Time in LTE

You may have noticed in the settings on your phone the time source can be set to “Network”, but what does this actually entail and how is this information transferred?

The answer is actually quite simple,

In the NAS PDU of the Downlink NAS Transport message from the MME to the UE, is the Time Zone & Time field, which contains (unsuprisingly) the Timezone and Time.

Time is provided in UTC form with the current Timezone to show the offset.

This means that in the configuration for each TAC on your MME, you have to make sure that the eNBs in that TAC have the Timezone set for the location of the cells in that TAC, which is especially important when working across timezones.

There is no parameter for the date/time when Daylight savings time may change. But as soon as a UE goes Idle and then comes out of Idle mode, it’ll be given the updated timezone information, and during handovers the network time is also provided.
This means if you were using your phone at the moment when DST begins / ends you’d only see the updated time once the UE toggles into/out of Idle mode, or when performing a tracking-area update.

Diameter Agents

Let’s take a look at each of the common Diameter agent variants in use today:

Diameter Relay Agent / Diameter Routing Agent (DRA)

This is the simplest of the Diameter agents, but also probably the most common. The Diameter Relay agent does not look at the contents of the AVPs, it just routes messages based on the Application ID or Destination realm.

A Diameter Relay Agent does not change any AVPs except routing AVPs.

DRAs are transaction aware, but not dialog aware. This means they know if the Diameter request made it to the destination, but have no tracking of getting a response.

DRAs are common as a central hub for all Diameter hub in a network. This allows for a star topology where every Diameter service connects to a central DRA (typically two DRAs for redundancy) for a central place to manage Diameter routing, instead of having to do a full-mesh topology, which would be a nightmare on larger networks.

I recently wrote about creating a simple but unstable DRA with Kamailio.

Diameter Edge Agent

A Diameter Edge Agent is a special DRA that sits on the border between two networks and acts as a gateway between them.

Imagine a roaming exchange scenario, where each operator has to expose their core Diameter servers or DRAs to all the other operators they have roaming agreements with. Like we saw with the DRA to do a full-mesh style connection arrangement would be a mess, and wouldn’t allow internal changes inside the network without significant headaches.

Instead by putting a Diameter Edge Agent at the edge of the network, the operators who wish to access our Diameter information for roaming, only need to connect to a single point, and we can change whatever we like on the inside of the network, adding and removing servers, without having to update our roaming information (IR 21).

We can also strictly enforce security policies on rate limits and admission control, centrally, for all connections in from other operators.

Diameter Proxy Agent

The Diameter Proxy Agent does everything a DRA does, and more!

The Diameter Proxy Agent is application aware, meaning it can decode the AVPs and make decisions based upon the contents of the AVPs. It’s also able to edit / add / delete AVPs and Sub-AVPs.

These are useful for interconnect scenarios where you might need to re-write the value of an AVP, or translate a realm etc, on a Diameter request/response journey.

Diameter Translation Agent

Diameter Translation agents are used for translating between protocols, for example Diameter into MAP for GSM authentication, or into HTTP for 5G authentication.

For 5GC a new network element – the “Binding Support Function” (BSF) is introduced to translate between HTTP for 5G and Diameter for LTE, however this can be thought of as another Diameter Translation Agent.

SCTP Parameter Tuning

There’s a lot to like about SCTP. No head of line blocking, MTU issues, sequenced, acknowledged delivery of messages, not to mention Multi-Homing and message bundling.

But if you really want to get the most bang for your buck, you’ll need to tune your SCTP parameters to match the network conditions.

While tuning the parameters per-association would be time consuming, most SCTP stacks allow you to set templates for SCTP parameters, for example you would have a different set of parameters for the SCTP stacks inside your network, compared to SCTP stacks for say a roaming scenario or across microwave links.

IETF kindly provides a table with their recommended starting values for SCTP parameter tuning:

RTO.Initial3 seconds
RTO.Min1 second
RTO.Max60 seconds
Valid.Cookie.Life60 seconds
Association.Max.Retrans10 attempts
Path.Max.Retrans5 attempts (per destination address)
Max.Init.Retransmits8 attempts
HB.interval30 seconds
IETF – RFC4960: SCTP – Suggested Protocol Parameter Values

But by adjusting the Max Retrans and Retransmission Timeout (RTO) values, we can detect failures on the network more quickly, and reduce the number of packets we’ll loose should we have a failure.

We begin with the engineered round-trip time (RTT) – that is made up of the time it takes to traverse the link, processing time for the remote SCTP stack and time for the response to traverse the link again. For the examples below we’ll take an imaginary engineered RTT of 200ms.

RTO.min is the minimum retransmission timeout.
If this value is set too low then before the other side has had time to receive the request, process it and send a response, we’ve already retransmitted it.

This should be set to the round trip delay plus processing needed to send and acknowledge a packet plus some allowance for variability due to jitter; a value of 1.15 times the Engineered RTT is often chosen

So for us, 200 * 1.15 = 230ms RTO.min value.

RTO.max is the maximum amount of time we should wait before transmitting a request.
Typically three times the Engineered RTT.

So for us, 200 * 3 = 600ms RTO.min value.

Path.Max.Retransmissions is the maximum number of retransmissions to be sent down a path before the path is considered to be failed.
For example if we loose a transmission path on a multi-homed server, how many retransmissions along that path should we send until we consider it to be down?

Values set are dependant on if you’re multi-homing or not (you can be more picky if you are) and the level of acceptable packet loss in your transmission link.

Typical values are 4 Retransmissions (per destination address) for a Single-Homed association, and 2 Retransmissions (per destination address) for a Multi-Homed association.

Association.Max.Retransmissions is the maximum number of retransmissions for an association. If a transmission link in a multi-homed SCTP scenario were to go down, we would pass the Path.Max.Retransmissions value and the SCTP stack would stop sending traffic out that path, and try another, but what if the remote side is down? In that scenario all our paths would fail, so we need another counter – Path.Max.Retransmissions to count the total number of retransmissions to an association / destination. When the Association.Max.Retransmissions is reached the association is considered down.

In practice this value would be the number of paths, multiplied by the Path.Max.Retransmissions.