CGrateS in Baby Steps – Part 5 – Events, Agents & Subsystems

Up until this point in the series, I’ve tried to hide all the complexity of CGrateS, so people following along can see some progress and feel like they’re making it somewhere with CGrateS, but it’s time to tear off the plaster and talk about the actual concepts, about what’s under the hood, and how all the components interact, as it’ll make it much easier then for us to learn more about how to use CGrateS.

This will be the last post in the “CGrateS in Baby Steps” series (Which I started in 2022), if you’ve made it this far congratulations, all the future posts will be on specific topics and build upon the concepts we’ve covered here.

This took me a while to grasp – CGrateS is both crazy complex and beautifully simple, but getting to the stage where you can “see through the matrix” on CGrateS and see the beautiful simplicity involves a bit of understanding how everything fits together.

Once you realize once you can see the pattern, and understand the building blocks, everything else CGrateS related becomes super simple.

Agents

in CGrateS Agents are consumers of the services. That’s a super generic answer, but let’s take a closer look at what that actually means with some examples:

Diameter is a protocol that can be used for Online Charging.
CGrateS has a common interface for API calls that can perform Online Charging.
The CGrateS Diameter Agent translates between Diameter on one side, and CGrateS API calls on the other.

Likewise, if we want to speak Radius, we can use the CGrateS Radius Agent, this translates between RADIUS and the CGrateS API calls.

FreeSWITCH, Asterisk and Kamailio don’t use specific protocols like Diameter or Radius, but rather modules or plugins to connect that application to a CGrateS Agent, and they all just end up talking the same CGrateS API calls.

Lastly, there’s even an HTTP agent so you could define your own agent to talk another protocol if you wanted to use CGrateS for anything else (We’ve been playing with CAMEL based charging with CGrateS and 5GC charging).

The config for each of the CGrateS Agents happens in the cgrates.json config file (Typically in /etc/cgrates).

Because the Agents just translate everything into API calls, logic for billing a call from FreeSWITCH is the same as for Diameter, the same as for RADIUS, the same as for SIP, the same as for Asterisk.

The Agents just translate all the domain-specific stuff into the common CGrateS RPC API, which we’ve been working with up to this point.

This is key part to understand; because once you understand how to do the CGrateS part, moving from Asterisk to FreeSWITCH, to DNS, to RADIUS, to any other Agent, it’s all the same to you.

The Agents just translate domain-specific stuff (Diameter requests, CSV files, Asterisk Calls, FreeSWITCH calls, etc, etc) and act as a translator to translate these requests into CGrateS RPC API calls.

On the left side of the image below are the Agents, and on the right side, the Subsystems that do stuff with things.

Subsystems

So with these API calls, where do they go, what do they do?

Well, it’s the Subsystems that do the things.

What things?

Well, everything of use.

Each subsystem has a purpose, AttributeS transforms stuff, EeS exports CDRs, RALs applies our charging logic, CDRs writes CDRs to StorDB, etc, etc.

In each event we can set flags to denote which subsystems it should be routed to, and we can set the links between components in our cgrates.json file.

Based on the flags, we pass events between these subsystems.

Events

So our Agents create the API calls, which contain Events, which are JSON RPC calls.

They look like all the API examples we’ve played with, because that’s exactly what they are.

We can access them via the JSON RPC API, but when you start a call on Kamailio, the Kamailio Agent generates a JSON RPC API call containing an Event into CGrateS for that call on Kamailio.

When you send a DNS request, the DNS Agent translate this DNS request into a CGrateS JSON RPC API containing the event for the DNS request.

Let’s take an example, we’re going to use the ErS as it’s the simplest to demonstrate with.

I’ve already written about ERS the Event Reader Service, and it reads text files / CSVs so we can import them into CGrateS.

So if you setup your enviroment per the tutorial above (but don’t load the CSV yet), we’ll start running some experiments…

Anatomy of an Event

We can “sniff” the events bouncing around between the Agent and the various Subsystems in real time, by using ngrep:

sudo ngrep -t -W byline port 2012 or port 2080 or port 8021 or port 2014 or port 2053 -d any

So let’ we’ve got ngrep running, we can move our CSV file in to be processed in another tab.

Plonking the CSV file into the path ERS is monitoring will mean the ErS Agent will generate a CGrateS JSON RPC “event” for each row in the file, it’ll look something like this:

#############
T 2024/12/22 09:09:47.357151 127.0.0.1:50456 -> 127.0.0.1:2012 [AP] #404
{"method":"CDRsV1.ProcessEvent","params":[{"Flags":["*rals"],"Tenant":"cgrates.org","ID":"c2ce33d","Time":"2024-12-22T09:09:47.356294838+11:00","Event":{"Account":"61412341234","Animal":"Dog","CGRID":"6330859b7c38c1d508f9e5e0043950079e54fef1","Category":"call","Destination":"61812341234","OriginID":"1lklkjfds","RequestType":"*rated","SetupTime":"2024-01-01 01:00:00","Source":"*sessions","Subject":"61812341234","ToR":"*voice","Usage":"60","Value0":"2024-01-01 01:00:00","Value1":"2024-01-01 01:01:00","Value2":"Nick","Value3":"60","Value4":"61412341234","Value5":"61812341234","Value6":"Dog","Value7":"1lklkjfds"},"APIOpts":{}}],"id":1}

##
T 2024/12/22 09:09:47.357427 127.0.0.1:50456 -> 127.0.0.1:2012 [AP] #406
{"method":"AttributeSv1.ProcessEvent","params":[{"Tenant":"cgrates.org","ID":"c2ce33d","Time":"2024-12-22T09:09:47.356294838+11:00","Event":{"Account":"61412341234","Animal":"Dog","CGRID":"6330859b7c38c1d508f9e5e0043950079e54fef1","Category":"call","Destination":"61812341234","OriginID":"1lklkjfds","RequestType":"*rated","SetupTime":"2024-01-01 01:00:00","Source":"*sessions","Subject":"61812341234","ToR":"*voice","Usage":"60","Value0":"2024-01-01 01:00:00","Value1":"2024-01-01 01:01:00","Value2":"Nick","Value3":"60","Value4":"61412341234","Value5":"61812341234","Value6":"Dog","Value7":"1lklkjfds"},"APIOpts":{"*context":"*cdrs","*subsys":"*cdrs"}}],"id":2}

##
T 2024/12/22 09:09:47.422623 127.0.0.1:2012 -> 127.0.0.1:50456 [AP] #408
{"id":2,"result":null,"error":"NOT_FOUND"}

##
T 2024/12/22 09:09:47.422788 127.0.0.1:50456 -> 127.0.0.1:2012 [AP] #410
{"method":"ChargerSv1.ProcessEvent","params":[{"Tenant":"cgrates.org","ID":"c2ce33d","Time":"2024-12-22T09:09:47.356294838+11:00","Event":{"Account":"61412341234","Animal":"Dog","CGRID":"6330859b7c38c1d508f9e5e0043950079e54fef1","Category":"call","Destination":"61812341234","OriginID":"1lklkjfds","RequestType":"*rated","SetupTime":"2024-01-01 01:00:00","Source":"*sessions","Subject":"61812341234","ToR":"*voice","Usage":"60","Value0":"2024-01-01 01:00:00","Value1":"2024-01-01 01:01:00","Value2":"Nick","Value3":"60","Value4":"61412341234","Value5":"61812341234","Value6":"Dog","Value7":"1lklkjfds"},"APIOpts":{"*context":"*cdrs","*subsys":"*cdrs"}}],"id":3}

##
T 2024/12/22 09:09:47.451702 127.0.0.1:2012 -> 127.0.0.1:50456 [AP] #412
{"id":3,"result":[{"ChargerSProfile":"DEFAULT","AttributeSProfiles":null,"AlteredFields":["*req.RunID"],"CGREvent":{"Tenant":"cgrates.org","ID":"c2ce33d","Time":"2024-12-22T09:09:47.356294838+11:00","Event":{"Account":"61412341234","Animal":"Dog","CGRID":"6330859b7c38c1d508f9e5e0043950079e54fef1","Category":"call","Destination":"61812341234","OriginID":"1lklkjfds","RequestType":"*rated","RunID":"DEFAULT","SetupTime":"2024-01-01 01:00:00","Source":"*sessions","Subject":"61812341234","ToR":"*voice","Usage":"60","Value0":"2024-01-01 01:00:00","Value1":"2024-01-01 01:01:00","Value2":"Nick","Value3":"60","Value4":"61412341234","Value5":"61812341234","Value6":"Dog","Value7":"1lklkjfds"},"APIOpts":{"*context":"*cdrs","*subsys":"*chargers"}}}],"error":null}

#
T 2024/12/22 09:09:47.452021 127.0.0.1:50456 -> 127.0.0.1:2012 [AP] #413
{"method":"Responder.GetCost","params":[{"Category":"call","Tenant":"cgrates.org","Subject":"61812341234","Account":"61412341234","Destination":"61812341234","TimeStart":"2024-01-01T01:00:00+11:00","TimeEnd":"2024-01-01T01:00:00.00000006+11:00","LoopIndex":0,"DurationIndex":60,"FallbackSubject":"","RatingInfos":null,"Increments":null,"ToR":"*voice","ExtraFields":{"Animal":"Dog","Value0":"2024-01-01 01:00:00","Value1":"2024-01-01 01:01:00","Value2":"Nick","Value3":"60","Value4":"61412341234","Value5":"61812341234","Value6":"Dog","Value7":"1lklkjfds"},"MaxRate":0,"MaxRateUnit":0,"MaxCostSoFar":0,"CgrID":"","RunID":"","ForceDuration":false,"PerformRounding":true,"DenyNegativeAccount":false,"DryRun":false,"APIOpts":{"*context":"*cdrs","*subsys":"*chargers"}}],"id":4}

#
T 2024/12/22 09:09:47.465711 127.0.0.1:2012 -> 127.0.0.1:50456 [AP] #414
{"id":4,"result":{"Category":"call","Tenant":"cgrates.org","Subject":"61812341234","Account":"61412341234","Destination":"61812341234","ToR":"*voice","Cost":14,"Timespans":[{"TimeStart":"2024-01-01T01:00:00+11:00","TimeEnd":"2024-01-01T01:01:00+11:00","Cost":14,"RateInterval":{"Timing":{"ID":"*any","Years":[],"Months":[],"MonthDays":[],"WeekDays":[],"StartTime":"00:00:00","EndTime":""},"Rating":{"ConnectFee":0,"RoundingMethod":"*up","RoundingDecimals":4,"MaxCost":0,"MaxCostStrategy":"","Rates":[{"GroupIntervalStart":0,"Value":14,"RateIncrement":60000000000,"RateUnit":60000000000}]},"Weight":10},"DurationIndex":60000000000,"Increments":[{"Duration":0,"Cost":0,"BalanceInfo":{"Unit":null,"Monetary":null,"AccountID":""},"CompressFactor":1},{"Duration":60000000000,"Cost":14,"BalanceInfo":{"Unit":null,"Monetary":null,"AccountID":""},"CompressFactor":1}],"RoundIncrement":null,"MatchedSubject":"*out:cgrates.org:call:*any","MatchedPrefix":"618","MatchedDestId":"Dest_AU_Fixed","RatingPlanId":"RatingPlan_VoiceCalls","CompressFactor":1}],"RatedUsage":60000000000,"AccountSummary":null},"error":null}

#
T 2024/12/22 09:09:47.470035 127.0.0.1:2012 -> 127.0.0.1:50456 [AP] #415
{"id":1,"result":"OK","error":null}

Sidebar – you’re going to spend a lot of time with `ngrep`.

Alright, that event probably looks familiar, after all, it’s the same structure as the API requests we’ve made to CGrateS so far, to set rates and handle accounts.

But what we’re witnessing here isn’t us making an API request to the JSON RPC interface from a Python script, it’s the ERS Agent inside CGrateS, calling CGrateS.

The ERS Agent inside CGrateS reads the CSV file we dropped in, and based on what we had set in the ERS section of the CGrateS config file (cgrates.json), the ERS Agent create JSON RPC events and sent it to CGrateS for processing.

You may be thinking “Wow, the ERS Agent is really dumb, it just sends an API request (events)”, and you’d be right.

We could replace the ERS Agent with a Python script to read the CSV and send the same request, and we’d get the exact same outcome, but CGrateS is mostly “batteries included” so we don’t have to.

Ok, so you’ve heard me drum in the fact that Agents are pretty simple, and all they do is make JSON RPC requests for the event which are sent to CGrateS. So now what happens?

Well, the event is calling CDRsV1.ProcessEvent, so that means the Event is passed by CGRengine to the CDRs subsystem.

What does CDRs subsystem do with it? Well, that’s going to depend on what’s in our cgrates.json config file,

In the above example, CDRs is setup with connections to the different subsystems, AttributeS, Chargers and RALs are all the subsystems linked from here.

Having these links here does not force the Event to always route to these Subsystems, but unless we’ve got the links there, the Event won’t be able to get routed from CDRs to that subsystem if we want it to.

But we can see what’s going to happen with this request based on our CDRsV1.ProcessEvent event, it’s got Flags set to rals, so we know it wants RALs to be called.

Let’s take a closer look at the API call:

{
"method": "CDRsV1.ProcessEvent",
"params": [
{
"Flags": ["*rals"],
"Tenant": "cgrates.org",
"ID": "c2ce33d",
"Time": "2024-12-22T09:09:47.356294838+11:00",
"Event": {
"Account": "61412341234",
"Animal": "Dog",
"CGRID": "6330859b7c38c1d508f9e5e0043950079e54fef1",
"Category": "call",
"Destination": "61812341234",
"OriginID": "1lklkjfds",
"RequestType": "*rated",
"SetupTime": "2024-01-01 01:00:00",
"Source": "*sessions",
"Subject": "61812341234",
"ToR": "*voice",
"Usage": "60"
},
"APIOpts": {}
}
],
"id": 1
}

So looking in ngrep we see our CDRsV1.ProcessExternalCDR event makes it to the CDRs module with ID 1.

The API call has flags set to *rals so the CDRs will call RALs , and inside our config the CDRs section has a link in the config (shown in the image below) to RALs (rals_conns) – if we didn’t have that link, CGrateS wouldn’t know how to connect to RALs, and the event would fail.

We’ve also got connections to AttributeS configured in the config and we can see the RPC call to AttributeS.ProcessEvent which is the same as if we were to call it directly via the AttributeS API.

{
"method": "AttributeSv1.ProcessEvent",
"params": [
{
"Tenant": "cgrates.org",
"ID": "c2ce33d",
"Time": "2024-12-22T09:09:47.356294838+11:00",
"Event": {
"Account": "61412341234",
"Animal": "Dog",
"CGRID": "6330859b7c38c1d508f9e5e0043950079e54fef1",
"Category": "call",
"Destination": "61812341234",
"OriginID": "1lklkjfds",
"RequestType": "*rated",
"SetupTime": "2024-01-01 01:00:00",
"Source": "*sessions",
"Subject": "61812341234",
"ToR": "*voice",
"Usage": "60",
},
"APIOpts": {
"*context": "*cdrs",
"*subsys": "*cdrs"
}
}
],
"id": 2
}

Note at the bottom the APIOpts section tells us this API call was made by the *cdrs subsystem and the ID is 2 (This is a different request to the original CDRsV1.ProcessExternalCDR request which had ID 1 – we can use this to match responses to requests).

Again, because our config also includes links ChargerS and RALS subsystems, we’ll see requests to (you guessed it) ChargerS (The ChargerSv1.ProcessEvent) and RALS (Responder.GetCost).

T 2024/12/22 09:09:47.452021 127.0.0.1:50456 -> 127.0.0.1:2012 [AP] #413
{"method":"Responder.GetCost","params":[{"Category":"call","Tenant":"cgrates.org","Subject":"61812341234","Account":"61412341234","Destination":"61812341234","TimeStart":"2024-01-01T01:00:00+11:00","TimeEnd":"2024-01-01T01:00:00.00000006+11:00","LoopIndex":0,"DurationIndex":60,"FallbackSubject":"","RatingInfos":null,"Increments":null,"ToR":"*voice","ExtraFields":{"Animal":"Dog","Value0":"2024-01-01 01:00:00","Value1":"2024-01-01 01:01:00","Value2":"Nick","Value3":"60","Value4":"61412341234","Value5":"61812341234","Value6":"Dog","Value7":"1lklkjfds"},"MaxRate":0,"MaxRateUnit":0,"MaxCostSoFar":0,"CgrID":"","RunID":"","ForceDuration":false,"PerformRounding":true,"DenyNegativeAccount":false,"DryRun":false,"APIOpts":{"*context":"*cdrs","*subsys":"*chargers"}}],"id":4}

#
T 2024/12/22 09:09:47.465711 127.0.0.1:2012 -> 127.0.0.1:50456 [AP] #414
{"id":4,"result":{"Category":"call","Tenant":"cgrates.org","Subject":"61812341234","Account":"61412341234","Destination":"61812341234","ToR":"*voice","Cost":14,"Timespans":[{"TimeStart":"2024-01-01T01:00:00+11:00","TimeEnd":"2024-01-01T01:01:00+11:00","Cost":14,"RateInterval":{"Timing":{"ID":"*any","Years":[],"Months":[],"MonthDays":[],"WeekDays":[],"StartTime":"00:00:00","EndTime":""},"Rating":{"ConnectFee":0,"RoundingMethod":"*up","RoundingDecimals":4,"MaxCost":0,"MaxCostStrategy":"","Rates":[{"GroupIntervalStart":0,"Value":14,"RateIncrement":60000000000,"RateUnit":60000000000}]},"Weight":10},"DurationIndex":60000000000,"Increments":[{"Duration":0,"Cost":0,"BalanceInfo":{"Unit":null,"Monetary":null,"AccountID":""},"CompressFactor":1},{"Duration":60000000000,"Cost":14,"BalanceInfo":{"Unit":null,"Monetary":null,"AccountID":""},"CompressFactor":1}],"RoundIncrement":null,"MatchedSubject":"*out:cgrates.org:call:*any","MatchedPrefix":"618","MatchedDestId":"Dest_AU_Fixed","RatingPlanId":"RatingPlan_VoiceCalls","CompressFactor":1}],"RatedUsage":60000000000,"AccountSummary":null},"error":null}

What we’re seeing is the CDRs module, calling RALs, to get the cost information for this event.

Finally the CDRsV1.ProcessEvent that was initially sent by ErS gets a result (we can find the result to the request as it’ll have the same id parameter)

So that’s it, that’s the secret sauce – CGrateS is just a bunch of little APIs we combo together to create something great.

Recap

Agents translate data sources into API calls.

Each little API belongs to a Subsystem, like ChargerS, AttributeS or RALs, and we can chain them together in our config file or through the flags in the API request.

Once you’ve got your head wrapped around this, everything in CGrateS becomes way easier.

From now on I’ll pivot to talking about specific modules, and how we use them, starting with AttributeS (which I wrote last year while still drafting this), and diving into how to use each module in more detail.

A tale of two CPRIs

It was the best of times, it was the worst of times. It was the age of wisdom, it was the age of foolishness. It was the epoch of belief, it was the epoch of incredulity. It was the season of Light, it was the season of Darkness. It was the spring of hope, it was the winter of despair.

A tale of two Cities

When Dickens wrote of Doctor Manette in the 1859, I doubt his intention was to write about the repeating history of RAN fronthaul standards – but I can’t really say for sure.

Setting the Scene

Our story starts with introducing CPRI (Common Public Radio Interface) interface, having been imprisoned in the Bastille of vendor lock in for the better part of twenty years.

Think of CPRI is less of a hard interoperable standard and more like how the Italian and French languages are both derived from Latin; it doesn’t mean that the two languages are the same, but they’ve got the same root and may share some common words and structures.

In practice this means that taking an Ericsson Radio and plugging it into a Huawei Baseband simply won’t work – With CPRI you must use the same vendor for the Baseband and the Radios.

Huawei BBU 3900 Architecture
Image from my post on setting up Huawei Base stations, showing the Huawei Baseband (BBU) connecting to the Huawei Radios (RRUs) via CPRI (in Yellow)

The Unexpected Plot Twist

“Nuts to this” the industry said after being stuck locked between the same radios and baseband for years; we should create a standard so we can mix and match between radio vendors, and even standardize some other stuff that’s been bothering us, so we’ll have a happy world of interoperability.

A world with interoperable fronthaul

With kit created that followed this standard, we’d be able to take components from vendor A, B & C, and fit them together like Lego, saving you some money along the way and giving you’ve got a working solution made of “best of breed” components, where everything is interoperable.

Omnitouch Lego base stations, which also fit together like Lego – Part of the Omnitouch Network Services “swag” from 2024

So the industry created a group to chart a path for a better tomorrow by standardizing these interfaces.

The group had many industry heavyweights like Nokia, NEC, LG, ZTE and Samsung joining.

The key benefits espoused on their website:

An open market will substantially reduce the development effort and costs that have been traditionally associated with creating new base station product ranges. The availability of off-the-shelf base station modules will enable manufacturers to focus their development efforts on creating further added value within the base station, encouraging greater innovation and more cost-effective products. Furthermore, as product development cycles will be reduced, new base station functions will become available on the market more quickly.

Mission statement of the group

In addition to being able to mix and match radios and basebands from different vendors, the group defined standards for centralized baseband, and interoperable standards, to allow a multi-vendor ecosystem to flourish.

And here’s the plot twist – The text above, was not written about OpenRAN, and it was not written about the benefits of eCPRI.

It was written about Open Base Station Architecture Initiative (OBSAI) and it was written 22 years ago.

*record screech sound*

This image was called "Confused Ernie" but it's clearly Bert...

Standards War you’ve never heard of: OBSAI vs CPRI

When OBSAI was defined it was not without competition; there was another competing fronthaul standard; that’s right, the mustache twirling lowlife from earlier in the story – CPRI.

Supported by Huawei, Nortel, NEC & Ericsson (among others), CPRI took a “gentle parenting” approach to the standards world, in contrast to OBSAI.
Instead of telling all the vendors to agree on an interoperable front haul standard, CPRI just encouraged everyone to implement what their heart told them and what felt right to them.

As it happened, the industry favored the CPRI approach.

If a vendor wanted to add a new “word” in their CPRI “language” to add a new feature, they just went ahead and added it – It didn’t require anyone else to agree with them or changes to a common standard used by the industry, vendors could talk to the kit they made how they wanted.

CPRI has been the defacto non-standard used by all the big kit vendors for the past ~10 years.

The Death of OBSAI & the Birth of OpenRAN’s eCPRI

Why haven’t you heard of OBSAI? Why didn’t the OBSAI standard just serve as the basis for eCPRI – After all the last OBSAI release was less than 5 years before TIP started working on eCPRI publicly.

Is no more. It has ceased to be.

Did a schism over “uplink performance improvement” options lead to “irreconcilable differences” between parties leading to the breakup of the OBSAI group?

Nope.

Customers (MNOs) didn’t buy OBSAI based equipment in measurably larger quantities than CPRI kit. That’s it.

This meant the vendors invested less in paying teams to further develop the standards, the OBSAI group met less frequently, and in the end, member vendors didn’t bother adding support for OBSAI to new equipment and just used the easier and more flexible CPRI option instead.

At some point someone just stopped paying for the domain renewal and that was it, OBSAI was no more.

This is how the standards body ends, not with a bang, but with a whimper.

T.S. Elliot’s writings on the death of obsai

Those who do not learn from history…

The goals of the OBSAI Group and OpenRAN working groups are almost identical, so what lessons did Marconi, Motorola and Alcatel learn as members of OBSAI that other vendors could learn about OpenRAN strategy?

There are no mentions of OBSAI in any of the information published by OpenRAN advocates, and I’m wondering if folks aren’t aware that history tends to repeat and are ignorant to what came before it, or they’re just not learning lessons from the past?

So what can the OpenRAN industry learn from OBSAI?

Being a nerd, I started detailing the technical challenges, but that’s all window dressing; The biggest hurdle facing CPRI vs eCPRI are the same challenges OBSAI vs CPRI faced a decade prior:

To be relevant, OpenRAN kit has to be demonstrably better than what we have today AND provide a tangible cost saving.

OBSAI failed at achieving this, and so failed to meet it’s other more noble goals.

[At the time of writing this at least] I’d contend that neither of those two criteria have been met by OpenRAN.

What does the future hold for OpenRAN?

Looking into the crystal ball, will OpenRAN and eCPRI go the way of OBSAI, or will someone keep the OpenRAN dream alive?

Today, we’re still seeing the MNOs continue to provide tokenistic investment in OpenRAN. But being a cynic, I’d say the MNOs are feigning interest in OpenRAN products because it’s advantageous for them to do so.

The threat of OpenRAN has proven to be a great stick to beat the traditional vendors with to force them to lower their prices.

Think about the $14 billion USD Ericsson deal with AT&T, if chucking a few million at OpenRAN pilots / trials lead to AT&T getting even a 0.1% reduction in what they’re paying Ericsson, then the numbers would have worked out well in AT&Ts favor.

From the MNOs perspective, the cost to throw the odd pilot or trial to a hungry OpenRAN vendor to keep them on the hook is negligible, but the threat of OpenRAN provides leverage and bargaining power every time it’s contract renewal time with the big RAN vendors.

Already we’ve seen all the traditional RAN vendors move to neutralize this threat by introducing “OpenRAN compatible” equipment and talking up their commitment to openness.

This move by the RAN vendors takes this sting out of the OpenRAN threat, and means MNOs won’t have much reason to continue supporting OpenRAN.

This leaves the remaining OpenRAN vendors like Miss Havisham, forever waiting in their proverbial wedding dresses, having being left at the altar.

Okay, I’m mixing my Dickens’ references here, but it was too good not to.

Appendix

I’ve been enjoying writing more analysis than just technical content, let me know if this is something you’re interested in seeing more of.

I’ve been involved in two big OpenRAN integration projects, both of which went poorly and probably tainted my perspective. Enough time has passed to probably write up how it all went with the vendor names removed, but that’s a post for another time!

If you wanted to learn more about OBSAI Archive.org has their old website available for reading.

Installing Calix CMS Java tool on Ubuntu in 2025

Ah, another post in my “how to make software work that was made with Java in the 1990s” post, except Calix last updated this software in 2022 – make of that what you will…

This time is Calix Management System (CMS), the Java app for managing equipment in exchanges / COs from Calix.

On Ubuntu 24.04 LTS it requires JRE version 8:

sudo apt install openjdk-8-jre
sudo apt install execstack

With that installed I could install CMS

/install.bin LAX_VM /usr/lib/jvm/java-8-openjdk-amd64/bin/java

Then it came time to run it, I chose to install in my home directory in a folder named “Calix” (default).

First you’ve got to make their startup script executable:

~/Calix$ chmod +x Start\ CMS

Then we need to modify it to point to the openjdk Java 8 binary, the simplest way is to just add the LAX_VM on startup:

~/Calix$ ./Start\ CMS LAX_VM /usr/lib/jvm/java-8-openjdk-amd64/bin/java

And you’re in.

TFTs & Create Bearer Requests

What is included in the Charging Rule on Gx ultimately turns into a Create Bearer Request on GTPv2-C.

But the mapping it’s always obvious, today I got stuck on the difference between a Single remote port type, and a single local port type, thinking that the Packet Filter Direction in the TFT controlled this – It doesn’t – It’s controlled by the order of your Traffic Flow Template rule.

Input TFT:

"permit out 17 from any 50000 to any"

Leads to Packet filter component type identifier: Single remote port type

Whereas a TFT of:

permit out 17 from any to any 50000

Leads to Packet filter component type identifier: Single local port type (64)

Holiday Reading list 2024

As summer reaches full swing in Australia and the level of effort I put into blog posts wains, here’s a lost of books I’m to-read or have read this year.

I can’t imagine a telecom book club being super popular, but if you’ve got any recommendations for good telecom related reads, I’d love to hear them!

The End of Telecoms History – William Webb (Read)

I read this this year, Webb is one of those folks who’s paycheck doesn’t come from shilling hardware, and he’s been pretty good at making accurate predictions and soothsaying, even when what he says upsets some.

The launch of 5G pretty much played out exactly how one of his other books (The 5G Myth) predicted, and the premise of The End of Telecoms History is that if we look at the data which suggest that bandwidth growth will not continue unabated forever, what does that mean?

I’ve a feeling there are a telecom execs quietly reading this book (while making sure that no sees them reading it) and planning for a potential future in a world of enough bandwidth to satisfy demand, and how this would impact their bottom lines and overall business model, even if outwardly everyone still claims the growth will continue forever.

The Iron Wire: A novel of the Adelaide to Darwin telegraph line – Garry Kilworth (Read)

A fun imagined romp about adventures in the bush while connecting a nation in the 18th century, the story is inspired by the real world events but are fictional, it’s a fun way to explore the topic and add bushrangers into the mix.

Rogers v. Rogers: The Battle for Control of Canada’s Telecom Empire – Alexandra Posadzki (Read)

Just finished this; I’ve worked with a lot of operators in the past, both big some small (the best ones are small), and it’s fascinating to understand at a board level how things get done in telecom giants, even if the Rogers’ family aren’t the best example of how to do this…

Chip War: The Fight for the World’s Most Critical Technology – Chris Miller (Read)

Without integrated circuits the telecom industry is back to relays and electromagnetically switching traffic (not that I’m against this).

Miller’s book outlines how we got to our current situation, and how the products coming out of TSMC and SMIC will shape the future of tech at a fundamental level.

How the World Was One – Arthur C Clarke (To Read)

Famed science fiction writer Arthur C Clarke had a penchant for scuba diving and communications (can relate) hence his interest in submarine telephony.

I read “Voice Across the Sea” a few years ago (on an actual paper based book no less!) but this is freely available as an eBook and I’m looking forward to reading it.

Introducing Elixir – Simon St. Laurent & J. David Eisenberg (Reading)

The dev team at Omnitouch are all about Elixir, and being an old dinosaur I figured I should at least learn the basics!

I’m still working my way through the book, having a folder of examples typed out from the book (I can’t learn through copy / paste!), enjoying it so far, even if I’m slower than I’d like.

Adventures in Innovation: Inside the Rise and Fall of Nortel – John Tyson

My first job was with Nortel, so I’ve got a bit of a soft spot of the former Canadian telecom behemoth, and never felt I’d had a satisfactory explanation as to where it all went wrong. I got this book expecting a bit more insight into the fall part, but this book gave an interesting account as to the design of things I’d never put much thought into before.

The Real Internet Architecture: Past, Present, and Future Evolution – Zave, Pamela;Rexford, Jennifer; (To Read)

This came from a recommendation on Twitter, I know almost nothing about it other than that, but I’m keen to dig into this.

Burn Book: A Tech Love Story – Kara Swisher (Read)

A fun insight into the life and times of the big tech.

The 6G Manifesto – William Webb

There’s a Simpsons’ scene where Lisa is buying an Al Gore book named “Sane Planning, Sensible Tomorrow” and says “I hope it’s as exciting as his other book, ‘Rational Thinking, Reasonable Future'”.

I can’t help but feel Webb’s books are kinda like this (in a good way).

Realism is so important; staying grounded in reality is critical. Operators who go chasing fairy tales of driving higher ARPUs with wacky ideas with no business case or demand from end customers (and generally pushed by vendors, rather than operators) will struggle to remain viable in the future if they pour all their cash into things that won’t see a return, so I’m looking forward to reading some sane ideas as to how to approach the unnecessary Gs.

Flash SMS Messages

Stumbled across these the other day, while messing around with some values on our SMSc.

Setting the Data Coding Scheme to 16 with GSM7 encoding flags the SMS as “Flash message”, which means it pops up on the screen of the phone on top of whatever the user is doing.

While reading a quality telecom blog bam! There’s the flash SMS popping over whatever I was reading.

Oddly while there’s plenty of info online about Flash SMS, it does not appear in the 3GPP specifications for SMS.

Turns out they still work, move over RCS and A2P, it’s all about Flash messages!

There’s no real secret to this other than to set the Data Coding Scheme to 16, which is GSM7 with Flash class set. That’s it.

If you’re interested in the internal machinations of how SMS works, I’ve got a few posts on the topicYou can find a list of them here.

Obviously to take advantage of this you’d need to be a network operator, or have access to the network you wish to deliver to. Recently more A2P providers are filtering non vanilla SMS traffic to filter out stuff like SMS OTA message or SIM specific messages, so there’s a good chance this may not work through A2P providers.

Background to the “VoLTE Mess”

I’ve been writing a fair bit recently about the “VoLTE Mess” – It’s something that’s been around for a long time, mostly impacting greenfield players rolling out LTE only, but now the big carriers are starting to feel it as they shut off their 2G and 3G networks, so I figured a brief history was in order to understand how we got here.

Note: I use the terms 4G or LTE interchangeably

The Introduction of LTE

LTE (4G) is more “spectrally efficient” than the technologies that came before it. In simple terms, 1 “chunk” of spectrum will get you more speed (capacity) on LTE than the same size chunk of spectrum would on 2G or 3G.

From my post on 5G being a bit overhyped

So imagine it’s 2008 and you’re the CTO of a mobile network operator.
Your network is congested thanks to carrying more data traffic than it was ever designed for (the first iPhone had launched the year before) and the network is struggling under the weight of all this new data traffic.
You have two options here, to build more cell sites for more density (very expensive) or buy more spectrum (extremely expensive) – Both options see you going cap in hand to the finance team and asking for eye-wateringly large amounts of capital for either option.

But then the answer to your prayers arrives in the form of 3GPP’s Release 8 specification with the introduction of LTE. Now by taking some 2G or 3G spectrum, and by using it on 4G, you can get ~5x more capacity from the same spectrum. So just by changing spectrum you own from 2G or 3G to 4G, you’ve got 5x more capacity. Hallelujah!

So you go to Nortel and buy a packet core, and Alcatel and Siemens provide 4G RAN (eNodeBs) which you selectively deploy on the cell sites that are the most congested.
The finance team and the board are happy and your marketing team runs amok with claims of 4G data speeds.
You’ve dodged the crisis, phew.

This is the path that all established mobile operators took; throw LTE at the congested cell sites, to cheaply and easily free up capacity, and as the natural hardware replacement cycle kicked in, or cell sites reached capacity, swap out the hardware to kit that supports LTE in addition to the 2G and 3G tech.

Circuit Switched Fallback

But it’s hard to talk about the machinations of late 2000s telecom executives, without at least mentioning Hitler.

This video below from 15 years ago is pretty obscure and fairly technical, but the crux of it it is that Hitler is livid because LTE does not have a “CS Domain” aka circuit switched voice (the way 2G and 3G had handled voice calls).

It was optional to include support for voice calls in the LTE network (Voice over LTE) when you launched LTE services. So if you already had a 2G or 3G network (CS Network) you could just keep using 2G and 3G for your voice calls, while getting that sweet capacity relief.

So our hypothetical CTO, strapped for cash and data capacity, just didn’t bother to support VoLTE when they launched LTE – Doing so would have taken more time to launch, during which time the capacity problem would become worse, so “don’t worry about VoLTE for now” was the mantra.

All the operators who still had 2G and 3G networks, opted to just “Fallback” to using the 2G / 3G network for calling. This is called “Circuit Switched Fallback” aka CSFB.

Operators loved this as they got the capacity relief provided by shifting to 4G/LTE (more capacity in the network is always good) and could all rant about how their network was the fastest and had 4G first, this however was what could be described as a “Foot gun” – Something you can shoot yourself in the foot with in the future.

Operators eventually introduce VoLTE

Time ticked on an operators built out their 4G networks, and many in the past 10 years or so have launched VoLTE in their own networks.

For phones that support it, in areas with blanket 4G coverage, they can use VoLTE for all their calls.

But that’s the sticking point right there – If the phones support it.

But if the phones don’t support it, they’re roaming or making emergency calls, there is always been the safety blanket of 2G or 3G and Circuit Switched fallback to well, fall back to.

There’s no driver for operators who plan to (or are required to) operate a 2G or 3G network for the foreseeable future, to ensure a high level of VoLTE support in their devices.

For an operator today with 2G or 3G, Voice over LTE is still optional.
Many operators still rely exclusively on Circuit Switched Fallback, and there are only a handful of countries that have turned off 2G and 3G and rely solely on VoLTE.

VoLTE Handset Support

For the past 16 years phone manufacturers have been making LTE capable phones.

But that does not mean they’ve been making phones that support Voice over LTE.

But it’s never been an issue up until this point, as there’s always been a circuit switched (2G/3G) network to fall back to, so the fact that these chips may not support VoLTE was not a big problem.

Many of the cheaper chipsets that power phones simply don’t support VoLTE – These chips do support LTE for data connections but rely on Circuit Switched Fallback for voice calls. This is in part due to the increased complexity, but also because some of the technologies for VoLTE (like AMR) required intellectual property deals to licence to use, so would add to the component cost to manufacture, and in the chips game, keeping down component cost is critical.

Even for chips that do support Voice over LTE, it’s “special”. Unlike calling in 2G or 3G that worked the same for every operator, phone manufacturers require a “Carrier Bundle” for each operator, containing that specific operators’ special flavor of VoLTE, that operator uses in their network.

This is because while VoLTE is standardized (Despite some claims to the contrary) a lot of “optional” bits have existed, and different operators built networks with subtle differences in the “flavor” of their Voice over LTE (IMS) stack they used. The OEMs (Phone / Chip manufacturers) had to handle these changes in the devices they made, for in order to sell their phones through that operator.

This means I can have a phone from vendor X that works with VoLTE on Network Y, but does not support VoLTE on Network Z.

Worse still, knowing which phones are supported is a bit of a guessing game.

Most operators sell phones directly to their customer base, so buying an Network Y branded phone from Vendor X, you know it’s going to support Network Y’s VoLTE settings, but if you change carriers, who knows if it’ll still support it?

When you’ve still got a Circuit Switched network it’s not the end of the world, you’ll just use CSFB and probably not realize it, until operators go to shut down 2G / 3G networks…

IMS Profile selection on an engineering mode MTK based Android handset

Navigating the Maze of VoLTE Compatibility

Here are some simple checklist you can ask your elderly family members if they ask if their phone is VoLTE compatible:

  • Does the underlying chipset the phone is based on support VoLTE? (you can find this out by disassembling the phone and checking the datasheets for the components from the OEMs after signing NDAs for each)
  • Does the underlying chipset require a “carrier bundle” of settings to have been loaded for this operator in order to support VoLTE (See Qualcomm MBM as an example)?
  • What version of this list am I currently on (generally set in the factory) and does it support this operator? (You can check by decapping the ICs and dumping their NVRAM and then running it through a decompiler)
  • Does my phones OS (Android / iOS) require a “carrier bundle” of it’s own to enable VoLTE? Is my operator in the version of the database on the phone? (See Android’s Carrier Database for example) (You can find the answer by rooting the phone and running some privileged commands to poke around the internal file system)
  • Does my operator / MNO support VoLTE – Does my plan / package support VoLTE? (You can easily find the answer by visiting the store and asking questions that don’t appear on the script)

If you managed to answer yes to all of the above, congratulations! You have conditional VoLTE support on your phone, although you probably don’t have a working phone anymore.

Wait, conditional VoLTE support?

That’s right folks, VoLTE will work in some scenarios with your operator!

If you plan on traveling, well your phone may support VoLTE at home, but does the phone have VoLTE roaming enabled?
Many phones support VoLTE in the home network, but resort to CSFB when roaming.

If it does support VoLTE roaming, does the network you’re visiting support VoLTE roaming? Has the roaming agreement (IRA) between the operator you’re using while traveling and your home operator been updated to include VoLTE Roaming? These IRAs (AA.12 / AA.13 docs) also indicate if the network must turn off IPsec encryption for the VoLTE traffic when roaming, which is controlled by the phone anyway.

Phew, all this talk of VoLTE roaming while traveling scares me, I think I’ll stay home in the safety of the Australian bush with all these great friendly animals around a phone that supports VoLTE on my home network.

Ah – After spending some time in the Australian bush one of our many deadly animals bit me. Time to call for help! Wait, what about emergency calls over VoLTE? Again, many phones support VoLTE for normal calls, fall back to 2G or 3G for the emergency call, so if you have one of those phones (You’ll only find out if you try to make an emergency call and it fails) and try to make an emergency call in a country without 2G or 3G, you’d better find a payphone.

There’s many real world examples of this, our friends at OptimERA have been lobbying the FCC since 2019 on this.

Sarcasm aside, there’s no dataset or compatibility matrix here – No simple way to see if your phone will work for VoLTE on a given operator, even if the underlying chip does support VoLTE.

Operators in Australia which recently shut down their 3G network, were mandated to block devices that didn’t support VoLTE for emergency calling. They did this using an Equipment Identity Register, and blocking devices based on the Type Allocation Code, but this scattergun approach just blocked non-carrier issued devices, regardless of it they supported VoLTE or VoLTE emergency calling.

Blame Game

So who’s to blame here?

There’s no one group to blame here, the industry has created a shitty cycle here:

  • Standards orgs for having too many “flavors” available
  • Operators deploying their own “Flavors” of VoLTE then mandating OEMs / Chip manufacturers comply with their “flavor”.
  • OEMs / Chip manufactures respond by adding “Carrier Bundles” to account for this per-operator customization

I’ve got some ideas on a way to unscramble this egg, and it’s going to take a push from the industry.

If you’re in the industry and keen to push for a fix, get in touch!

It’s time to get a long term solution to this problem, and we as an industry need to lead the change.

Tales from the Trenches – IMS TCP Socket Handling

Oh boy this has been a pain in the backside with IMS / VoLTE devices using TCP and how they handle the underlying TCP sockets.

A mobile phone from manufacturer A, wants every SIP dialog to be in it’s own TCP session, while a phone from manufacturer B wants a unique TCP session per transaction, while manufacturer C thinks that every SIP message should reuse the same transaction.

So an MT call to manufacturer A, who wants every SIP dialog in it’s own transaction would look something like this:

PCSCF:44738 -> UE:5060; TCP SYN
UE:5060 -> PCSCF:44738; TCP SYN/ACK
PCSCF:44738 -> UE:5060; TCP ACK
--- TCP connection is now open to UE from P-CSCF---
--- Start of new SIP Transaction 1 & Dialog ---
PCSCF:44738 -> UE:5060; TCP PSH - SIP INVITE....
UE:5060 -> PCSCF:44738; TCP ACK


UE:5060 -> PCSCF:44738; TCP PSH - SIP 200....
PCSCF:44738 -> UE:5060; TCP ACK, PSH - SIP ACK....
UE:5060 -> PCSCF:44738; TCP ACK
--- End of SIP Transaction 1 ---

--- Start of SIP Transaction 2 ---
PCSCF:44738 -> UE:5060; TCP PSH - SIP BYE....
UE:5060 -> PCSCF:44738; TCP ACK, PSH - SIP 200....
--- End of SIP Transaction 2 & SIP Dialog ---
PCSCF:44738 -> UE:5060; TCP FIN
UE:5060 -> PCSCF:44738; TCP ACK
--- End of TCP Connection ---

Where UE:5060 – is the IP & port of the UE, as advertised in the Contact: header, while PCSCF:44738 is the PCSCF IP and a random TCP port used for this connection.

But for manufacturer B, who wants a unique TCP session per transaction, they want it to look like this:

PCSCF:44738 -> UE:5060; TCP SYN
UE:5060 -> PCSCF:44738; TCP SYN/ACK
PCSCF:44738 -> UE:5060; TCP ACK
--- TCP connection is now open to UE from P-CSCF---
--- Start of new SIP Transaction 1 & Dialog ---
PCSCF:44738 -> UE:5060; TCP PSH - SIP INVITE....
UE:5060 -> PCSCF:44738; TCP ACK


UE:5060 -> PCSCF:44738; TCP PSH - SIP 200....
PCSCF:44738 -> UE:5060; TCP ACK, PSH - SIP ACK....
UE:5060 -> PCSCF:44738; TCP ACK
PCSCF:44738 -> UE:5060; TCP FIN
UE:5060 -> PCSCF:44738; TCP ACK
--- End of SIP Transaction 1 & TCP Session 1 ---

--- Start of TCP Session 2 ----
PCSCF:32627 -> UE:5060; TCP SYN
UE:5060 -> PCSCF:32627; TCP SYN/ACK
PCSCF:32627 -> UE:5060; TCP ACK
--- Start of SIP Transaction 2 ---
PCSCF:32627 -> UE:5060; TCP PSH - SIP BYE....
UE:5060 -> PCSCF:32627; TCP ACK, PSH - SIP 200....
--- End of SIP Transaction 2 & SIP Dialog ---
PCSCF:32627 -> UE:5060; TCP FIN
UE:5060 -> PCSCF:32627; TCP ACK
--- End of TCP Connection 2 ---

And then manufacturer C wants just the one TCP session to be used for everything, so they open the TCP connection when they register, and that’s all we use for everything.

Is there any logic to this? Nope, seems to be tied to the underlying chipset (Qualcomm vs Mediatek vs Unisoc) and the SIP stack used (Qualcomm, MTK, Unisoc, Samsung, Apple).

We’ve profiled devices into one of 3 behaviors, and then we tag them based on user agent as to what “persona” they demand from the network.

I can’t believe I’m still talking about VoLTE / IMS handset support and it’s almost 2025…. For context IMS was “standardized” 17 years ago.

Mobile Network Code – 2 or 3 Digits?

Every mobile network broadcasts a Public Mobile Network Code – aka a PLMN. This 6 octet value is used to identify the network (Although this gets murky with shared codes and Private networks and when OEMs make them for codes they don’t own).

It’s made up of a Mobile Country Code followed by a Mobile Network code.

One of the guys at work asked a seemingly simple question, is the PLMN with MCC 505 and MNC 57 the same as MCC 505 MNC 057 – It’s on 6 octets after all.

So is Mobile Network Code 57 the same as Mobile Network Code 057 in the PLMN code?

The answer is no, and it’s a massive pain in the butt.

All countries use 3 digit Mobile Country Codes, so Australia, is 505. That part is easy.

The tricky part is that some countries (Like Australia) use 2 digit Mobile Network Codes, while others (Like the US) use 3 digit mobile network codes.

This means our 6 digit PLMN has to get padded when encoding a 2 digit Mobile Network Code, which is a pain, but also that by looking at the IMSI alone, you don’t know if the PLMN is 2 digit or 3 digit – IMSI 5055710000001 could be parsed as MCC 505, MNC 571 or MCC 505, MNC 57.

Why would you do this? Why would a regulator opt to have 1/10th the addressable size of network codes – I don’t know, and I haven’t been able to find an answer – If you know please drop a comment, I’d love to know.

So how do we handle this?

There are files in the SIM profile to indicate the length of the MNC, the Administrative Domain EF on the SIM allow us to indicate if the MNC is 2 digit or 3 digit, and the HPLMNwAct and the other *PLMN* EFs encode the PLMN as 6 digit, with or without padding, to allow differentiation.

That’s all well and good from a SIM perspective, but less useful for scenarios where you might be the Visited PLMN for example, and only see the IMSI of a Subscriber.

We worked on a project in a country that mixed both 2 digit and 3 digit Mobile Network Codes, under the same Mobile Country Code. Certain Qualcomm phones would do very very strange things, and it took us a long time and a lot of SIM OTA to resolve the issue, but that’s a story for another day…

private_data_dir on Ansible Runners called from Python

We’ve got a web based front end in our CRM which triggers Ansible Playbooks for provisioning of customer services, this works really well, except I’d been facing a totally baffling issue of late.

Ansible Plays (Provisioning jobs) would have variables set that they inherited from other Ansible Plays, essentially if I set a variable in Play X and then ran Play Y, the variable would still be set.

I thought this was an issue with our database caching the play data showing the results from a previous play, that wasn’t the case.

Then I thought our API that triggers this might be passing extra variables in that it had cached, wasn’t the case.

In the end I ran the Ansible function call in it’s simplest possible form, with no API, no database, nothing but plain vanilla Ansible called from Python

    # Run the actual playbook
    r = ansible_runner.run(
        private_data_dir='/tmp/',
        playbook=playbook_path,
        extravars=extra_vars,
        inventory=inventory_path,
        event_handler=event_handler_obj.event_handler,
        quiet=False
    )

And I still I had the same issue of variables being inherited.

So what was the issue? Well the title gives it away, the private_data_dir parameter creates a folder in that directory, called env/extravars which a JSON file lives with all the vars from all the provisioning runs.

Removing the parameter from the Python function call resolved my issue, but did not give me back the half a day I spent debugging this…

All about Global Title Translation & SCCP Routing

This is the next post in my series on SS7, and today we’re taking a look at SCCP the Signalling Connection Control Part (SCCP).

High Level

Global Title uses the routing features from SCCP, which is another layer on top of MTP3.

SCCP allows us to route on more than just point code, instead we can route based on two new fields, Subsystem Number and Global Title.

Subsystem Number is the type of system we are looking to reach, ie an HLR, MSC, CAMEL Gateway, etc.

The Global Title generally looks like an E.164 formatted phone number, and often it is just that.

Somewhere along the chain (typically at the end of it) an STP somewhere needs to perform Global Title Translation to analyse the SCCP header (Subsystem Number, Point Code & Global Title) and finally turn that into a single point code to route the MTP3 message to.

The advantage of this is we are no longer just limited to routing messages based on Point Code.

This is how the international SS7 Network used for roaming is structured and addressed – All using Global Title rather than Point Codes.

The need for SCCP

For starters, after all this talk of MTP3 and Point Codes, why the need to add SCCP?

Let’s go back in time and look at the motivators…

1. Address space is finite

Point codes are great, and when we’ve spoken about them before, I’ve compared them to IPv4 address, but rather than ranging from 0.0.0.0 to 255.255.255.255 (32 bits on IPv4) international signaling point codes range from 0.0.0 to 7.255.7 (14 bits).

The problem with IPv4’s 32 bit addresses is they run out. The problem with the ITU International Signaling Point Codes is that they too, are a limited resource with only 16,383 possible ISPCs.

~700 operators worldwide each with ~100 network elements would be 70k point codes to address them all – That’s not going to fit into our 16k possible Point Codes.

Global Title fixes this, because we’re able to use E.164 phone number ranges (which are plentiful) for addressing, we’re still not at IPv6 levels of address space, but pretty hefty.

2. Service Discovery by Subsystem

Now imagine you’re a VLR looking to find an HLR. The VLR and the HLR are both connected to an STP, but how does the VLR know where to reach the HLR?

One option would be to statically set every route for the Point Code of every HLR into every possible VLR and visa-versa, but that gets messy fast.

What if the VLR could just send a request to the STP and indicate that the request needs to be routed to any HLR, and the STP takes care of finding a SS7 node capable of handling the request, much a Diameter Routing Agent routes based on Application ID.

SCCP’s “Subsystem Number” routing can handle this as we can route based on SSN.

3. Service Discovery by MSISDN

Having an SMS destined to a given MSISDN requires the SMSc to know where to route it.

Likewise an MSC wanting to call a given number.

There’s a lot of MSISDN ranges. Like a lot. Like every phone mobile number.

Having every a table on every SSP/SCP in the network know where every MSISDN range is in the world and what point code to go through to reach it is not practical.

Instead, being able to have the SCP/SSPs (like our MSC or SMSc) send all off-net traffic to an STP frees us the individual SCP/SSPs from this role; they just forward it to their connected STP.

Our STP can analyse the destination MSISDN and make these routing decisions for us, using Global Title Translation based on rules in the Global Title Table on the STP.

For example by adding each of the domestic / national MSISDN ranges/prefixes into the Global Title Table on the STP (along with the corresponding point code to route each one to), the STP can look at the destination MSISDN in the message and forward to the STP for the correct operator.

Likewise a route can match anything where the Global Title address is outside of the local country and send it to an international signaling provider.

Global title takes care of this as we can route based on a phone number.

4. Tokenistic Security

By “Hiding” network elements behind Global Titles, you don’t expose as much information about your internal network, and the only way people can “find” your network elements would be scanning through all the possible addresses in your (publicly advertised) Global Title range (wardialing is back baby!).

But the phrases “Security” and “SS7” don’t really belong together…

The SCCP Header

The SCCP header has a Called Party and a Calling Party, and this is where the magic happens.

These can be made up for any number of 3 parts:

  • Global Title Address
  • Subsystem Number
  • Point Code

We can route on any combination of these.

To indicate we’re using SCCP, we set the Signaling Indicator bit in the M3UA / MTP3 message to SCCP:

Great, now we can look at our SCCP header.

It looks like there’s a lot going on, but we can see the calling and called party (888888888 is called by 9999999999) with the Subsystem number set (888888888 is called for subsystem HLR, from 999999999 which is a VLR).

The closest TCP/IP analogy I can think of here is that of port numbers, there’s still an IP (Point code) but the port number allows us to specify multiple applications that run at a higher layer. This analogy falls down when we consider that the Point Code is generally set to that of your STP, not the final STP.

For this to work, we’ve got to have at least one Signaling Transfer Point in the flow, where we send the request to.

Somewhere (generally at the end of the chain of STPs), an STP is going to perform Global Title Translation.

What does this look like? Well let’s have a look at my GT table for the example above, in my lab network, I’ve got two nodes attached (via M3UA but could equally be on MTP3 links), my test MAP client where I’m originating this traffic, and an SMS Firewall, I can see they’re both up here:

Now knowing this I need to setup my SCCP routing for Global Title. In the screenshot above, the Called Party was 888888888 with Subsystem Number 7. Inside the SCCP request, there’s a few other fields, the Translation Type we have set to 0, Global Title Indicator is 4 (route on Global Title), while Numbering Plan Indicator is 1 (ISDN) and Nature of Address Indicator is 4 (International).

So on my Cisco ITP I define a GTT Selector to target traffic with these values, Translation Type is 0, Global Title Indicator is 4, Number Plan is 1 and the Nature of Address Indicator is 4.

So we’d define a Global Title Translation selector like the one below to match this traffic:

cs7 instance 0 gtt selector GLOBAL_tt0 tt 0 gti 4 np 1 nai 4

But that’s only matching the group of traffic, it’s not going to match based on the actual SCCP Called Party. So now I need to define a translation for each Global Title address (Called /Calling party) or prefix I want to route, I’ve setup anything starting with 888 to route to the `SMSFirewall` ASP endpoint.

cs7 instance 0 gtt selector GLOBAL_tt0 tt 0 gti 4 np 1 nai 4
gta 888 asname SMSFirewall gt ssn 6

I could stop here and my request addressed to 888888888 would make it to the SMSFirewall ASP, but the response never would, like in all SS7 routing, we need to define the return route translation too, which is what I’ve done for 999999 to route to the TestClient.

Lastly I’ve added a wildcard route, this means if this STP doesn’t know how to resolve a GT address matching the rules in the top line, it’ll forward the request to the STP at point code 1.2.3 – This is how you’d do your connection to an IPX / Signaling exchange.

Debugging this can be a massive pain in the backside, but if you enable logging you can see when GT rules are not matched, like in the example below.

If your network is quiet enough, it’s sometimes easier to just make your rules based on what you observe failing to route.

So with those routes in place, when we send a request with the Global Title called party starting with 8888888 it’s routed to M3UA ASP SMSFirewall, which handles the request, and then sends the response back to the MAPClient M3UA ASP.

Tales from the Trenches – SMS Data Coding Scheme 0

The Data Coding Scheme (DCS or TP-DCS) header in an SMS body indicates what encoding is used in that message.

It means if we’re using UCS-2 (UTF16) special characters like Emojis etc, in in our message, the phone knows to decode the data in the message body using UTF, because the Data Coding Scheme (DCS) header indicates the contents are encoded in UTF.

Likewise, if we’re not using any fancy characters in our message and the message is encoded as plain old GSM7, we set set the DCS to 0 to indicate this is using GSM7.

From my experience, I’d always assumed that DCS0 (Default) == GSM7, but today I learned, that’s not always the case. Some SMSc entities treat DCS0 as Latin.

Let me explain why this is stupid and why I wasted a lot of time on this.

We can indicate that a message is encoded as Latin by setting the DCS to 0x03:

We cannot indicate that the message is encoded as GSM7 through anything other than the default alphabet (DCS 0).

Latin has it’s own encoding flag, if I wanted the message treated as Latin, I’d indicate the message encoding is Latin in the DCS bit!

I spent a bunch of time trying to work out why a customer was having issues getting messages to subscribers on another operator, and it turned out the other operator treats messages we send to them on SMPP with DCS0 as Latin encoding, and then cracks the sads when trying to deliver it.

The above diff shows the message we send (Right), and the message they dry to deliver (left).

Well, lesson learned…

MAP – SendRouting Information for SM – locationInfoWithLMSIMAP

The other day I was facing an issue with our SMSc inter-working with another operator via MAP.

Our SRI-for-SM responses were relayed back to their nodes, but it was like it couldn’t parse the message.

I got some “known good” traffic to compare this against to work out what we’re doing wrong.

The difference in the two examples below is subtle, but it’s there – On the example on the left (failing) we are including an msc-number in the locationInfoWithLMSI field, while on the right we’ve got a “Network Node Number”.

“Okay” I thought to myself, we’re just doing something wrong with the encoding of the MAP body, so I did my usual diff trick from Wireshark:

Oddly in the raw form both these values decode the same, if I feed the values on the left into our decoder, and then encode, I get the values on the right, with the exact same hex body – and this is all ASN.1; so there’s very little room for error anyway.

So what gives? Why does Wireshark show one MAP body differently to the other, with the same hex bytes?

Well, the issue is not within my GSM MAP body and the content I include there, but rather in the TCAP layer above it, specifically the `application-context-name`.

We’re indicating support for GSM MAP v3 (0.4.0.0.1.0.20.3), while the other operator is using MAPv2 ( 0.4.0.0.1.0.20.2 ) even though their IR.21 indicates it should be GSM MAP v3.

Now our SRI-for-SM responses are based on the GSM MAP version received, rather than reported as supported, and we don’t have to deal with this for handling requests anymore.

Module debug in Kamailio

Sometimes Kamailio modules don’t behave how you expect them to, and you want to dive a little deeper into what’s going on.

If we load the debugger.so module in our kamailio.cfg file, we can get a bit more of an insight.

For example, if we wanted to deep dive into what RTPengine was thinking we’d include:

loadmodule "debugger"
modparam("debugger", "mod_level_mode", 1)
modparam("debugger", "mod_level", "rtpengine=3")

And bam, now when we run Kamailio we’ll get all the logic that rtpengine exposes when being called.

By setting cfgtrace we can enable a config trace as well, and as we step through our Kamailio config file for a SIP message, we can see what’s going on at every step.

So if you’re like me, you can use the debugger module, combined with reading through the source code, to prove every single time that the issue is with my not understanding the function properly, and never the logic of the Kamailio module!

Using fancy test kit to measure”Cell Antenna” stickers

Technology is constantly evolving, new research papers are published every day.

But recently I was shocked to discover I’d missed a critical development in communications, that upended Shannon’s “A mathematical theory of communication”.

I’m talking of course, about the GENERATION X PLUS SP-11 PRO CELL ANTENNA.

I’ve been doing telecom work for a long time, while I mostly write here about Core & IMS, I am a licenced rigger, I’ve bolted a few things to towers and built my fair share of mobile coverage over the years, which is why I found this development so astounding.

With this, existing antennas can be extended, mobile phone antennas, walkie talkies and cordless phones can all benefit from the improvement of this small adhesive sticker, which is “Like having a four foot antenna on your phone”.

So for the bargain price of $32.95 (Or $2 on AliExpress) I secured myself this amazing technology and couldn’t wait to quantify it’s performance.

Think of the applications – We could put these stickers on 6 ft panel antennas and they’d become 10ft panels. This would have a huge effect on new site builds, minimize wind loading, less need for tower strengthening, more room for collocation on the towers due to smaller equipment footprint.

Luckily I have access to some fancy test equipment to really understand exactly how revolutionary this is.

The packaging says it’s like having a 4 foot antenna on your phone, let’s do some very simple calculations, let’s assume the antenna in the phone is currently 10cm, and that with this it will improve to be 121cm (four feet).

According to some basic projections we should see ~21dB gain by adding the sticker, that’s a 146x increase in performance!

Man am I excited to see this in action.

Fortunately I have access to some fun cellular test equipment, including the Viavi CellAdvisor and an environmentally controlled lab my kitchen bench.

I put up a 1800Mhz (band 3) LTE carrier in my office in the other room as a reference and placed the test equipment into the test jig (between the sink and the kettle).

We then took baseline readings from the omni shown in the pictures, to get a reading on the power levels before adding the sticker.

We are reading exactly -80dBm without the sticker in place, so we expertly put some masking tape on the omni (so we could peel it off) and applied the sticker antenna to the tape on the omni antenna.

At -80dBm before, by adding the 21dB of gain, we should be put just under -60dBm, these Viavi units are solid, but I was fearful of potentially overloading the receive end from the gain, after a long discussion we agreed at these levels it was unlikely to blow the unit, so no in-line attenuation was used.

Okay, </sarcasm> I was genuinely a little surprised by what we found; there was some gain, as shown in the screenshot below.

Marker 1 was our reference without the sticker, while reference 2 was our marker with the sticker, that’s a 1.12dB gain with the sticker in place. In linear terms that’s a ~30% increase in signal strength.

So does this magic sticker work? Well, kinda, in as much that holding onto the Omni changes the characteristics, as would wrapping a few turns of wire around it, putting it in the kettle or wrapping it in aluminum foil. Anything you do to an antenna to change it is going to cause minor changes in characteristic behavior, and generally if you’re getting better at one frequency, you get worse at another, so the small gain on band 3 may also lead to a small loss on band 1, or something similar.

So what to make of all this? Maybe this difference is an artifact from moving the unit to make a cup of tea, the tape we applied or just a jump in the LTE carrier, or maybe the performance of this sticker is amazing after all…

Kamailio – NDB_Redis – undefined symbol: redisvCommand

I ran into this issue the other day while compiling Kamailio from source:

Jul 03 23:49:35 kamailio[305199]: ERROR: <core> [core/sr_module.c:608]: ksr_load_module(): could not open module </usr/local/lib64/kamailio/modules/ndb_redis.so>: /usr/local/lib64/kamailio/modules/ndb_redis.so: undefined symbol: redisvCommand
Jul 03 23:49:35 kamailio[305199]: CRITICAL: <core> [core/cfg.y:4024]: yyerror_at(): parse error in config file /etc/kamailio/kamailio.cfg, line 487, column 12-22: failed to load module

So what was going on?

Kamailio’s NDB Redis module relies on libhiredis which was installed, but the path it installs to wasn’t in the LD_LIBRARY_PATH environment variable.

So I added /usr/lib/x86_64-linux-gnu to LD_LIBRARY_PATH, recompiled and presto, the error is gone!

export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH

NBN “Micronode”

We’ve covered most of the types of NBN Node, but the “Micronode” isn’t something we’ve looked at. Micronodes are “satellites” that hang off a larger Nokia ISAM and act as NBN FTTN DSLAMs.

In NBN documentation these are referred to as “Compact Sealed DSLAMs” or CSDs, and in Nokia/Alcatel documentation these are called Sealed Expansion Modules (SEMs) – Regardless of what you call them, they act as 48 port VDSL capable DSLAMs like rest of the Nokia / Alcatel ISAM family that are used for FTTN.

They are similar in concept to the NBN FTTC program, except these units are not reverse powered, and they have a higher subscriber count.

They’re housed in the green cabinets that were used for the HFC / Pay TV networks installed in some locations underground.

The CSD are a Nokia 7367 ISAM SX-48V (White box with the heat sink on the left of the photo below) which is connected directly back to the nearest exchange.

There is also an internal PSU with batteries powered from the incoming unmetered AC supply and a PSU is supplied by Alpha technologies in the photo on the right (With the spider).

On the flip side of the unit is where the copper pairs come in, and the DSLAM X-Pairs go out, and a small fibre splice tray.

Here’s some links:

https://convergedigest.com/alcatel-lucent-intros-fixed-access

https://slideplayer.com/slide/10505530/

Seeing the Airwaves – QCsuper

Recently we were on a project and our RAN guy was seeing UEs hand between one layer and another over and over. The hysteresis and handover parameters seemed correct, but we needed a way to see what was going on, what the eNB was actually advertising and what the UE was sending back.

In a past life I had access to expensive complicated dedicated tooling that could view this information transmitted by the eNB, but now, all I need is a cellphone or a modem with a Qualcomm chip.

QCsuper is a super handy tool that gives access to the Diag interface on Qualcomm chipsets. That’s the same interface QXDM uses, but without the massive headache and usability issues that come with QXDM, plus it puts everything in Wireshark to make it super easy to view everything.

You can see RRC, NAS and even the user plane traffic, all from Wireshark.

I’ve tested this with a rooted Xiaomi Pro 12 and a Quectel modem in a M.2 to USB adapter.

This is really cool and I’m looking forward to using it more in the field, or just if I’m bored on the train scanning the airwaves!

Converting OP Key to OPc

I wrote recently about the difference between OP and OPc keys in SIM cards, and why you should avoid using OP keys for best-practice.

PyHSS only supports storing OPc keys for this reason, but plenty of folks still use OP, so how can you convert OP keys to OPc keys?

Well, inside PyHSS we have a handy utility for this, CryptoTool.py

https://github.com/nickvsnetworking/pyhss/blob/master/lib/CryptoTool.py

Usage is really simple, just plug in the Ki and OP Key and it’ll generate you a full set of vectors, including the OPc key.

nick@amanaki:~/Documents/pyhss/lib$ python3 CryptoTool.py --k 11111111111111111111111111111111 --op 22222222222222222222222222222222

Namespace(k='11111111111111111111111111111111', op='22222222222222222222222222222222', opc=None)
Generating OPc key from OP & K
Generating Multimedia Authentication Vector
Input K: b'11111111111111111111111111111111'
Input OPc: b'2f3466bd1bea1ac9a8e1ab05f6f43245'
Input AMF: b'\x80\x00'

Of course, being open source, you can grab the functions out of this and make a little script to convert everything in a CSV or whatever format your key data is in.

So what about OPc to OP?
Well, this is a one-way transaction, we can’t get the OP Key from an OPc & Ki.

5G / LTE Milenage Security Exploit – Dumping the Vectors

I’ve written about Milenage and SIM based security in the past on this blog, and the component that prevents replay attacks in cellular network authentication is the Sequence Number (Aka SQN) stored on the SIM.

Think of the SQN as an incrementing odometer of authentication vectors. Odometers can go forward, but never backwards. So if a challenge comes in with an SQN behind the odometer (a lower number), it’s no good.

Why the SQN is important for Milenage Security

Every time the SIM authenticates it ticks up the SQN value, and when authenticating it checks the challenge from the network doesn’t have an SQN that’s behind (lower than) the SQN on the SIM.

Let’s take a practical example of this:

The HSS in the network has SQN for the SIM as 8232, and generates an authentication challenge vector for the SIM which includes the SQN of 8232.
The SIM receives this challenge, and makes sure that the SQN in the SIM, is equal to or less than 8232.
If the authentication passes, the new SQN stored in the SIM is equal to 8232 + 1, as that’s the next valid SQN we’d be expecting, and the HSS incriments the counters it has in the same way.

By constantly increasing the SQN and not allowing it to go backwards, means that even if we pre-generated a valid authentication vector for the SIM, it’d only be valid for as long as the SQN hasn’t been authenticated on the SIM by another authentication request.

Imagine for example that I get sneaky access to an operator’s HSS/AuC, I could get it to generate a stack of authentication challenges that I could use for my nefarious moustache-twirling purposes whenever I wanted.

This attack would work, but this all comes crumbling down if the SIM was to attach to the real network after I’ve generated my stack of authentication challenges.

If the SQN on the SIM passes where it was when the vectors were generated, those vectors would become unusable.

It’s worth pointing out, that it’s not just evil purposes that lead your SQN to get out of Sync; this happens when you’ve got subscriber data split across multiple HSSes for example, and there’s a mechanism to securely catch the HSS’s SQN counter up with the SQN counter in the SIM, without exposing any secrets, but it just ticks the HSS’s SQN up – It never rolls back the SQN in the SIM.

The Flaw – Draining the Pool

The Authentication Information Request is used by a cellular network to authenticate a subscriber, and the Authentication Information Answer is sent back by the HSS containing the challenges (vectors).

When we send this request, we can specify how many authentication challenges (vectors) we want the HSS to generate for us, so how many vectors can you generate?

TS 129 272 says the Number-of-Requested-Vectors AVP is an Unsigned32, which gives us a possible pool of 4,294,967,295 combinations. This means it would be legal / valid to send an Authentication Information Request asking for 4.2 billion vectors.

Source TS 129 272

It’s worth noting that that won’t give us the whole pool.

Sequence numbers (SQN) shall have a length of 48 bits.

TS 133 102

While the SQN in the SIM is 48 bits, that gives us a maximum number of values before we “tick over” the odometer of 281,474,976,710,656.

If we were to send 65,536 Authentication-Information-Requests asking for 4,294,967,295 a piece, we’d have got enough vectors to serve the sub for life.

Except the standard allows for an unlimited number of vectors to be requested, this would allow us to “drain the pool” from an HSS to allow every combination of SQN to be captured, to provide a high degree of certainty that the SQN provided to a SIM is far enough ahead of the current SQN that the SIM does not reject the challenges.

Can we do this?

Our lab has access to HSSes from several major vendors of HSS.

Out of the gate, the Oracle HSS does not allow more than 32 vectors to be requested at the same time, so props to them, but the same is not true of the others, all from major HSS vendors (I won’t name them publicly here).

For the other 3 HSSes we tried from big vendors, all eventually timed out when asking for 4.2 billion vectors (don’t know why that would be *shrug*) from these HSSes, it didn’t get rejected.

This is a lab so monitoring isn’t great but I did see a CPU spike on at least one of the HSSes which suggests maybe it was actually trying to generate this.

Of course, we’ve got PyHSS, the greatest open source HSS out there, and how did this handle the request?

Well, being standards compliant, it did what it was asked – I tested with 1024 vectors I’ll admit, on my little laptop it did take a while. But lo, it worked, spewing forth 1024 vectors to use.

So with that working, I tried with 4,294,967,295…

And I waited. And waited.

And after pegging my CPU for a good while, I had to get back to real life work, and killed the request on the HSS.

In part there’s the fact that PyHSS writes back to a database for each time the SQN is incremented, which is costly in terms of resources, but also that generating Milenage vectors in LTE is doing some pretty heavy cryptographic lifting.

The Risk

Dumping a complete set of vectors with every possible SQN would allow an attacker to spoof base stations, and the subscriber would attach without issue.

Historically this has been very difficult to do for LTE, due to the mutual network authentication, however this would be bypassed in this scenario.

The UE would try for a resync if the SQN is too far forward, which mitigates this somewhat.

Cryptographically, I don’t know enough about the Milenage auth to know if a complete set of possible vectors would widen the attack surface to try and learn something about the keys.

Mitigations / Protections

So how can operators protect ourselves against this kind of attack?

Different commercial HSS vendors handle this differently, Oracle limits this to 32 vectors, and that’s what I’ve updated PyHSS to do, but another big HSS vendor (who I won’t publicly shame) accepts the full 4294967295 vectors, and it crashes that thread, or at least times it out after a period.

If you’ve got a decent Diameter Routing Agent in place you can set your DRA to check to see if someone is using this exploit against your network, and to rewrite the number of requested vectors to a lower number, alert you, or drop the request entirely.

Having common OP keys is dumb, and I advocate to all our operator customers to use OP keys that are unique to each SIM, and use the OPc key derived anyway. This means if one SIM spilled it’s keys, the blast doesn’t extend beyond that card.

In the long term, it’d be good to see 3GPP limit the practical size of the Number-of-Requested-Vectors AVP.

2G/3G Impact

Full disclosure – I don’t really work with 2G/3G stacks much these days, and have not tested this.

MAP is generally pretty bandwidth constrained, and to transfer 280 billion vectors might raise some eyebrows, burn out some STPs and take a long time…

But our “Send Authentication Info” message functions much the same as the Authentication Information Request in Diameter, 3GPP TS 29.002 shows we can set the number of vectors we want:

5GC Vulnerability

This only impacts LTE and 5G NSA subscribers.

TS 29.509 outlines the schema for the Nausf reference point, used for requesting vectors, and there is no option to request multiple vectors.

Summary

If you’ve got baddies with access to your HSS / HLR, you’ve got some problems.

But, with enough time, your pool could get drained for one subscriber at a time.

This isn’t going to get the master OP Key or plaintext Ki values, but this could potentially weaken the Milenage security of your system.