Rerating CDRs in CGrateS

21/03/2025CGrateS, VoIPCGratesNick

There’s a bunch of reasons you might want to re-rate CDRs in CGrateS.

Recently I wanted to introduce StatS to process historical CDR data, and I’d also messed up some rates and wanted to correct them without deleting the existing data.

We can re-rate CDRs with the *rerate flag like so:

{
  "method": "CDRsV1.RateCDRs",
  "params": [
    {
        "Flags": ["*rerate", "*cdrs"],
        "SetupTimeStart": "2024-01-01 00:00:00",
        "SetupTimeEnd": "2024-01-05 00:00:00",
        "Tenants": ["cgrates.org"],
        "Categories": ["call"]
    }
  ],
  "id": 0
}

Something to be aware of that’s tripped me up: if any of the CDRs fails while re-rating, CGrateS will stop rating the CDRs after it. For example if you get something like this:

{'method': 'CDRsV1.RateCDRs', 'params': [{'Flags': ['*rerate', '*stats'], 'SetupTimeStart': '2025-01-01 00:00:00', 'SetupTimeEnd': '2025-01-28 23:59:59', 'Limit': 10}]}


{'error': 'SERVER_ERROR: PARTIALLY_EXECUTED', 'id': None, 'result': None}

Then the full list of CDRs you’ve requested won’t have been re-rated: only the CDRs up to the error are processed, and CGrateS skips every record after it.

So keep an eye on ngrep and make sure you’ve got all your rates and destinations defined correctly. I found that adding a NotCosts filter to the request helps:

{
  "method": "CDRsV1.RateCDRs",
  "params": [
    {
        "Flags": ["*rerate", "*cdrs"],
        "SetupTimeStart": "2024-01-01 00:00:00",
        "SetupTimeEnd": "2024-01-05 00:00:00",
        "Tenants": ["cgrates.org"],
        "Categories": ["call"],
        "NotCosts" : [-1, 0]
    }
  ],
  "id": 0
}

Filtering out any CDRs with a -1 cost in the CDR filters means I skip any CDRs that failed to rate last time (assuming you don’t want to fix CDRs that have failed to get rated).
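As a rough sketch of how you might script this, the Python below builds the same CDRsV1.RateCDRs body and checks a response for the PARTIALLY_EXECUTED error shown above. The function names are my own invention, the fields are just the ones used in the examples in this post, and nothing here actually sends the request to CGrateS.

```python
import json


def build_rerate_request(start, end, flags=("*rerate", "*cdrs"),
                         not_costs=(-1, 0), request_id=0):
    """Build a JSON-RPC body for CDRsV1.RateCDRs.

    Hypothetical helper: the field names are copied from the examples
    in this post, not from a complete reading of the CGrateS API.
    """
    params = {
        "Flags": list(flags),
        "SetupTimeStart": start,
        "SetupTimeEnd": end,
        # Skip CDRs whose cost matches one of these values
        "NotCosts": list(not_costs),
    }
    return {"method": "CDRsV1.RateCDRs", "params": [params], "id": request_id}


def partially_executed(response):
    """True if the response carries the PARTIALLY_EXECUTED error string."""
    error = response.get("error")
    return bool(error) and "PARTIALLY_EXECUTED" in error


request = build_rerate_request("2024-01-01 00:00:00", "2024-01-05 00:00:00")
print(json.dumps(request, indent=2))
```

If partially_executed() comes back True, only some of the requested CDRs were re-rated, so it’s worth fixing the offending rate or destination before running the request again.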

Call forwarding in SS7/ISUP

14/03/2025GSM, History, Mobile Networks, Notes, RFCs & Standards, VoIPForwarding, ISUP, Redirect, SS7Nick

Had an interesting fault come across my desk the other day; calls were failing when the called party (an SSP we talk to via SS7/ISUP) had an exchange based call forward in place.

In SIP, we can do call forwarding one of two ways: we can send a 302 redirect or we can generate a new SIP INVITE.

But in ISUP how is it done?

We’re a SIP based network, but we do talk some SS7/ISUP on the edges, and it was important that we handled this correctly.

I could see in the Address Complete Message (ACM) sent back to our network that there was redirection information here:

We would see the B party SSP release the call as soon as it sent this.

This made me wonder if we, as the originating network, were supposed to redirect to the new B party and send a new Initial Address Message?

After a lot of digging in the ITU Q.7xx docs (I’m nowhere near as fast at finding information in specs written prior to my birth as I am with the 3GPP specs) I found my answer: these headers are informational only, the B party SSP is meant to re-target the message, and send us an Alerting or Answer message when it’s done so.

StatS in CGrateS

07/03/2025CGrateS, Software, VoIPCGrates, StatSNick

The StatS subsystem allows us to calculate statistics based on CGrateS events.

Each StatS object contains one or more “metrics” which are things like Average call duration, Total call duration, Average call cost or totals and average of other fields.

The first thing we’ll need to do is enable stats in our JSON config file:

"stats": {
	"enabled": true,
	"string_indexed_fields": ["*req.Account","*req.RunID","*req.Destination"],
},

With that done we’re ready to create our first StatS entry, this one is pretty much a burger-with-the-lot, so let’s take a look:

{
    "method": "APIerSv1.SetStatQueueProfile",
    "params": [
        {
            "ID" : "StatQueueProfile_VoiceStats",
            "QueueLength": 10000000,
            "TTL": -1,
            "MinItems": 0,
            "FilterIDs": [],
            "Metrics": [
                {"FilterIDs": [],"MetricID": "*tcd"},
                {"FilterIDs": [],"MetricID": "*tcc"},
                {"FilterIDs": [],"MetricID": "*asr"},
                {"FilterIDs": [],"MetricID": "*acd"},
                {"FilterIDs": [],"MetricID": "*ddc"}
            ],
            "Stored": True,
        }
    ]
}

So what have we just done?

Well we’ve created a StatQueueProfile named StatQueueProfile_VoiceStats, in which we’ll store a maximum of 10000000 datapoints (this is important because to calculate an average we need to know all the previous datapoints), for a maximum of forever (because TTL is -1; if we wanted to store for 1 hour we’d set TTL to 1h).

We’re not matching any FilterIDs, but based on what we covered in the post on FilterS, you can imagine using this to match calls from a given Account / customer, or to a specific group of destinations, or maybe from a given supplier, etc, etc.

What we do have that’s interesting is we have defined a series of metrics.

The docs page of CGrateS explains all the available metrics and what they mean (we’ve also mapped them in the CGrateS UI), but the ones I’ve included above are Total Call Duration (*tcd), Total Call Cost (*tcc), Answer Seizure Ratio (*asr), Average Call Duration (*acd) and Distinct Destination Count (*ddc).
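To make the metrics a bit more concrete, here’s a toy Python model of a stat queue. It is not how CGrateS implements StatS: the class name, the tuple layout and the decision to only count answered calls towards duration and cost are all my own assumptions, and TTL isn’t modelled at all. It just shows why the queue has to hold on to the datapoints to produce totals and averages, and what happens when QueueLength is reached.

```python
from collections import deque


class ToyStatQueue:
    """Toy stand-in for a StatQueue (hypothetical, not CGrateS code)."""

    def __init__(self, queue_length):
        # Once the queue is full, the oldest datapoint is dropped
        self.events = deque(maxlen=queue_length)

    def add(self, destination, usage_seconds, cost, answered=True):
        self.events.append((destination, usage_seconds, cost, answered))

    def metrics(self):
        answered = [e for e in self.events if e[3]]
        tcd = sum(e[1] for e in answered)  # total call duration
        tcc = sum(e[2] for e in answered)  # total call cost
        return {
            "*tcd": tcd,
            "*tcc": tcc,
            # average call duration = total duration / answered calls
            "*acd": tcd / len(answered) if answered else None,
            # answer seizure ratio = answered calls / all calls, as a percentage
            "*asr": 100 * len(answered) / len(self.events) if self.events else None,
            # distinct destination count
            "*ddc": len({e[0] for e in self.events}),
        }


queue = ToyStatQueue(queue_length=3)
queue.add("61812341234", 60, 1.0)
queue.add("61812345678", 120, 2.0)
queue.add("61812341234", 0, 0.0, answered=False)
print(queue.metrics())
```

Adding a fourth call to this queue pushes the first one out, and every metric is recalculated from the three datapoints that remain.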

So what happens if we now generate a bunch of calls? Well, for starters as we’ve got no FilterS defined here, every call will match this StatQueueProfile, and so we’ll collect data for each.

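The example code I’ve provided in the repo for this post generates a bunch of calls, and we can check the values for all our Metrics with GetQueueStringMetrics for our StatQueue (the request below was captured against a queue with the ID StatQueueProfile_TalkTime, so substitute the ID of your own profile):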

{'method': 'StatSv1.GetQueueStringMetrics', 'params': [{'Tenant': '', 'ID': 'StatQueueProfile_TalkTime', 'APIOpts': {}}], 'id': 11}
{'error': None,
 'id': 11,
 'result': {'*acd': '8m4.4s',
            '*asr': '100%',
            '*ddc': '50',
            '*tcc': '5396',
            '*tcd': '6h43m40s'}}

We can now see the values of each metric.

If we’ve got a TTL set, old values that have existed in the QueueProfile longer than the TTL are removed, but we can also manually clear the values by using the ResetStatQueue endpoint:

{"method":"StatSv1.ResetStatQueue","params":[{"Tenant":"cgrates.org","ID":"StatQueueProfile_TalkTime"}],"id":4}

Which resets all the values back to zero / null.

One thing to keep in mind is you can’t modify a StatQueue object via the API without resetting the values.

`string_indexed_fields` in the config file

Sidebar on this: specifying string_indexed_fields means CGrateS will not evaluate every field against Filter rules, only those defined here. Without it, if you’ve got an event with say 20 fields (AnswerTime, Account, Subject, Destination, RunID, SetupTime, Extra Fields, etc, etc), each of these gets evaluated against a filter, which is pretty processor intensive if your FilterS only ever look at Account and Destination. So by limiting the indexed fields to the ones you actually use in your filters, you can boost your performance. On the flip side, you can leave this blank to evaluate all fields, but you’ll take a performance hit by doing so.

Western Electric 1013 Test Set / Butt Set

03/03/2025HistoryButt Set, History, Rotary, Telecommunications History, Test Phone, Western ElectricNick

I recently picked up a Western Electric 1013 Test set (Aka Buttinski) rotary test phone.

These are about $10 a piece on eBay in the US, and when having a pile of other stuff sent over (*cough* Nortel Millennium *cough*) I figured I’d add one of these.

I imagine these were produced in massive numbers, they’re electrically very simple, hardened and feature a rubber strip for a “more secure hands-free operation” – Luxury.

It’s 3 screws to open the whole unit up, and the top and bottom halves separate with spring loaded contacts for the dial so you don’t need to unplug anything (I imagine because they had a habit of getting broken dials when being smashed around).

The amazing Telephone Collectors International library has the overview of these phones, complete with the great images.

Oddly there is no ringer circuit, bell or lamp, so although you could answer an incoming call with one of these, there’s no way you’d know it was ringing, which reminded me of this sketch from “Not the Nine O’Clock News”.

According to the datasheet these phones feature a “type 11C dial”.

TCI library also has the docs for the Type 10 and Type 11 dial, which, according to Ma Bell, is not field serviceable, and should be swapped out rather than attempting a repair.

Rotary dial in action

Alas the rotary dial on mine was running slightly slow. I’ve a feeling Western Electric doesn’t manufacture these any more, so I decided I’d have to fix it myself.

So I stripped down the dial and gave it a good clean.

The dial has a neat little rubber boot on the inside to protect it from gunk, and came apart and went back together easily enough, even if I did inadvertently let out all the spring tension and have to wind it back in, and put the dial on offset by 90 degrees.

Oddly the finger stop moves when you dial – I thought this was an issue with a loose part, but it’s by design, to allow the dial to be more compact, which makes total sense as if I had stopped it from moving I wouldn’t be able to dial higher numbers – Glad I worked that one out eventually.

With the dial cleaned up and adjusted, she’s dialing within spec.

I’ll give the rest of the orange plastic body a polish and it’s off to join the other butt sets.

Setting up TR-069 to manage Calix Endpoints

28/02/2025NotesACS, Calix, CMSNick

Recently one of our customers who’s got a large number of Calix E7 ONTs needed some help to automate some of the network management tasks to do with the CPEs.

We’d set up a TR-069 Auto Configuration Server (ACS) for the Calix RGs (the modems) so that we could manage the config parameters on the devices.

Setup was surprisingly easy: after installing some god-awful 90’s Java stuff to access Calix’s “CMS” we pointed everything at our ACS (per screenshot below) and presto, a few thousand CPEs were there ready to be centrally managed.

FilterS in CGrateS

21/02/2025CGrateS, VoIPCGratesNick

FilterS do what it says in the name, they are a generic way to define filter rules that an event may or may not match.

Think of them as like a WHERE statement in SQL, they allow us to match on conditions.

So what would we use FilterS for? Well, let’s first check out some example use cases:

We might want to provide 100 free minutes on Tuesdays, we know from this post on creating Balances in CGrateS how to create the balance, but we’d use FilterS to make sure the balance is only used on the Tuesday, by adding a filter to check for the day of the week to only match on a Tuesday.

We might define an Attribute to rewrite the Destination number into E.164, but we only want to apply that transformation if the number is in 0NSN format, we apply the translation with AttributeS but we would create a filter to match Destinations that match the given prefixes.

We might want to trigger a counter for calls where the duration of the call (Usage) is greater than 1 hour, we can do this with Thresholds to handle the counting and FilterS to only match if the call duration is greater than 1 hour.

A customer may have multiple DIDs / phone numbers they present as the From header, and we need a way to take any phone number from “99990001” through to “99990099” arriving as the Account and change the Account to “Customer X”; we can do that with AttributeS to update the Account value in the request, and FilterS to control if that AttributeS rule is matched or not.

FilterS are used all over in CGrateS, if you’ve been following along, you’ve already come across FilterS in the FilterIDs fields in the API, which we’re going to look at using today.

There are two ways to handle Filters inside CGrateS; they both act the same way but each has some pros and cons.

Inline Filters

The first option is an “inline” filter. Take for example this AttributeS rule using an inline Filter.

{
    "method": "APIerSv2.SetAttributeProfile",
    "params": [{
        "ID": "ATTR_Blog_Example_Inline",
        "Contexts": ["*any"],
        "FilterIDs": ["*string:~*req.Account:Nick"],
        "Attributes": [
            {
            "FilterIDs": [],
            "Path": "*req.ExamplePath",
            "Type": "*constant",
            "Value": "ExampleValue"
            }
        ],
        "Blocker": False,
        "Weight": 10
    }],
}

Let’s break down this filter,

"*string:~*req.Account:Nick"

A filter is made up of 3 components, the match “type”, the element to compare using the match and the values.

Match Types: The above example is matching based on it being a string (match type *string), but we can also match on prefixes, suffixes, destinations, empty, not equal to something, greater than, less than, timings and more.

Match Elements: Next up we’ve got the element, this is what part of a CGrateS event we’re matching with the Match Type we’ve selected. In the above example we’re matching for if the value is type *string and the Element is ~*req.Account. If you look at the requests (~*req.) in CGrateS, you can see the events, there’s all the standard fields like Account, Subject, Category, Tenant, Destination, etc, plus any custom ones you’re using, all of which we can use as an element to compare with our match type.

Match Values: Lastly we’ve got the conditions we’ll match on, in the example above it’s the string “Nick” – So what we’re checking is the match is *string and the element we’re getting the string from is ~*req.Account and if that matches the value “Nick” then ding-ding-ding- we’ve matched.

Obviously the values change based on what we’re doing, if we were prefix matching, we’d put the prefix to match in the value.

Value can also be a list, separated by the pipe (|) symbol for inline filters, so for example we could match “Nick” and also “Nicholas” (if I’m in trouble) with this inline filter:

"*string:~*req.Account:Nick|Nicholas"

Let’s look at a few more inline filters.

This filter will match any event where the Destination is one of ACMA’s fake phone number ranges:

"*prefix:~*req.Destination:6125550|6127010|6135550|6137010|6175550|6177010|6185550|6187010|61491570"

Each match type also has an inverse, for example, *prefix also has *notprefix for matching the reverse:

"*notprefix:~*req.Destination:6125550|6127010|6135550|6137010|6175550|6177010|6185550|6187010|61491570"

Let’s look at one more example, if the Usage is greater than 1 hour:

*gt:~*req.Usage:1h

Inline filter for any Australian E164 prefixes
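To show how the three parts of an inline filter fit together, here’s a toy evaluator in Python. It only handles *string, *prefix and *notprefix against ~*req. fields, the function names are made up for this post, and it’s a sketch of the idea rather than how FilterS works internally.

```python
def parse_inline_filter(inline):
    """Split 'type:element:values' into its three components."""
    match_type, element, values = inline.split(":", 2)
    return match_type, element, values.split("|")


def matches(inline, event):
    """Evaluate one inline filter against an event dict (toy version)."""
    match_type, element, values = parse_inline_filter(inline)
    prefix = "~*req."
    if not element.startswith(prefix):
        raise ValueError("this sketch only understands ~*req. elements")
    # A missing field is treated as an empty string in this sketch
    field = str(event.get(element[len(prefix):], ""))
    if match_type == "*string":
        return field in values  # any one of the values may match (OR)
    if match_type == "*prefix":
        return any(field.startswith(v) for v in values)
    if match_type == "*notprefix":
        return not any(field.startswith(v) for v in values)
    raise ValueError("match type not covered by this sketch: " + match_type)


event = {"Account": "Nick", "Destination": "61255501234"}
print(matches("*string:~*req.Account:Nick|Nicholas", event))
print(matches("*prefix:~*req.Destination:6125550|6127010", event))
print(matches("*notprefix:~*req.Destination:6125550|6127010", event))
```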

FilterProfiles

Now we’ve covered the basics of creating Filters with the “Inline” method, let’s consider the limits of this.

If I had defined objects in AttributeS, ThresholdS, ResourceS, Balances and StatS to match when ~*req.Account is “Nick” using an inline filter, and then I change my name, I’d have to go to each of those elements and update them, and that’d be a pain (especially because I’d need to also change my domain name.)

Instead I can create a “Filter Profile” – A reference to a filter that I can reference from AttributeS, ThresholdS, ResourceS, Balances and StatS, and then I only need to update the Filter.

Let’s look at how that would look, first we’d create a new Filter Profile object using the API with:

{
    "method": "ApierV1.SetFilter",
    "params": [
        {
            "ID": "Filter_ACCOUNT_Nick",
            "Rules": [
                {
                    "Type": "*string",
                    "Element": "~*req.Account",
                    "Values": [
                        "Nick",
                        "Nicholas",
                    ]
                }
            ],
            "ActivationInterval": {}
        }
    ]
}

This is the same as the below inline Filter. Like the inline filter, it’ll match any time the ~*req.Account is a string that matches “Nick” or “Nicholas”:

"*string:~*req.Account:Nick|Nicholas"

And then to update our AttributeS example from earlier, rather than defining the inline filter in the FilterIDs section, we just put the ID of the filter we created above:

{
    "method": "APIerSv2.SetAttributeProfile",
    "params": [{
        "ID": "ATTR_Blog_Example_Inline",
        "Contexts": ["*any"],
        "FilterIDs": ["Filter_ACCOUNT_Nick"],
        "Attributes": [
            {
            "FilterIDs": [],
            "Path": "*req.ExamplePath",
            "Type": "*constant",
            "Value": "ExampleValue"
            }
        ],
        "Blocker": False,
        "Weight": 10
    }],
}

Easy!
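The equivalence between a Filter Profile rule and an inline filter can be sketched in a couple of lines of Python. The helper name is hypothetical, and it only covers the simple case of a rule with a Type, an Element and a list of Values.

```python
def rule_to_inline(rule):
    """Render one Filter Profile rule as the equivalent inline filter string."""
    return "{}:{}:{}".format(rule["Type"], rule["Element"], "|".join(rule["Values"]))


rule = {
    "Type": "*string",
    "Element": "~*req.Account",
    "Values": ["Nick", "Nicholas"],
}
print(rule_to_inline(rule))
```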

We saw in the example above that we could do logical OR operations, matching if the Account is equal to “Nick” or “Nicholas”. But one neat thing we can do with FilterProfiles is logical AND operations.

Let’s create a new FilterProfile called Filter_Sunday to match when the AnswerTime matches Timing named “Timing_Sunday”:

{
    "method": "ApierV1.SetFilter",
    "params": [
        {
            "ID": "Filter_Sunday",
            "Rules": [
                {
                    "Type": "*timings",
                    "Element": "~*req.AnswerTime",
                    "Values": ["Timing_Sunday"]
                }
            ],
        }
    ]
}

Now we can define an Attribute that will only match if the Account is equal to “Nick” or “Nicholas” AND the AnswerTime matches our “Timing_Sunday” timing profile:

{
    "method": "APIerSv2.SetAttributeProfile",
    "params": [{
        "ID": "ATTR_Blog_Example_Inline",
        "Contexts": ["*any"],
        "FilterIDs": [
             "Filter_ACCOUNT_Nick",
             "Filter_Sunday",
        ],
        "Attributes": [
            {
            "FilterIDs": [],
            "Path": "*req.FYI",
            "Type": "*constant",
            "Value": "Sunday_and_Nick_or_Nicholas"
            }
        ],
        "Blocker": False,
        "Weight": 10
    }],
}

So we can evaluate as AND by just putting both FilterProfiles in the FilterIDs field:

"FilterIDs": ["FLTR_X", "FLTR_Y"],

It’s up to you where you use Inline Filters vs Filter Profiles. As a general rule, if you don’t mind setting it on every object you’re touching, or you don’t reuse the Filter much, inline Filters is probably the way to go.
But if you use multiple subsystems and want to keep your logic more readable, perhaps use Filter Profiles – but again, there’s no hard rules.
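If the OR-within-a-filter and AND-across-FilterIDs behaviour is hard to picture, this toy Python model might help. The two predicate functions are stand-ins I’ve written for the Filter_ACCOUNT_Nick and Filter_Sunday profiles (the Sunday check is a plain weekday test, not a real CGrateS Timing), so treat it as an illustration of the logic only.

```python
from datetime import datetime


def account_is_nick(event):
    # Stand-in for Filter_ACCOUNT_Nick: any one of the values may match (OR)
    return event.get("Account") in ("Nick", "Nicholas")


def answered_on_sunday(event):
    # Stand-in for Filter_Sunday: Monday is 0, so Sunday is 6
    answer_time = datetime.strptime(event["AnswerTime"], "%Y-%m-%d %H:%M:%S")
    return answer_time.weekday() == 6


FILTERS = {
    "Filter_ACCOUNT_Nick": account_is_nick,
    "Filter_Sunday": answered_on_sunday,
}


def profile_matches(filter_ids, event):
    """Every filter listed in FilterIDs has to match (AND)."""
    return all(FILTERS[filter_id](event) for filter_id in filter_ids)


event = {"Account": "Nicholas", "AnswerTime": "2024-01-07 10:00:00"}
print(profile_matches(["Filter_ACCOUNT_Nick", "Filter_Sunday"], event))
```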

Filter Profiles is something we’ve got fairly good coverage of in the CGrateS UI, but as far as I’m aware there’s not a simple “Test Filter” API endpoint, so generally I test out with AttributeS.

Basic CAMEL Charging Flow

14/02/2025GSM, Mobile Networks, Notes, RFCs & StandardsCAMEL, Charging, GSM, MAP, OCS, Roaming, SS7Nick

CAMEL handles charging in 2G and 3G networks, much like Diameter handles charging in LTE.

CAMEL runs on top of SS7, specifically it sits on top of TCAP, which sits on top of SCCP, which can ride on M3UA or MTP3 (so it sits at the same layer as MAP).

CAMEL is primarily focused on charging for Voice & SMS services, as data generally uses Diameter, so it’s voice and SMS we’ll focus on.

CAMEL is spoken between the MSC (gsmSSF) and the OCS (gsmSCF).

Basic Call State Model

CAMEL is closely related to the Intelligent Network stuff of the 1980s, and steals a lot of its ideas from there. Unfortunately, the CAMEL standard also assumes you were involved in IN stuff and had been born at that point; alas I was neither.

So the key to understanding CAMEL is the Basic Call State Model (BCSM) which is a model of all the different states a call can be in, such as ringing, answered, abandoned, call failed, etc, etc.

Over CAMEL, the MSC can tell our OCS when the call has changed state. For example a BCSM event might indicate the call has hung up, is ringing, was cancelled, etc.

Below is the list of all the valid BCSM states:

Basic MO Call with CAMEL

Our subscriber makes an outbound call.

Based on the data the MSC has in it from the HLR, it knows that we should use CAMEL for this call, and it has the SCCP Address of the OCS (gsmSCF) it needs to send the CAMEL messages to.

So the MSC sends an InitialDP message to the OCS (via its Global Title Address) to Authorize the call that the user is trying to make.

This is like any other Authorization step for an OCS, which allows the OCS to authorize the call by checking the subscriber is valid, check if they’re allowed to call that destination and they’ve got the balance to do so, etc.

The initialDP (Initial Detection Point) is telling our OCS all about the call event that’s being requested, who’s calling, what number they’ve dialed, where they are in the network (of note especially if they’re roaming), etc, etc.

The OCS runs through its own checks to see if it wants to allow the call to proceed by checking if the subscriber has got enough balance, unit reservation, etc, etc, and if it does, the OCS sends back a Continue message to the MSC to allow the call to continue.

Generally the OCS also uses this message as a chance to subscribe to BCSM Events using RequestReportBCSMEventArg, so the MSC will tell us when the state of the call changes: events like the call getting answered, disconnected, etc. This is critical so we know when the call gets answered and hung up, so we can charge correctly.

In the below example, as well as sending the Continue and RequestReportBCSMEventArg, the OCS is also setting the ChargingArgs for this call, so the MSC knows who to charge (the caller, set via sendingSide) and that the MSC must send an Apply Charging Report (ACR) message every 300 units (1 unit = 100 ms, so a value of 300 = 300 x 100 milliseconds = 30 seconds) so the OCS keeps track of what’s going on.
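The unit arithmetic is easy to get wrong, so here it is as a tiny Python sketch. The function names are mine; the only fact taken from above is that one unit is 100 ms.

```python
UNIT_MS = 100  # one maxCallPeriodDuration unit is 100 milliseconds


def max_call_period_to_seconds(units):
    """Convert a maxCallPeriodDuration value into seconds."""
    return units * UNIT_MS / 1000


def seconds_to_max_call_period(seconds):
    """Convert seconds into maxCallPeriodDuration units."""
    return round(seconds * 1000 / UNIT_MS)


print(max_call_period_to_seconds(300))  # 30.0
print(seconds_to_max_call_period(30))   # 300
```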

`continue` sent by the OCS to the MSC, also including `reportBCSMEvent` and `applyCharging` messages

At this point the call can start to proceed. In ISUP terms, the InitialDP is triggered between the Initial Address Message and the Address Complete Message; the Address Complete Message is sent after the Continue is sent back.

Or in a slightly less appropriate analogy but easier to understand for SIP folks, the InitialDP is sent for INVITE and the 180 RINGING is sent once the continue message is received.

Call is Answered

So at this stage our call can start to ring.

As we’ve subscribed to BCSM events in our last message, the MSC is going to tell us when the call gets answered or the call times out, is abandoned or the sun burns out.

The MSC provides this info in an eventReportBCSM, which is very simple and just tells us the event that’s been triggered; in the example below, the call was answered.

These eventReportBCSM are informational from the MSC to the OCS, so the OCS doesn’t need to send anything back, but the OCS does need to mark the call as answered so it can start timing the call.

At this stage, the call is connected and our two parties are talking, but our MSC has been told it needs to send us applyChargingReports every 30 seconds (due to the value of 300 in maxCallPeriodDuration) after the call was connected, so the MSC sends the OCS its first applyChargingReport 30 seconds after the call was answered:

applyChargingReport sent by the MSC to the OCS every reporting period

We can calculate the duration of the call so far based on the time of the eventReportBCSM, then the OCS must make a decision of if it should allow the call to continue or not.

For simplicity’s sake, let’s imagine we’ve still got a balance in the OCS and the OCS wants the call to continue. The OCS sends back an applyCharging message to the MSC in response, and includes the current allowed maxCallPeriodDuration, keeping in mind the value is in units of 100 milliseconds (so this is 30 seconds).

`applyCharging` from the OCS back to the MSC

Perfect, our call is good to go for another 30 seconds, so in 30 seconds we’ll get another ACR message from the MSC to the OCS to keep it abreast of what’s going on.
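The decision the OCS makes on each applyChargingReport can be sketched as a loop. This is a heavily simplified toy in Python: balances are whole seconds, the function names are invented, and it assumes the MSC always reports the full granted period as used, which a real call won’t do if someone hangs up early.

```python
def next_action(balance_seconds, slice_seconds=30):
    """Decide what the OCS sends back: (message name, maxCallPeriodDuration)."""
    if balance_seconds <= 0:
        return ("releaseCall", 0)
    granted = min(balance_seconds, slice_seconds)
    return ("applyCharging", granted * 10)  # 1 second = 10 units of 100 ms


def simulate_call(balance_seconds, slice_seconds=30):
    """Grant slices until the balance runs out (toy model)."""
    actions = []
    while True:
        action, units = next_action(balance_seconds, slice_seconds)
        actions.append((action, units))
        if action == "releaseCall":
            return actions
        # Assumption: the whole granted period gets used before the next report
        balance_seconds -= units // 10


for step in simulate_call(70):
    print(step)
```

With 70 seconds of balance this grants two full 30 second periods, then a final 10 second period, then releases the call.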

Now one of two things is going to happen: either the subscriber is going to burn through all of their minutes and get their call cut off, or the call will end while they’ve still got balance. Let’s look at both scenarios.

Normal Hangup Scenario

When the call ends, we get an applyChargingReport from the MSC to the OCS.

As we’ve subscribed to reportBCSMEvent we get both the applyChargingReport with `legActive: False` so we know the call has hung up, and an event report to tell us more about the event, in this case a hangup from the Originating Side.

`reportBCSMEvent` and `applyChargingReport` Sent by the MSC to the OCS to indicate the call has ended, note the `legActive` flag is now false

Lastly the OCS confirms by sending a releaseCall to the MSC, to indicate all legs should now terminate.

`releaseCall` Sent by OCS to MSC at the very end

So that’s it!

Obviously there are other flows, such as running out of balance mid-call, rejecting a call, SMS and PBX / VPN services that rely on CAMEL, but hopefully you now understand the basics of how CAMEL based charging looks and works.

If you’re looking for a CAMEL capable OCS or a CAMEL to Diameter or API gateway, get in touch!

CGrateS time Metas

07/02/2025Notes, SoftwareCGratesNick

There are so many ways you can format time for things like Expiry or ActionPlans in CGrateS, this is mostly just a quick reference for me:

*asap (Now)
*now
*every_minute
*hourly
*monthly
*monthly_estimated
*yearly
*daily
*weekly
mo+1h2m
*always (What?)
*month_end
*month_end+1h2m
+20s
1375212790
+24h
2016-09-14T19:37:43.665+0000
20160419210007.037
31/05/2015 14:46:00
08.04.2014 22:14:29
20131023215149
“2013-12-30 15:00:01 +0430 +0430”
“2013-12-30 15:00:01 +0000 UTC”
“2013-12-30 15:00:01”

Stolen from: https://github.com/cgrates/cgrates/blob/8fec8dbca1f28436f8658dbcb8be9d03ee7ab9ee/utils/coreutils_test.go#L242

Enabling logging on Cisco ITP Signaling Transfer Point

31/01/2025GSM, Mobile Networks, NotesCisco, ITP, SS7, STPNick

Mostly just for my own notes, but when debugging SCCP translation on a Cisco ITP STP, this is probably obvious for folks who are more Cisco focused:

Enabling debug:

debug cs7 m3ua packet
debug cs7 m3ua all
debug cs7 sccp event ALL
debug cs7 sccp gtt-accounting
terminal monitor

Disabling debug:

no debug cs7 m3ua packet
no debug cs7 m3ua all
no debug cs7 sccp event ALL
no debug cs7 sccp gtt-accounting

GTPv2 Source Ports

24/01/2025EPC, LTE, Mobile Networks, Notes, RFCs & StandardsGTP, GTP-C, GTPv2CNick

Ask anyone in the industry and they’ll tell you that GTPv2-C (aka GTP-C) uses port 2123, and they’re right, kinda.

Per TS 129.274 the Create Session Request should be sent to port 2123, but the source port can be any port:

The UDP Source Port for a GTPv2 Initial message is a locally allocated port number at the sending GTP entity.

So this means that while the Destination Port is 2123, the source port is not always 2123.

So what about a response to this? Our Create Session Response must go where?

Create Session request coming from 166.x.y.z from a random port 36225
Going to the PGW on 172.x.y.z port 2123

The response goes to the same port the request came on, so for the example above, as the source port was 36225, the Create Session Response must be sent to port 36225.

Because:

The UDP Destination Port value of a GTPv2 Triggered message and for a Triggered Reply message shall be the value of the UDP Source Port of the corresponding message to which this GTPv2 entity is replying, except in the case of the SGSN pool scenario.

But that’s where the association ends.

So if our PGW wants to send a Create Bearer Request to the SGW, that’s an initial message, so must go to port 2123, even if the Create Session Request came from a random different port.
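The port rule boils down to one small function. This is a Python sketch with a made-up function name, and it ignores the SGSN pool exception mentioned in the quote above.

```python
GTPV2_C_PORT = 2123


def destination_port(is_initial, request_source_port=None):
    """Pick the UDP destination port for a GTPv2-C message.

    Initial messages go to 2123; triggered messages go back to the
    source port of the message they are replying to.
    """
    if is_initial:
        return GTPV2_C_PORT
    if request_source_port is None:
        raise ValueError("a triggered message needs the request's source port")
    return request_source_port


# Create Session Response replying to a request that came from port 36225
print(destination_port(is_initial=False, request_source_port=36225))
# Create Bearer Request is an initial message, so it goes to 2123
print(destination_port(is_initial=True))
```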

CGrateS in Baby Steps – Part 5 – Events, Agents & Subsystems

17/01/2025CGrateS, Software, VoIPCGrates, Diameter, OCS, Online ChargingNick

Up until this point in the series, I’ve tried to hide all the complexity of CGrateS, so people following along can see some progress and feel like they’re making it somewhere with CGrateS, but it’s time to tear off the plaster and talk about the actual concepts, about what’s under the hood, and how all the components interact, as it’ll make it much easier then for us to learn more about how to use CGrateS.

This will be the last post in the “CGrateS in Baby Steps” series (Which I started in 2022), if you’ve made it this far congratulations, all the future posts will be on specific topics and build upon the concepts we’ve covered here.

This took me a while to grasp – CGrateS is both crazy complex and beautifully simple, but getting to the stage where you can “see through the matrix” on CGrateS and see the beautiful simplicity involves a bit of understanding how everything fits together.

Once you can see the pattern, and understand the building blocks, everything else CGrateS related becomes super simple.

Agents

In CGrateS, Agents are consumers of the services. That’s a super generic answer, but let’s take a closer look at what that actually means with some examples:

Diameter is a protocol that can be used for Online Charging.
CGrateS has a common interface for API calls that can perform Online Charging.
The CGrateS Diameter Agent translates between Diameter on one side, and CGrateS API calls on the other.

Likewise, if we want to speak RADIUS, we can use the CGrateS RADIUS Agent, which translates between RADIUS and the CGrateS API calls.

FreeSWITCH, Asterisk and Kamailio don’t use specific protocols like Diameter or Radius, but rather modules or plugins to connect that application to a CGrateS Agent, and they all just end up talking the same CGrateS API calls.

Lastly, there’s even an HTTP agent so you could define your own agent to talk another protocol if you wanted to use CGrateS for anything else (We’ve been playing with CAMEL based charging with CGrateS and 5GC charging).

The config for each of the CGrateS Agents happens in the cgrates.json config file (Typically in /etc/cgrates).

Because the Agents just translate everything into API calls, logic for billing a call from FreeSWITCH is the same as for Diameter, the same as for RADIUS, the same as for SIP, the same as for Asterisk.

The Agents just translate all the domain-specific stuff into the common CGrateS RPC API, which we’ve been working with up to this point.

This is a key part to understand, because once you understand how to do the CGrateS part, moving from Asterisk to FreeSWITCH, to DNS, to RADIUS, to any other Agent, it’s all the same to you.

The Agents just take the domain-specific stuff (Diameter requests, CSV files, Asterisk Calls, FreeSWITCH calls, etc, etc) and translate these requests into CGrateS RPC API calls.

Subsystems

So with these API calls, where do they go, what do they do?

Well, it’s the Subsystems that do the things.

What things?

Well, everything of use.

Each subsystem has a purpose, AttributeS transforms stuff, EeS exports CDRs, RALs applies our charging logic, CDRs writes CDRs to StorDB, etc, etc.

In each event we can set flags to denote which subsystems it should be routed to, and we can set the links between components in our cgrates.json file.

Based on the flags, we pass events between these subsystems.

Events

So our Agents create the API calls, which contain Events, which are JSON RPC calls.

They look like all the API examples we’ve played with, because that’s exactly what they are.

We can access them via the JSON RPC API, but when you start a call on Kamailio, the Kamailio Agent generates a JSON RPC API call containing an Event into CGrateS for that call on Kamailio.

When you send a DNS request, the DNS Agent translates this DNS request into a CGrateS JSON RPC API call containing the event for the DNS request.

Let’s take an example, we’re going to use the ErS as it’s the simplest to demonstrate with.

I’ve already written about ERS the Event Reader Service, and it reads text files / CSVs so we can import them into CGrateS.

So if you set up your environment per the tutorial above (but don’t load the CSV yet), we’ll start running some experiments…

Anatomy of an Event

We can “sniff” the events bouncing around between the Agent and the various Subsystems in real time, by using ngrep:

sudo ngrep -t -W byline port 2012 or port 2080 or port 8021 or port 2014 or port 2053 -d any

So now we’ve got ngrep running, we can move our CSV file in to be processed in another tab.

Plonking the CSV file into the path ERS is monitoring will mean the ErS Agent will generate a CGrateS JSON RPC “event” for each row in the file, it’ll look something like this:

#############
T 2024/12/22 09:09:47.357151 127.0.0.1:50456 -> 127.0.0.1:2012 [AP] #404
{"method":"CDRsV1.ProcessEvent","params":[{"Flags":["*rals"],"Tenant":"cgrates.org","ID":"c2ce33d","Time":"2024-12-22T09:09:47.356294838+11:00","Event":{"Account":"61412341234","Animal":"Dog","CGRID":"6330859b7c38c1d508f9e5e0043950079e54fef1","Category":"call","Destination":"61812341234","OriginID":"1lklkjfds","RequestType":"*rated","SetupTime":"2024-01-01 01:00:00","Source":"*sessions","Subject":"61812341234","ToR":"*voice","Usage":"60","Value0":"2024-01-01 01:00:00","Value1":"2024-01-01 01:01:00","Value2":"Nick","Value3":"60","Value4":"61412341234","Value5":"61812341234","Value6":"Dog","Value7":"1lklkjfds"},"APIOpts":{}}],"id":1}

##
T 2024/12/22 09:09:47.357427 127.0.0.1:50456 -> 127.0.0.1:2012 [AP] #406
{"method":"AttributeSv1.ProcessEvent","params":[{"Tenant":"cgrates.org","ID":"c2ce33d","Time":"2024-12-22T09:09:47.356294838+11:00","Event":{"Account":"61412341234","Animal":"Dog","CGRID":"6330859b7c38c1d508f9e5e0043950079e54fef1","Category":"call","Destination":"61812341234","OriginID":"1lklkjfds","RequestType":"*rated","SetupTime":"2024-01-01 01:00:00","Source":"*sessions","Subject":"61812341234","ToR":"*voice","Usage":"60","Value0":"2024-01-01 01:00:00","Value1":"2024-01-01 01:01:00","Value2":"Nick","Value3":"60","Value4":"61412341234","Value5":"61812341234","Value6":"Dog","Value7":"1lklkjfds"},"APIOpts":{"*context":"*cdrs","*subsys":"*cdrs"}}],"id":2}

##
T 2024/12/22 09:09:47.422623 127.0.0.1:2012 -> 127.0.0.1:50456 [AP] #408
{"id":2,"result":null,"error":"NOT_FOUND"}

##
T 2024/12/22 09:09:47.422788 127.0.0.1:50456 -> 127.0.0.1:2012 [AP] #410
{"method":"ChargerSv1.ProcessEvent","params":[{"Tenant":"cgrates.org","ID":"c2ce33d","Time":"2024-12-22T09:09:47.356294838+11:00","Event":{"Account":"61412341234","Animal":"Dog","CGRID":"6330859b7c38c1d508f9e5e0043950079e54fef1","Category":"call","Destination":"61812341234","OriginID":"1lklkjfds","RequestType":"*rated","SetupTime":"2024-01-01 01:00:00","Source":"*sessions","Subject":"61812341234","ToR":"*voice","Usage":"60","Value0":"2024-01-01 01:00:00","Value1":"2024-01-01 01:01:00","Value2":"Nick","Value3":"60","Value4":"61412341234","Value5":"61812341234","Value6":"Dog","Value7":"1lklkjfds"},"APIOpts":{"*context":"*cdrs","*subsys":"*cdrs"}}],"id":3}

##
T 2024/12/22 09:09:47.451702 127.0.0.1:2012 -> 127.0.0.1:50456 [AP] #412
{"id":3,"result":[{"ChargerSProfile":"DEFAULT","AttributeSProfiles":null,"AlteredFields":["*req.RunID"],"CGREvent":{"Tenant":"cgrates.org","ID":"c2ce33d","Time":"2024-12-22T09:09:47.356294838+11:00","Event":{"Account":"61412341234","Animal":"Dog","CGRID":"6330859b7c38c1d508f9e5e0043950079e54fef1","Category":"call","Destination":"61812341234","OriginID":"1lklkjfds","RequestType":"*rated","RunID":"DEFAULT","SetupTime":"2024-01-01 01:00:00","Source":"*sessions","Subject":"61812341234","ToR":"*voice","Usage":"60","Value0":"2024-01-01 01:00:00","Value1":"2024-01-01 01:01:00","Value2":"Nick","Value3":"60","Value4":"61412341234","Value5":"61812341234","Value6":"Dog","Value7":"1lklkjfds"},"APIOpts":{"*context":"*cdrs","*subsys":"*chargers"}}}],"error":null}

#
T 2024/12/22 09:09:47.452021 127.0.0.1:50456 -> 127.0.0.1:2012 [AP] #413
{"method":"Responder.GetCost","params":[{"Category":"call","Tenant":"cgrates.org","Subject":"61812341234","Account":"61412341234","Destination":"61812341234","TimeStart":"2024-01-01T01:00:00+11:00","TimeEnd":"2024-01-01T01:00:00.00000006+11:00","LoopIndex":0,"DurationIndex":60,"FallbackSubject":"","RatingInfos":null,"Increments":null,"ToR":"*voice","ExtraFields":{"Animal":"Dog","Value0":"2024-01-01 01:00:00","Value1":"2024-01-01 01:01:00","Value2":"Nick","Value3":"60","Value4":"61412341234","Value5":"61812341234","Value6":"Dog","Value7":"1lklkjfds"},"MaxRate":0,"MaxRateUnit":0,"MaxCostSoFar":0,"CgrID":"","RunID":"","ForceDuration":false,"PerformRounding":true,"DenyNegativeAccount":false,"DryRun":false,"APIOpts":{"*context":"*cdrs","*subsys":"*chargers"}}],"id":4}

#
T 2024/12/22 09:09:47.465711 127.0.0.1:2012 -> 127.0.0.1:50456 [AP] #414
{"id":4,"result":{"Category":"call","Tenant":"cgrates.org","Subject":"61812341234","Account":"61412341234","Destination":"61812341234","ToR":"*voice","Cost":14,"Timespans":[{"TimeStart":"2024-01-01T01:00:00+11:00","TimeEnd":"2024-01-01T01:01:00+11:00","Cost":14,"RateInterval":{"Timing":{"ID":"*any","Years":[],"Months":[],"MonthDays":[],"WeekDays":[],"StartTime":"00:00:00","EndTime":""},"Rating":{"ConnectFee":0,"RoundingMethod":"*up","RoundingDecimals":4,"MaxCost":0,"MaxCostStrategy":"","Rates":[{"GroupIntervalStart":0,"Value":14,"RateIncrement":60000000000,"RateUnit":60000000000}]},"Weight":10},"DurationIndex":60000000000,"Increments":[{"Duration":0,"Cost":0,"BalanceInfo":{"Unit":null,"Monetary":null,"AccountID":""},"CompressFactor":1},{"Duration":60000000000,"Cost":14,"BalanceInfo":{"Unit":null,"Monetary":null,"AccountID":""},"CompressFactor":1}],"RoundIncrement":null,"MatchedSubject":"*out:cgrates.org:call:*any","MatchedPrefix":"618","MatchedDestId":"Dest_AU_Fixed","RatingPlanId":"RatingPlan_VoiceCalls","CompressFactor":1}],"RatedUsage":60000000000,"AccountSummary":null},"error":null}

#
T 2024/12/22 09:09:47.470035 127.0.0.1:2012 -> 127.0.0.1:50456 [AP] #415
{"id":1,"result":"OK","error":null}

Sidebar – you’re going to spend a lot of time with `ngrep`.
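If the raw capture is a bit much to read, a few lines of Python can boil it down to just the order the methods were called in. This is only a rough sketch: it assumes you’ve saved the ngrep output to a string, that each JSON payload sits on one line (as it does with `-W byline` in the capture above), and the function name `rpc_methods` is my own invention.

```python
import json


def rpc_methods(capture: str) -> list[str]:
    """Return the JSON-RPC method names seen in an ngrep capture, in order."""
    methods = []
    for line in capture.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip ngrep separators ("##") and packet headers ("T 2024/...")
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # payload split across packets; ignored in this sketch
        if "method" in message:  # requests carry a method, responses do not
            methods.append(message["method"])
    return methods


# Trimmed-down sample in the same shape as the capture above
sample = """\
##
T 2024/12/22 09:09:47.357151 127.0.0.1:50456 -> 127.0.0.1:2012 [AP] #404
{"method":"CDRsV1.ProcessEvent","params":[{"Flags":["*rals"]}],"id":1}
##
T 2024/12/22 09:09:47.357427 127.0.0.1:50456 -> 127.0.0.1:2012 [AP] #406
{"method":"AttributeSv1.ProcessEvent","params":[{}],"id":2}
##
T 2024/12/22 09:09:47.422623 127.0.0.1:2012 -> 127.0.0.1:50456 [AP] #408
{"id":2,"result":null,"error":"NOT_FOUND"}
"""

print(rpc_methods(sample))
```

Responses are skipped on purpose, so what you get back is just the chain of calls made for the event.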

Alright, that event probably looks familiar, after all, it’s the same structure as the API requests we’ve made to CGrateS so far, to set rates and handle accounts.

But what we’re witnessing here isn’t us making an API request to the JSON RPC interface from a Python script, it’s the ERS Agent inside CGrateS, calling CGrateS.

The ERS Agent inside CGrateS reads the CSV file we dropped in, and based on what we had set in the ERS section of the CGrateS config file (cgrates.json), the ERS Agent creates JSON RPC events and sends them to CGrateS for processing.

You may be thinking “Wow, the ERS Agent is really dumb, it just sends an API request (events)”, and you’d be right.

We could replace the ERS Agent with a Python script to read the CSV and send the same request, and we’d get the exact same outcome, but CGrateS is mostly “batteries included” so we don’t have to.
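To make that concrete, here’s roughly what such a script might look like. Treat it as a sketch under assumptions: the CSV column order is inferred from the Value0 to Value7 fields in the capture above, the URL `http://localhost:2080/jsonrpc` is an assumption about where your JSON RPC HTTP listener lives, and the function names are made up for illustration.

```python
import csv
import io
import json
import urllib.request


def row_to_request(row: list[str], request_id: int) -> dict:
    """Build a CDRsV1.ProcessEvent request from one CSV row.

    Assumed column order (inferred from the capture): setup time, end time,
    name, usage, account, destination, animal, origin ID.
    """
    event = {
        "SetupTime": row[0],
        "Usage": row[3],
        "Account": row[4],
        "Destination": row[5],
        "Animal": row[6],
        "OriginID": row[7],
        "Category": "call",
        "RequestType": "*rated",
        "ToR": "*voice",
    }
    return {
        "method": "CDRsV1.ProcessEvent",
        "params": [{"Flags": ["*rals"], "Tenant": "cgrates.org", "Event": event}],
        "id": request_id,
    }


def csv_to_requests(csv_text: str) -> list[dict]:
    """Turn every non-empty CSV row into a JSON RPC request, numbering ids from 1."""
    reader = csv.reader(io.StringIO(csv_text))
    return [row_to_request(row, i) for i, row in enumerate(reader, start=1) if row]


def send(request: dict, url: str = "http://localhost:2080/jsonrpc") -> dict:
    """POST one request to CGrateS; the default URL is an assumption."""
    body = json.dumps(request).encode()
    http_request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(http_request) as response:
        return json.load(response)


sample_csv = "2024-01-01 01:00:00,2024-01-01 01:01:00,Nick,60,61412341234,61812341234,Dog,1lklkjfds\n"
rpc_requests = csv_to_requests(sample_csv)
print(json.dumps(rpc_requests[0], indent=2))
# To actually submit it: send(rpc_requests[0])
```

The `send()` call is left commented out so the script can be run without a live CGrateS instance; the printed request should look a lot like the one ERS generated.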

Ok, so you’ve heard me drum in the fact that Agents are pretty simple, and all they do is make JSON RPC requests for the event which are sent to CGrateS. So now what happens?

Well, the event is calling CDRsV1.ProcessEvent, so that means the Event is passed by CGRengine to the CDRs subsystem.

What does the CDRs subsystem do with it? Well, that’s going to depend on what’s in our cgrates.json config file.

In the above example, CDRs is set up with connections to the different subsystems; AttributeS, ChargerS and RALs are all the subsystems linked from here.

Having these links here does not force the Event to always route to these Subsystems, but unless we’ve got the links there, the Event won’t be able to get routed from CDRs to that subsystem if we want it to.

But we can see what’s going to happen with this request based on our CDRsV1.ProcessEvent event; it’s got Flags set to *rals, so we know it wants RALs to be called.

Let’s take a closer look at the API call:

{
    "method": "CDRsV1.ProcessEvent",
    "params": [
        {
            "Flags": ["*rals"],
            "Tenant": "cgrates.org",
            "ID": "c2ce33d",
            "Time": "2024-12-22T09:09:47.356294838+11:00",
            "Event": {
                "Account": "61412341234",
                "Animal": "Dog",
                "CGRID": "6330859b7c38c1d508f9e5e0043950079e54fef1",
                "Category": "call",
                "Destination": "61812341234",
                "OriginID": "1lklkjfds",
                "RequestType": "*rated",
                "SetupTime": "2024-01-01 01:00:00",
                "Source": "*sessions",
                "Subject": "61812341234",
                "ToR": "*voice",
                "Usage": "60"
            },
            "APIOpts": {}
        }
    ],
    "id": 1
}

So looking in ngrep we see our CDRsV1.ProcessEvent event makes it to the CDRs module with ID 1.

The API call has flags set to *rals so CDRs will call RALs, and inside our config the CDRs section has a link (shown in the image below) to RALs (rals_conns) – if we didn’t have that link, CGrateS wouldn’t know how to connect to RALs, and the event would fail.
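The relationship between flags and links can be sanity-checked with a tiny script before you go hunting through ngrep. This is a sketch only: `rals_conns` comes from the config discussed here, but the other key names in the mapping, the `*internal` value, and the `missing_links` helper are assumptions for illustration, so check them against your own cgrates.json. It takes the CDRs section as an already-parsed dict.

```python
# Assumed mapping of event flags to connection keys in the "cdrs" config section.
# Only rals_conns is taken from the text; the others are hypothetical.
FLAG_TO_CONN_KEY = {
    "*rals": "rals_conns",
    "*attributes": "attributes_conns",
    "*chargers": "chargers_conns",
}


def missing_links(flags: list[str], cdrs_config: dict) -> list[str]:
    """Return the connection keys a set of flags needs but the config lacks."""
    missing = []
    for flag in flags:
        key = FLAG_TO_CONN_KEY.get(flag)
        if key is None:
            continue  # flag not covered by this sketch
        if not cdrs_config.get(key):  # key absent or an empty list
            missing.append(key)
    return missing


# Example CDRs section with the RALs link deliberately left empty
cdrs_section = {
    "enabled": True,
    "attributes_conns": ["*internal"],
    "chargers_conns": ["*internal"],
    "rals_conns": [],
}

print(missing_links(["*rals"], cdrs_section))
```

An empty list back means every flag in the event has somewhere to go; anything else is a link you’d need to add before that flag can work.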

We’ve also got connections to AttributeS configured in the config and we can see the RPC call to AttributeSv1.ProcessEvent, which is the same as if we were to call it directly via the AttributeS API.

{
    "method": "AttributeSv1.ProcessEvent",
    "params": [
        {
            "Tenant": "cgrates.org",
            "ID": "c2ce33d",
            "Time": "2024-12-22T09:09:47.356294838+11:00",
            "Event": {
                "Account": "61412341234",
                "Animal": "Dog",
                "CGRID": "6330859b7c38c1d508f9e5e0043950079e54fef1",
                "Category": "call",
                "Destination": "61812341234",
                "OriginID": "1lklkjfds",
                "RequestType": "*rated",
                "SetupTime": "2024-01-01 01:00:00",
                "Source": "*sessions",
                "Subject": "61812341234",
                "ToR": "*voice",
                "Usage": "60"
            },
            "APIOpts": {
                "*context": "*cdrs",
                "*subsys": "*cdrs"
            }
        }
    ],
    "id": 2
}

Note at the bottom the APIOpts section tells us this API call was made by the *cdrs subsystem and the ID is 2 (This is a different request to the original CDRsV1.ProcessEvent request which had ID 1 – we can use this to match responses to requests).
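Matching on the id is easy to automate. Below is a minimal sketch that pairs each request with its response; it assumes all the messages came from a single connection (ids are only meaningful per connection) and that you’ve already pulled the JSON lines out of the capture. The `pair_by_id` name is just mine.

```python
import json


def pair_by_id(messages: list[str]) -> list[tuple[dict, dict]]:
    """Pair JSON-RPC requests with their responses using the shared id."""
    pending = {}  # id -> request still waiting for a response
    pairs = []
    for raw in messages:
        message = json.loads(raw)
        if "method" in message:
            pending[message["id"]] = message
        elif message.get("id") in pending:
            pairs.append((pending.pop(message["id"]), message))
    return pairs


# Shortened versions of the messages from the capture above
captured = [
    '{"method":"CDRsV1.ProcessEvent","params":[{"Flags":["*rals"]}],"id":1}',
    '{"method":"AttributeSv1.ProcessEvent","params":[{}],"id":2}',
    '{"id":2,"result":null,"error":"NOT_FOUND"}',
    '{"id":1,"result":"OK","error":null}',
]

for request, response in pair_by_id(captured):
    print(request["method"], "->", response["result"], response["error"])
```

Pairs come back in the order the responses arrived, which is why the AttributeS call (id 2) shows up before the original CDRs call (id 1) that only completes once everything else has finished.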

Again, because our config also includes links to the ChargerS and RALs subsystems, we’ll see requests to (you guessed it) ChargerS (the ChargerSv1.ProcessEvent) and RALs (Responder.GetCost).

T 2024/12/22 09:09:47.452021 127.0.0.1:50456 -> 127.0.0.1:2012 [AP] #413
{"method":"Responder.GetCost","params":[{"Category":"call","Tenant":"cgrates.org","Subject":"61812341234","Account":"61412341234","Destination":"61812341234","TimeStart":"2024-01-01T01:00:00+11:00","TimeEnd":"2024-01-01T01:00:00.00000006+11:00","LoopIndex":0,"DurationIndex":60,"FallbackSubject":"","RatingInfos":null,"Increments":null,"ToR":"*voice","ExtraFields":{"Animal":"Dog","Value0":"2024-01-01 01:00:00","Value1":"2024-01-01 01:01:00","Value2":"Nick","Value3":"60","Value4":"61412341234","Value5":"61812341234","Value6":"Dog","Value7":"1lklkjfds"},"MaxRate":0,"MaxRateUnit":0,"MaxCostSoFar":0,"CgrID":"","RunID":"","ForceDuration":false,"PerformRounding":true,"DenyNegativeAccount":false,"DryRun":false,"APIOpts":{"*context":"*cdrs","*subsys":"*chargers"}}],"id":4}

#
T 2024/12/22 09:09:47.465711 127.0.0.1:2012 -> 127.0.0.1:50456 [AP] #414
{"id":4,"result":{"Category":"call","Tenant":"cgrates.org","Subject":"61812341234","Account":"61412341234","Destination":"61812341234","ToR":"*voice","Cost":14,"Timespans":[{"TimeStart":"2024-01-01T01:00:00+11:00","TimeEnd":"2024-01-01T01:01:00+11:00","Cost":14,"RateInterval":{"Timing":{"ID":"*any","Years":[],"Months":[],"MonthDays":[],"WeekDays":[],"StartTime":"00:00:00","EndTime":""},"Rating":{"ConnectFee":0,"RoundingMethod":"*up","RoundingDecimals":4,"MaxCost":0,"MaxCostStrategy":"","Rates":[{"GroupIntervalStart":0,"Value":14,"RateIncrement":60000000000,"RateUnit":60000000000}]},"Weight":10},"DurationIndex":60000000000,"Increments":[{"Duration":0,"Cost":0,"BalanceInfo":{"Unit":null,"Monetary":null,"AccountID":""},"CompressFactor":1},{"Duration":60000000000,"Cost":14,"BalanceInfo":{"Unit":null,"Monetary":null,"AccountID":""},"CompressFactor":1}],"RoundIncrement":null,"MatchedSubject":"*out:cgrates.org:call:*any","MatchedPrefix":"618","MatchedDestId":"Dest_AU_Fixed","RatingPlanId":"RatingPlan_VoiceCalls","CompressFactor":1}],"RatedUsage":60000000000,"AccountSummary":null},"error":null}

What we’re seeing is the CDRs module, calling RALs, to get the cost information for this event.
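The numbers in that response can be reproduced by hand. The sketch below is a deliberately simplified model, not how RALs actually calculates things: it ignores ConnectFee, rounding settings and multiple rate groups, and just bills whole increments at `Value` per `RateUnit`, using the values from the response above (durations are in nanoseconds).

```python
NANOSECONDS = 1_000_000_000  # the response expresses durations in nanoseconds


def simple_cost(usage_ns: int, rate_value: float, rate_unit_ns: int, rate_increment_ns: int) -> float:
    """Simplified rating: round usage up to whole increments, then price per rate unit."""
    increments = (usage_ns + rate_increment_ns - 1) // rate_increment_ns  # integer ceiling
    billed_ns = increments * rate_increment_ns
    return rate_value * billed_ns / rate_unit_ns


# Values taken from the GetCost response: Value 14, RateUnit 60s, RateIncrement 60s
cost = simple_cost(
    usage_ns=60 * NANOSECONDS,
    rate_value=14,
    rate_unit_ns=60 * NANOSECONDS,
    rate_increment_ns=60 * NANOSECONDS,
)
print(cost)
```

One 60 second increment at 14 per 60 seconds gives 14, matching the Cost in the response; under the same simplified model a 61 second call would be billed as two increments.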

Finally the CDRsV1.ProcessEvent that was initially sent by ErS gets a result (we can find the result to the request as it’ll have the same id parameter).

So that’s it, that’s the secret sauce – CGrateS is just a bunch of little APIs we combo together to create something great.

Recap

Agents translate data sources into API calls.

Each little API belongs to a Subsystem, like ChargerS, AttributeS or RALs, and we can chain them together in our config file or through the flags in the API request.

Once you’ve got your head wrapped around this, everything in CGrateS becomes way easier.

From now on I’ll pivot to talking about specific modules, and how we use them, starting with AttributeS (which I wrote last year while still drafting this), and diving into how to use each module in more detail.

A tale of two CPRIs

09/01/20255G SA, EUTRAN, GSM, History, LTE, Mobile Networks, RFCs & StandardsOBSAI, OpenRAN, PLMN, RAN, StandardsNick

It was the best of times, it was the worst of times. It was the age of wisdom, it was the age of foolishness. It was the epoch of belief, it was the epoch of incredulity. It was the season of Light, it was the season of Darkness. It was the spring of hope, it was the winter of despair.
A tale of two Cities

When Dickens wrote of Doctor Manette in 1859, I doubt his intention was to write about the repeating history of RAN fronthaul standards – but I can’t really say for sure.

Setting the Scene

Our story starts by introducing the CPRI (Common Public Radio Interface) interface, which has been imprisoned in the Bastille of vendor lock-in for the better part of twenty years.

Think of CPRI as less of a hard interoperable standard and more like how the Italian and French languages are both derived from Latin; it doesn’t mean that the two languages are the same, but they’ve got the same root and may share some common words and structures.

In practice this means that taking an Ericsson Radio and plugging it into a Huawei Baseband simply won’t work – With CPRI you must use the same vendor for the Baseband and the Radios.

Huawei BBU 3900 Architecture — Image from my post on setting up Huawei Base stations, showing the Huawei Baseband (BBU) connecting to the Huawei Radios (RRUs) via CPRI (in Yellow)

The Unexpected Plot Twist

“Nuts to this” the industry said after being stuck locked between the same radios and baseband for years; we should create a standard so we can mix and match between radio vendors, and even standardize some other stuff that’s been bothering us, so we’ll have a happy world of interoperability.

With kit created that followed this standard, we’d be able to take components from vendor A, B & C, and fit them together like Lego, saving you some money along the way and giving you a working solution made of “best of breed” components, where everything is interoperable.

*Omnitouch Lego base stations, which also fit together like Lego – Part of the Omnitouch Network Services “swag” from 2024*

So the industry created a group to chart a path for a better tomorrow by standardizing these interfaces.

The group had many industry heavyweights like Nokia, NEC, LG, ZTE and Samsung joining.

The key benefits espoused on their website:

An open market will substantially reduce the development effort and costs that have been traditionally associated with creating new base station product ranges. The availability of off-the-shelf base station modules will enable manufacturers to focus their development efforts on creating further added value within the base station, encouraging greater innovation and more cost-effective products. Furthermore, as product development cycles will be reduced, new base station functions will become available on the market more quickly.
Mission statement of the group

In addition to being able to mix and match radios and basebands from different vendors, the group defined standards for centralized baseband, and interoperable standards, to allow a multi-vendor ecosystem to flourish.

And here’s the plot twist – The text above, was not written about OpenRAN, and it was not written about the benefits of eCPRI.

It was written about Open Base Station Architecture Initiative (OBSAI) and it was written 22 years ago.

*record screech sound*

This image was called "Confused Ernie" but it's clearly Bert...

Standards War you’ve never heard of: OBSAI vs CPRI

When OBSAI was defined it was not without competition; there was another competing fronthaul standard; that’s right, the mustache twirling lowlife from earlier in the story – CPRI.

Supported by Huawei, Nortel, NEC & Ericsson (among others), CPRI took a “gentle parenting” approach to the standards world, in contrast to OBSAI.
Instead of telling all the vendors to agree on an interoperable front haul standard, CPRI just encouraged everyone to implement what their heart told them and what felt right to them.

As it happened, the industry favored the CPRI approach.

If a vendor wanted to add a new “word” in their CPRI “language” to add a new feature, they just went ahead and added it – It didn’t require anyone else to agree with them or changes to a common standard used by the industry, vendors could talk to the kit they made how they wanted.

CPRI has been the de facto non-standard used by all the big kit vendors for the past ~10 years.

The Death of OBSAI & the Birth of OpenRAN’s eCPRI

Why haven’t you heard of OBSAI? Why didn’t the OBSAI standard just serve as the basis for eCPRI? After all, the last OBSAI release was less than 5 years before TIP started working on eCPRI publicly.

Did a schism over “uplink performance improvement” options lead to “irreconcilable differences” between parties leading to the breakup of the OBSAI group?

Nope.

Customers (MNOs) didn’t buy OBSAI based equipment in measurably larger quantities than CPRI kit. That’s it.

This meant the vendors invested less in paying teams to further develop the standards, the OBSAI group met less frequently, and in the end, member vendors didn’t bother adding support for OBSAI to new equipment and just used the easier and more flexible CPRI option instead.

At some point someone just stopped paying for the domain renewal and that was it, OBSAI was no more.

This is how the standards body ends, not with a bang, but with a whimper.
T.S. Eliot’s writings on the death of OBSAI

Those who do not learn from history…

The goals of the OBSAI Group and OpenRAN working groups are almost identical, so what lessons did Marconi, Motorola and Alcatel learn as members of OBSAI that other vendors could learn about OpenRAN strategy?

There are no mentions of OBSAI in any of the information published by OpenRAN advocates, and I’m wondering if folks aren’t aware that history tends to repeat and are ignorant of what came before it, or they’re just not learning lessons from the past?

So what can the OpenRAN industry learn from OBSAI?

Being a nerd, I started detailing the technical challenges, but that’s all window dressing; the biggest hurdles facing CPRI vs eCPRI are the same challenges OBSAI vs CPRI faced a decade prior:

To be relevant, OpenRAN kit has to be demonstrably better than what we have today AND provide a tangible cost saving.

OBSAI failed at achieving this, and so failed to meet its other more noble goals.

[At the time of writing this at least] I’d contend that neither of those two criteria have been met by OpenRAN.

What does the future hold for OpenRAN?

Looking into the crystal ball, will OpenRAN and eCPRI go the way of OBSAI, or will someone keep the OpenRAN dream alive?

Today, we’re still seeing the MNOs continue to provide tokenistic investment in OpenRAN. But being a cynic, I’d say the MNOs are feigning interest in OpenRAN products because it’s advantageous for them to do so.

The threat of OpenRAN has proven to be a great stick to beat the traditional vendors with to force them to lower their prices.

Think about the $14 billion USD Ericsson deal with AT&T; if chucking a few million at OpenRAN pilots / trials led to AT&T getting even a 0.1% reduction in what they’re paying Ericsson, then the numbers would have worked out well in AT&T’s favor.

From the MNOs perspective, the cost to throw the odd pilot or trial to a hungry OpenRAN vendor to keep them on the hook is negligible, but the threat of OpenRAN provides leverage and bargaining power every time it’s contract renewal time with the big RAN vendors.

Already we’ve seen all the traditional RAN vendors move to neutralize this threat by introducing “OpenRAN compatible” equipment and talking up their commitment to openness.

This move by the RAN vendors takes the sting out of the OpenRAN threat, and means MNOs won’t have much reason to continue supporting OpenRAN.

This leaves the remaining OpenRAN vendors like Miss Havisham, forever waiting in their proverbial wedding dresses, having been left at the altar.

Okay, I’m mixing my Dickens’ references here, but it was too good not to.

Appendix

I’ve been enjoying writing more analysis than just technical content, let me know if this is something you’re interested in seeing more of.

I’ve been involved in two big OpenRAN integration projects, both of which went poorly and probably tainted my perspective. Enough time has passed to probably write up how it all went with the vendor names removed, but that’s a post for another time!

If you wanted to learn more about OBSAI, Archive.org has their old website available for reading.

Installing Calix CMS Java tool on Ubuntu in 2025

03/01/2025Notes, SoftwareCalix, JavaNick

Ah, another post in my “how to make software work that was made with Java in the 1990s” series, except Calix last updated this software in 2022 – make of that what you will…

This time it’s Calix Management System (CMS), the Java app for managing equipment in exchanges / COs from Calix.

On Ubuntu 24.04 LTS it requires JRE version 8:

sudo apt install openjdk-8-jre
sudo apt install execstack

With that installed I could install CMS

./install.bin LAX_VM /usr/lib/jvm/java-8-openjdk-amd64/bin/java

Then it came time to run it, I chose to install in my home directory in a folder named “Calix” (default).

First you’ve got to make their startup script executable:

~/Calix$ chmod +x Start\ CMS

Then we need to modify it to point to the openjdk Java 8 binary, the simplest way is to just add the LAX_VM on startup:

~/Calix$ ./Start\ CMS LAX_VM /usr/lib/jvm/java-8-openjdk-amd64/bin/java

And you’re in.

TFTs & Create Bearer Requests

27/12/20245G SA, EPC, IMS / VoLTE, LTE, Mobile Networks, Notes, RFCs & StandardsCharging Rule, Diameter, EPC, Gx, TFT, Traffic Flow TemplateNick

What is included in the Charging Rule on Gx ultimately turns into a Create Bearer Request on GTPv2-C.

But the mapping isn’t always obvious; today I got stuck on the difference between a Single remote port type and a Single local port type, thinking that the Packet Filter Direction in the TFT controlled this – It doesn’t – It’s controlled by the order of your Traffic Flow Template rule.

Input TFT:

"permit out 17 from any 50000 to any"

Leads to Packet filter component type identifier: Single remote port type

Whereas a TFT of:

permit out 17 from any to any 50000

Leads to Packet filter component type identifier: Single local port type (64)
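The behaviour I saw can be written down as a small classifier. This is only a sketch of the two cases above, assuming rules in the simple `permit <direction> <protocol> from <address> [port] to <address> [port]` shape; port ranges, rules with a port on both sides, and anything else your PCEF / PGW does differently are not handled, and `port_component` is a made-up name.

```python
def port_component(rule: str) -> str:
    """Classify which packet filter port component a simple TFT rule produces."""
    tokens = rule.strip().strip('"').split()
    from_index = tokens.index("from")
    to_index = tokens.index("to")
    source_part = tokens[from_index + 1:to_index]  # address, then optional port
    destination_part = tokens[to_index + 1:]       # address, then optional port
    if len(source_part) > 1:
        # Port given right after the "from" address
        return "Single remote port type"
    if len(destination_part) > 1:
        # Port given right after the "to" address
        return "Single local port type"
    return "No port component"


print(port_component("permit out 17 from any 50000 to any"))
print(port_component("permit out 17 from any to any 50000"))
```

Both rules have the same direction (`out`), yet the port ends up as a different component type, which is the point: it’s where the port sits in the rule that matters, not the Packet Filter Direction.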

Holiday Reading list 2024

20/12/2024NotesBooksNick

As summer reaches full swing in Australia and the level of effort I put into blog posts wanes, here’s a list of books I’m yet to read or have read this year.

I can’t imagine a telecom book club being super popular, but if you’ve got any recommendations for good telecom related reads, I’d love to hear them!

The End of Telecoms History – William Webb (Read)

I read this this year; Webb is one of those folks whose paycheck doesn’t come from shilling hardware, and he’s been pretty good at making accurate predictions and soothsaying, even when what he says upsets some.

The launch of 5G pretty much played out exactly how one of his other books (The 5G Myth) predicted, and the premise of The End of Telecoms History is that if we look at the data which suggest that bandwidth growth will not continue unabated forever, what does that mean?

I’ve a feeling there are a few telecom execs quietly reading this book (while making sure that no one sees them reading it) and planning for a potential future in a world of enough bandwidth to satisfy demand, and how this would impact their bottom lines and overall business model, even if outwardly everyone still claims the growth will continue forever.

The Iron Wire: A novel of the Adelaide to Darwin telegraph line – Garry Kilworth (Read)

A fun imagined romp about adventures in the bush while connecting a nation in the 19th century; the story is inspired by real world events but is fictional, and it’s a fun way to explore the topic and add bushrangers into the mix.

Rogers v. Rogers: The Battle for Control of Canada’s Telecom Empire – Alexandra Posadzki (Read)

Just finished this; I’ve worked with a lot of operators in the past, both big and small (the best ones are small), and it’s fascinating to understand at a board level how things get done in telecom giants, even if the Rogers family aren’t the best example of how to do this…

Chip War: The Fight for the World’s Most Critical Technology – Chris Miller (Read)

Without integrated circuits the telecom industry is back to relays and electromagnetically switching traffic (not that I’m against this).

Miller’s book outlines how we got to our current situation, and how the products coming out of TSMC and SMIC will shape the future of tech at a fundamental level.

How the World Was One – Arthur C Clarke (To Read)

Famed science fiction writer Arthur C Clarke had a penchant for scuba diving and communications (can relate) hence his interest in submarine telephony.

I read “Voice Across the Sea” a few years ago (on an actual paper based book no less!) but this is freely available as an eBook and I’m looking forward to reading it.

Introducing Elixir – Simon St. Laurent & J. David Eisenberg (Reading)

The dev team at Omnitouch are all about Elixir, and being an old dinosaur I figured I should at least learn the basics!

I’m still working my way through the book, having a folder of examples typed out from the book (I can’t learn through copy / paste!), enjoying it so far, even if I’m slower than I’d like.

Adventures in Innovation: Inside the Rise and Fall of Nortel – John Tyson

My first job was with Nortel, so I’ve got a bit of a soft spot for the former Canadian telecom behemoth, and never felt I’d had a satisfactory explanation as to where it all went wrong. I got this book expecting a bit more insight into the fall part, but this book gave an interesting account as to the design of things I’d never put much thought into before.

The Real Internet Architecture: Past, Present, and Future Evolution – Zave, Pamela;Rexford, Jennifer; (To Read)

This came from a recommendation on Twitter, I know almost nothing about it other than that, but I’m keen to dig into this.

Burn Book: A Tech Love Story – Kara Swisher (Read)

A fun insight into the life and times of big tech.

The 6G Manifesto – William Webb

There’s a Simpsons’ scene where Lisa is buying an Al Gore book named “Sane Planning, Sensible Tomorrow” and says “I hope it’s as exciting as his other book, ‘Rational Thinking, Reasonable Future'”.

I can’t help but feel Webb’s books are kinda like this (in a good way).

Realism is so important; staying grounded in reality is critical. Operators who go chasing fairy tales of driving higher ARPUs with wacky ideas with no business case or demand from end customers (and generally pushed by vendors, rather than operators) will struggle to remain viable in the future if they pour all their cash into things that won’t see a return, so I’m looking forward to reading some sane ideas as to how to approach the unnecessary Gs.

Flash SMS Messages

13/12/2024IMS / VoLTE, Mobile Networks, Notes, RFCs & StandardsIMS, LTE, SIP, SMS, VoIP, VoLTENick

Stumbled across these the other day, while messing around with some values on our SMSc.

Setting the Data Coding Scheme to 16 with GSM7 encoding flags the SMS as “Flash message”, which means it pops up on the screen of the phone on top of whatever the user is doing.

While reading a quality telecom blog bam! There’s the flash SMS popping over whatever I was reading.

Oddly while there’s plenty of info online about Flash SMS, it does not appear in the 3GPP specifications for SMS.

Turns out they still work, move over RCS and A2P, it’s all about Flash messages!

There’s no real secret to this other than to set the Data Coding Scheme to 16, which is GSM7 with Flash class set. That’s it.
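If you want to see why 16 is the magic number, you can pick the byte apart. The sketch below assumes the commonly documented bit layout for the general data coding group (bits 7-6 zero, bit 5 compression, bit 4 “class bits are meaningful”, bits 3-2 alphabet, bits 1-0 message class); other coding groups are out of scope, and `decode_dcs` is a hypothetical helper, not something from an SMSc library.

```python
# Alphabet values for bits 3-2 (assumed layout, general data coding group only)
ALPHABETS = {0: "GSM7", 1: "8-bit data", 2: "UCS2", 3: "reserved"}


def decode_dcs(dcs: int) -> dict:
    """Decode a Data Coding Scheme byte from the general data coding group."""
    if dcs & 0b1100_0000:
        raise ValueError("only the general data coding group (bits 7-6 = 00) is handled")
    has_class = bool(dcs & 0b0001_0000)  # bit 4: message class bits are meaningful
    message_class = dcs & 0b11           # bits 1-0
    return {
        "compressed": bool(dcs & 0b0010_0000),
        "alphabet": ALPHABETS[(dcs >> 2) & 0b11],
        "message_class": message_class if has_class else None,
        "flash": has_class and message_class == 0,  # class 0 is the "Flash" class
    }


print(decode_dcs(16))
```

So 16 is just “GSM7, class bits present, class 0”. By the same logic a value of 24 would be the UCS2 flavour of the same thing, though I’ve only played with 16.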

If you’re interested in the internal machinations of how SMS works, I’ve got a few posts on the topic – You can find a list of them here.

Obviously to take advantage of this you’d need to be a network operator, or have access to the network you wish to deliver to. Recently more A2P providers are filtering non-vanilla SMS traffic to filter out stuff like SMS OTA messages or SIM specific messages, so there’s a good chance this may not work through A2P providers.

Background to the “VoLTE Mess”

06/12/20245G SA, IMS / VoLTE, LTE, Mobile Networks, RFCs & Standards, VoIPIMS, LTE, SIP, VoIP, VoLTENick

I’ve been writing a fair bit recently about the “VoLTE Mess” – It’s something that’s been around for a long time, mostly impacting greenfield players rolling out LTE only, but now the big carriers are starting to feel it as they shut off their 2G and 3G networks, so I figured a brief history was in order to understand how we got here.

Note: I use the terms 4G or LTE interchangeably

The Introduction of LTE

LTE (4G) is more “spectrally efficient” than the technologies that came before it. In simple terms, 1 “chunk” of spectrum will get you more speed (capacity) on LTE than the same size chunk of spectrum would on 2G or 3G.

From my post on 5G being a bit overhyped

So imagine it’s 2008 and you’re the CTO of a mobile network operator.
Your network is congested thanks to carrying more data traffic than it was ever designed for (the first iPhone had launched the year before) and the network is struggling under the weight of all this new data traffic.
You have two options here, to build more cell sites for more density (very expensive) or buy more spectrum (extremely expensive) – Both options see you going cap in hand to the finance team and asking for eye-wateringly large amounts of capital for either option.

But then the answer to your prayers arrives in the form of 3GPP’s Release 8 specification with the introduction of LTE. Now by taking some 2G or 3G spectrum, and by using it on 4G, you can get ~5x more capacity from the same spectrum. So just by changing spectrum you own from 2G or 3G to 4G, you’ve got 5x more capacity. Hallelujah!

So you go to Nortel and buy a packet core, and Alcatel and Siemens provide 4G RAN (eNodeBs) which you selectively deploy on the cell sites that are the most congested.
The finance team and the board are happy and your marketing team runs amok with claims of 4G data speeds.
You’ve dodged the crisis, phew.

This is the path that all established mobile operators took; throw LTE at the congested cell sites, to cheaply and easily free up capacity, and as the natural hardware replacement cycle kicked in, or cell sites reached capacity, swap out the hardware to kit that supports LTE in addition to the 2G and 3G tech.

Circuit Switched Fallback

But it’s hard to talk about the machinations of late 2000s telecom executives, without at least mentioning Hitler.

This video below from 15 years ago is pretty obscure and fairly technical, but the crux of it is that Hitler is livid because LTE does not have a “CS Domain”, aka circuit switched voice (the way 2G and 3G had handled voice calls).

It was optional to include support for voice calls in the LTE network (Voice over LTE) when you launched LTE services. So if you already had a 2G or 3G network (CS Network) you could just keep using 2G and 3G for your voice calls, while getting that sweet capacity relief.

So our hypothetical CTO, strapped for cash and data capacity, just didn’t bother to support VoLTE when they launched LTE – Doing so would have taken more time to launch, during which time the capacity problem would become worse, so “don’t worry about VoLTE for now” was the mantra.

All the operators who still had 2G and 3G networks opted to just “fall back” to using the 2G / 3G network for calling. This is called “Circuit Switched Fallback”, aka CSFB.

Operators loved this as they got the capacity relief provided by shifting to 4G/LTE (more capacity in the network is always good) and could all rant about how their network was the fastest and had 4G first. This, however, was what could be described as a “foot gun”: something you can shoot yourself in the foot with in the future.

Operators eventually introduce VoLTE

Time ticked on and operators built out their 4G networks, and many in the past 10 years or so have launched VoLTE in their own networks.

For phones that support it, in areas with blanket 4G coverage, they can use VoLTE for all their calls.

But that’s the sticking point right there – If the phones support it.

But if the phones don’t support it, or they’re roaming or making emergency calls, there has always been the safety blanket of 2G or 3G and Circuit Switched Fallback to, well, fall back to.

There’s no driver for operators who plan to (or are required to) operate a 2G or 3G network for the foreseeable future, to ensure a high level of VoLTE support in their devices.

For an operator today with 2G or 3G, Voice over LTE is still optional.
Many operators still rely exclusively on Circuit Switched Fallback, and there are only a handful of countries that have turned off 2G and 3G and rely solely on VoLTE.

VoLTE Handset Support

For the past 16 years phone manufacturers have been making LTE capable phones.

But that does not mean they’ve been making phones that support Voice over LTE.

But it’s never been an issue up until this point, as there’s always been a circuit switched (2G/3G) network to fall back to, so the fact that these chips may not support VoLTE was not a big problem.

Many of the cheaper chipsets that power phones simply don’t support VoLTE – These chips do support LTE for data connections but rely on Circuit Switched Fallback for voice calls. This is in part due to the increased complexity, but also because some of the technologies for VoLTE (like AMR) required intellectual property licensing deals, which add to the component cost to manufacture, and in the chips game, keeping down component cost is critical.

Even for chips that do support Voice over LTE, it’s “special”. Unlike calling in 2G or 3G, which worked the same for every operator, phone manufacturers require a “Carrier Bundle” for each operator, containing the special flavor of VoLTE that specific operator uses in their network.

This is because while VoLTE is standardized (despite some claims to the contrary) a lot of “optional” bits have existed, and different operators built networks with subtle differences in the “flavor” of the Voice over LTE (IMS) stack they used. The OEMs (Phone / Chip manufacturers) had to handle these differences in the devices they made, in order to sell their phones through that operator.

This means I can have a phone from vendor X that works with VoLTE on Network Y, but does not support VoLTE on Network Z.

Worse still, knowing which phones are supported is a bit of a guessing game.

Most operators sell phones directly to their customer base, so buying a Network Y branded phone from Vendor X, you know it’s going to support Network Y’s VoLTE settings, but if you change carriers, who knows if it’ll still support it?

When you’ve still got a Circuit Switched network it’s not the end of the world, you’ll just use CSFB and probably not realize it, until operators go to shut down 2G / 3G networks…

IMS Profile selection on an engineering mode MTK based Android handset

Navigating the Maze of VoLTE Compatibility

Here’s a simple checklist you can run through with your elderly family members if they ask whether their phone is VoLTE compatible:

Does the underlying chipset the phone is based on support VoLTE? (you can find this out by disassembling the phone and checking the datasheets for the components from the OEMs after signing NDAs for each)
Does the underlying chipset require a “carrier bundle” of settings to have been loaded for this operator in order to support VoLTE (See Qualcomm MBN as an example)?
What version of this list am I currently on (generally set in the factory) and does it support this operator? (You can check by decapping the ICs and dumping their NVRAM and then running it through a decompiler)
Does my phone’s OS (Android / iOS) require a “carrier bundle” of its own to enable VoLTE? Is my operator in the version of the database on the phone? (See Android’s Carrier Database for example) (You can find the answer by rooting the phone and running some privileged commands to poke around the internal file system)
Does my operator / MNO support VoLTE – Does my plan / package support VoLTE? (You can easily find the answer by visiting the store and asking questions that don’t appear on the script)

If you managed to answer yes to all of the above, congratulations! You have conditional VoLTE support on your phone, although you probably don’t have a working phone anymore.

Wait, conditional VoLTE support?

That’s right folks, VoLTE will work in some scenarios with your operator!

If you plan on traveling, well your phone may support VoLTE at home, but does the phone have VoLTE roaming enabled?
Many phones support VoLTE in the home network, but resort to CSFB when roaming.

If it does support VoLTE roaming, does the network you’re visiting support VoLTE roaming? Has the roaming agreement (IRA) between the operator you’re using while traveling and your home operator been updated to include VoLTE Roaming? These IRAs (AA.12 / AA.13 docs) also indicate if the network must turn off IPsec encryption for the VoLTE traffic when roaming, which is controlled by the phone anyway.

Phew, all this talk of VoLTE roaming while traveling scares me, I think I’ll stay home in the safety of the Australian bush with all these great friendly animals around, and a phone that supports VoLTE on my home network.

Ah – After spending some time in the Australian bush one of our many deadly animals bit me. Time to call for help! Wait, what about emergency calls over VoLTE? Again, many phones support VoLTE for normal calls, but fall back to 2G or 3G for the emergency call, so if you have one of those phones (you’ll only find out if you try to make an emergency call and it fails) and try to make an emergency call in a country without 2G or 3G, you’d better find a payphone.

There are many real world examples of this; our friends at OptimERA have been lobbying the FCC on this since 2019.

Sarcasm aside, there’s no dataset or compatibility matrix here – No simple way to see if your phone will work for VoLTE on a given operator, even if the underlying chip does support VoLTE.

Operators in Australia, which recently shut down their 3G networks, were mandated to block devices that didn’t support VoLTE for emergency calling. They did this using an Equipment Identity Register, blocking devices based on the Type Allocation Code, but this scattergun approach just blocked non-carrier issued devices, regardless of whether they supported VoLTE or VoLTE emergency calling.
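To make the “scattergun” point concrete, here’s a rough sketch of what a TAC based block looks like. The TAC is the first 8 digits of the IMEI, so the decision is made per device model, not per device. The allow-list contents and the function names are made up for illustration; this is not how any particular operator’s EIR is implemented.

```python
def tac_of(imei: str) -> str:
    """Return the Type Allocation Code: the first 8 digits of an IMEI."""
    digits = "".join(ch for ch in imei if ch.isdigit())
    # 14 digits (no check digit), 15 (IMEI) or 16 (IMEISV) are all accepted
    if len(digits) not in (14, 15, 16):
        raise ValueError(f"unexpected IMEI length: {len(digits)}")
    return digits[:8]


# Hypothetical allow-list: TACs of the models the carrier sold itself
CARRIER_ISSUED_TACS = {"35123456"}


def is_blocked(imei: str, allowed_tacs=CARRIER_ISSUED_TACS) -> bool:
    """Block anything whose TAC is not on the allow-list.

    Note that nothing here looks at whether the device can actually
    make a VoLTE emergency call, only at which model it is.
    """
    return tac_of(imei) not in allowed_tacs
```

A perfectly capable phone bought overseas gets blocked simply because its TAC was never on the list.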

Blame Game

So who’s to blame here?

There’s no one group to blame; the industry has created a shitty cycle:

Standards orgs for having too many “flavors” available
Operators deploying their own “Flavors” of VoLTE then mandating OEMs / Chip manufacturers comply with their “flavor”.
OEMs / Chip manufacturers respond by adding “Carrier Bundles” to account for this per-operator customization

I’ve got some ideas on a way to unscramble this egg, and it’s going to take a push from the industry.

If you’re in the industry and keen to push for a fix, get in touch!

It’s time to get a long term solution to this problem, and we as an industry need to lead the change.

Tales from the Trenches – IMS TCP Socket Handling

28/11/2024 | IMS / VoLTE, Mobile Networks, RFCs & Standards, VoIP | IMS, LTE, SIP, VoIP, VoLTE | Nick

Oh boy, this has been a pain in the backside: IMS / VoLTE devices using TCP, and how they handle the underlying TCP sockets.

A mobile phone from manufacturer A wants every SIP dialog to be in its own TCP session, while a phone from manufacturer B wants a unique TCP session per transaction, while manufacturer C thinks that every SIP message should reuse the same TCP session.

So an MT call to manufacturer A, who wants every SIP dialog in its own TCP session, would look something like this:

PCSCF:44738 -> UE:5060; TCP SYN
UE:5060 -> PCSCF:44738; TCP SYN/ACK
PCSCF:44738 -> UE:5060; TCP ACK
--- TCP connection is now open to UE from P-CSCF---
--- Start of new SIP Transaction 1 & Dialog ---
PCSCF:44738 -> UE:5060; TCP PSH - SIP INVITE....
UE:5060 -> PCSCF:44738; TCP ACK


UE:5060 -> PCSCF:44738; TCP PSH - SIP 200....
PCSCF:44738 -> UE:5060; TCP ACK, PSH - SIP ACK....
UE:5060 -> PCSCF:44738; TCP ACK
--- End of SIP Transaction 1 ---

--- Start of SIP Transaction 2 ---
PCSCF:44738 -> UE:5060; TCP PSH - SIP BYE....
UE:5060 -> PCSCF:44738; TCP ACK, PSH - SIP 200....
--- End of SIP Transaction 2 & SIP Dialog ---
PCSCF:44738 -> UE:5060; TCP FIN
UE:5060 -> PCSCF:44738; TCP ACK
--- End of TCP Connection ---

Where UE:5060 is the IP & port of the UE, as advertised in the Contact: header, while PCSCF:44738 is the PCSCF IP and a random TCP port used for this connection.

But for manufacturer B, who wants a unique TCP session per transaction, they want it to look like this:

PCSCF:44738 -> UE:5060; TCP SYN
UE:5060 -> PCSCF:44738; TCP SYN/ACK
PCSCF:44738 -> UE:5060; TCP ACK
--- TCP connection is now open to UE from P-CSCF---
--- Start of new SIP Transaction 1 & Dialog ---
PCSCF:44738 -> UE:5060; TCP PSH - SIP INVITE....
UE:5060 -> PCSCF:44738; TCP ACK


UE:5060 -> PCSCF:44738; TCP PSH - SIP 200....
PCSCF:44738 -> UE:5060; TCP ACK, PSH - SIP ACK....
UE:5060 -> PCSCF:44738; TCP ACK
PCSCF:44738 -> UE:5060; TCP FIN
UE:5060 -> PCSCF:44738; TCP ACK
--- End of SIP Transaction 1 & TCP Session 1 ---

--- Start of TCP Session 2 ----
PCSCF:32627 -> UE:5060; TCP SYN
UE:5060 -> PCSCF:32627; TCP SYN/ACK
PCSCF:32627 -> UE:5060; TCP ACK
--- Start of SIP Transaction 2 ---
PCSCF:32627 -> UE:5060; TCP PSH - SIP BYE....
UE:5060 -> PCSCF:32627; TCP ACK, PSH - SIP 200....
--- End of SIP Transaction 2 & SIP Dialog ---
PCSCF:32627 -> UE:5060; TCP FIN
UE:5060 -> PCSCF:32627; TCP ACK
--- End of TCP Connection 2 ---

And then manufacturer C wants just the one TCP session to be used for everything, so they open the TCP connection when they register, and that’s all we use for everything.

Is there any logic to this? Nope, seems to be tied to the underlying chipset (Qualcomm vs Mediatek vs Unisoc) and the SIP stack used (Qualcomm, MTK, Unisoc, Samsung, Apple).

We’ve profiled devices into one of 3 behaviors, and then we tag them based on user agent as to what “persona” they demand from the network.
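As a rough sketch of that tagging, assuming a simple ordered list of User-Agent patterns: the patterns, the persona names and the default below are all hypothetical placeholders, not our real rules.

```python
import re
from enum import Enum


class TcpPersona(Enum):
    PER_DIALOG = "one TCP session per SIP dialog"            # manufacturer A
    PER_TRANSACTION = "one TCP session per SIP transaction"  # manufacturer B
    SHARED = "reuse the TCP session opened at registration"  # manufacturer C


# Hypothetical rules: first matching pattern wins
PERSONA_RULES = [
    (re.compile(r"vendor-a", re.IGNORECASE), TcpPersona.PER_DIALOG),
    (re.compile(r"vendor-b", re.IGNORECASE), TcpPersona.PER_TRANSACTION),
    (re.compile(r"vendor-c", re.IGNORECASE), TcpPersona.SHARED),
]

# Assumption: unknown devices are treated like manufacturer C
DEFAULT_PERSONA = TcpPersona.SHARED


def persona_for(user_agent: str) -> TcpPersona:
    """Pick the TCP behaviour a device expects, based on its User-Agent."""
    for pattern, persona in PERSONA_RULES:
        if pattern.search(user_agent or ""):
            return persona
    return DEFAULT_PERSONA
```

The P-CSCF side can then look up the persona at registration and decide whether to open a new TCP connection for each dialog, for each transaction, or never.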

I can’t believe I’m still talking about VoLTE / IMS handset support and it’s almost 2025…. For context IMS was “standardized” 17 years ago.

Mobile Network Code – 2 or 3 Digits?

08/11/2024 | Mobile Networks, Notes, SDM, SIM Cards | LTE, SDM, SIM Card | Nick

Every mobile network broadcasts a Public Land Mobile Network code, aka a PLMN. This 6 digit value is used to identify the network (Although this gets murky with shared codes and Private networks and when OEMs make them for codes they don’t own).

It’s made up of a Mobile Country Code followed by a Mobile Network code.

One of the guys at work asked a seemingly simple question: is the PLMN with MCC 505 and MNC 57 the same as MCC 505 MNC 057? It’s 6 digits after all.

So is Mobile Network Code 57 the same as Mobile Network Code 057 in the PLMN code?

The answer is no, and it’s a massive pain in the butt.

All countries use 3 digit Mobile Country Codes, so Australia is 505. That part is easy.

The tricky part is that some countries (Like Australia) use 2 digit Mobile Network Codes, while others (Like the US) use 3 digit mobile network codes.

This means our 6 digit PLMN has to get padded when encoding a 2 digit Mobile Network Code, which is a pain, but also that by looking at the IMSI alone, you don’t know if the MNC is 2 digit or 3 digit – IMSI 5055710000001 could be parsed as MCC 505, MNC 571 or MCC 505, MNC 57.
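A minimal sketch of that ambiguity: the IMSI can only be split once something else tells you the MNC length. The function name and the validation are my own for illustration.

```python
def split_imsi(imsi: str, mnc_length: int) -> tuple[str, str, str]:
    """Split an IMSI into (MCC, MNC, MSIN) for a given MNC length.

    The MNC length cannot be derived from the IMSI itself; it has to
    come from somewhere else, such as the SIM or a per-country table.
    """
    if mnc_length not in (2, 3):
        raise ValueError("MNC length must be 2 or 3")
    if not imsi.isdigit() or not (3 + mnc_length < len(imsi) <= 15):
        raise ValueError("IMSI must be digits only, at most 15 of them")
    mcc = imsi[:3]
    mnc = imsi[3:3 + mnc_length]
    msin = imsi[3 + mnc_length:]
    return mcc, mnc, msin


# Same IMSI, two equally valid looking answers
print(split_imsi("5055710000001", 2))  # ('505', '57', '10000001')
print(split_imsi("5055710000001", 3))  # ('505', '571', '0000001')
```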

Why would you do this? Why would a regulator opt to have 1/10th the addressable size of network codes? I don’t know, and I haven’t been able to find an answer – If you know please drop a comment, I’d love to know.

So how do we handle this?

There are files in the SIM profile to indicate the length of the MNC: the Administrative Data EF (EF_AD) on the SIM allows us to indicate if the MNC is 2 digit or 3 digit, and the HPLMNwAct and the other *PLMN* EFs encode the PLMN as 6 digits, with or without padding, to allow differentiation.
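Here’s a sketch of how the padding keeps MNC 57 and MNC 057 apart. It assumes the nibble-swapped, 3 byte layout I understand the 3GPP specs to use for PLMN fields, with a 2 digit MNC padded using an F filler nibble; check the relevant spec before relying on it. The function names are my own.

```python
def encode_plmn(mcc: str, mnc: str) -> str:
    """Encode MCC + MNC as 6 hex digits (3 bytes, nibble-swapped)."""
    if len(mcc) != 3 or len(mnc) not in (2, 3) or not (mcc + mnc).isdigit():
        raise ValueError("expected a 3 digit MCC and a 2 or 3 digit MNC")
    # A 2 digit MNC gets an F filler where the third MNC digit would go
    mnc_digit_3 = mnc[2] if len(mnc) == 3 else "F"
    return mcc[1] + mcc[0] + mnc_digit_3 + mcc[2] + mnc[1] + mnc[0]


def decode_plmn(hex_plmn: str) -> tuple[str, str]:
    """Reverse of encode_plmn: returns (MCC, MNC) with the MNC length intact."""
    h = hex_plmn.upper()
    if len(h) != 6:
        raise ValueError("expected 6 hex digits")
    mcc = h[1] + h[0] + h[3]
    mnc = h[5] + h[4] + ("" if h[2] == "F" else h[2])
    return mcc, mnc


print(encode_plmn("505", "57"))   # 05F575
print(encode_plmn("505", "057"))  # 057550
print(decode_plmn("05F575"))      # ('505', '57')
```

The two networks end up as different values on the SIM, which is exactly the information you lose when all you have is the IMSI.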

That’s all well and good from a SIM perspective, but less useful for scenarios where you might be the Visited PLMN for example, and only see the IMSI of a Subscriber.

We worked on a project in a country that mixed both 2 digit and 3 digit Mobile Network Codes, under the same Mobile Country Code. Certain Qualcomm phones would do very very strange things, and it took us a long time and a lot of SIM OTA to resolve the issue, but that’s a story for another day…

private_data_dir on Ansible Runners called from Python

01/11/2024 | Linux, Notes | Ansible, Python | Nick

We’ve got a web based front end in our CRM which triggers Ansible Playbooks for provisioning of customer services, this works really well, except I’d been facing a totally baffling issue of late.

Ansible Plays (Provisioning jobs) would have variables set that they inherited from other Ansible Plays, essentially if I set a variable in Play X and then ran Play Y, the variable would still be set.

I thought this was an issue with our database caching the play data and showing the results from a previous play, but that wasn’t the case.

Then I thought our API that triggers this might be passing in extra variables that it had cached, but that wasn’t the case either.

In the end I ran the Ansible function call in its simplest possible form, with no API, no database, nothing but plain vanilla Ansible called from Python:

    # Run the actual playbook
    r = ansible_runner.run(
        private_data_dir='/tmp/',
        playbook=playbook_path,
        extravars=extra_vars,
        inventory=inventory_path,
        event_handler=event_handler_obj.event_handler,
        quiet=False
    )

And I still had the same issue of variables being inherited.

So what was the issue? Well, the title gives it away: the private_data_dir parameter creates an env folder in that directory, containing an extravars file, where a JSON blob lives with all the vars from all the provisioning runs.

Removing the parameter from the Python function call resolved my issue, but did not give me back the half a day I spent debugging this…
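If you do need to pass a private_data_dir, one option is to give every run its own throwaway directory so nothing can leak between plays. This is a sketch only, not what we run in production: run_isolated is a made-up helper, and it takes the run function as an argument so it can be tried without Ansible installed.

```python
import tempfile


def run_isolated(run, **kwargs):
    """Call run(**kwargs) with a brand new, empty private_data_dir.

    `run` is expected to be ansible_runner.run (or anything else that
    accepts a private_data_dir keyword argument). The directory is
    deleted again as soon as the call returns.
    """
    with tempfile.TemporaryDirectory(prefix="runner_") as private_data_dir:
        return run(private_data_dir=private_data_dir, **kwargs)


# Usage, assuming ansible_runner is installed and the paths are absolute:
#
#   import ansible_runner
#   r = run_isolated(
#       ansible_runner.run,
#       playbook=playbook_path,
#       extravars=extra_vars,
#       inventory=inventory_path,
#       event_handler=event_handler_obj.event_handler,
#   )
```

The trade-off is that anything written into that directory is gone once the call returns, so this only suits setups like ours where the event_handler collects everything you care about during the run.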

string_indexed_fields in the config file