X-Polarised Antennas in Mobile Networks

15/08/2025 | 5G SA, EUTRAN, GSM, Mobile Networks, RF | 5G, Polarization, RF | Nick

Polarization Basics

Let’s imagine the coin slot on a payphone: coins can only enter the slot if they’re aligned with it.

If you rotated the coin by 90 degrees, it wouldn’t fit in the slot.

If the slot on the payphone went from up-to-down, our coin slot could be described as “vertically polarized”.
Only coins in the vertically polarized orientation would fit.

Likewise, if the coin slot went side-to-side, we could describe it as being “horizontally polarized”, meaning only coins that are horizontally polarized (on their side) would fit into it.

RF waves also have a polarization, like our coin slot.

A receiver wishing to receive signals transmitted from a vertically polarized antenna will need to use a vertically polarized antenna to pick up the signal.

Likewise, a signal transmitted from a horizontally polarized antenna would require a horizontally polarized antenna on the receiving side.

If there is a mismatch in polarization (for example, RF waves transmitted from a horizontally polarized antenna but received on a vertically polarized antenna), the signal may still get through, but the received signal strength would be severely degraded, in the order of 20dB, which is 1/100th of the power you’d get with the correct polarization.
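To put numbers on that, a dB figure converts to a power ratio as 10^(dB/10). The sketch below also uses the textbook polarization loss factor for two linear antennas misaligned by some angle, which is cos² of that angle. That ideal model predicts total loss at 90°; real antennas leak a little into the other polarization, which is why in practice you see something like a 20dB loss rather than an infinite one. The function names are just for illustration.

```python
import math

def db_to_power_ratio(db: float) -> float:
    """Convert a dB value to a linear power ratio, e.g. -20 dB -> 0.01."""
    return 10 ** (db / 10)

def polarization_loss_factor(angle_deg: float) -> float:
    """Ideal loss factor for two linear antennas misaligned by angle_deg: cos^2(angle)."""
    return math.cos(math.radians(angle_deg)) ** 2

# A 20 dB mismatch loss leaves 1/100th of the power
print(db_to_power_ratio(-20))  # 0.01

# Ideal model: 0 degrees is lossless, 45 degrees costs about 3 dB,
# and 90 degrees is (in theory) total loss
for angle in (0, 45, 90):
    plf = polarization_loss_factor(angle)
    print(angle, round(plf, 3))
```

In the ideal model a 45° offset only costs about 3dB, which is part of why slanted elements cope reasonably well with however the user happens to be holding their phone.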

You can think of a polarization mismatch like cutting up the coin so it fits sideways through the slot: only a sliver of the original coin gets through. In the same way, you receive only a fraction of the original signal if the polarization doesn’t match on both ends.

Plagiarised diagram showing different antenna polarizations

Useless Information: In Australia, country TV stations and metro TV stations sometimes transmitted different programming. To differentiate the signals on the receiver side, country TV transmitters used vertical polarization, while metro transmitters used horizontal polarization.
Using different polarizations cuts down on interference in the border areas that sit within the footprint of both the metro and country transmitters.
This means that as you drive through metro areas you’ll see all the Yagi antennas are horizontally oriented, while in country areas they’re vertically oriented.

Vertical Polarization

Early mobile phone networks used Vertical Polarization.

This means they used a flagpole-like, vertically oriented antenna (an omnidirectional antenna) on the base station sites.

Old-school mobile phones also had a little pop-out omnidirectional antenna which, when you held the phone to your ear, would be oriented vertically.

This matches the antenna on the base station, and away we go. You still sometimes see vertical polarization in use on base station sites in low-density areas, or on small cells.

Vertically polarized mobile phone antenna, which is oriented vertically, like on the base station behind it.

Increasing subscriber demand meant that operators needed more capacity in the network, but spectrum is expensive. As we just saw, a mismatch in polarization can lead to a huge reduction in power, and maybe we can use that to our advantage…

Shannon-Hartley Theorem

But first, we need to do some maths…

Stick with me, this won’t be that hard to understand I promise.

There are two factors that influence the capacity of a network: the Bandwidth of the Channel and the Signal-to-Noise Ratio.

So let’s look at what each of these terms mean.

Bandwidth

Bandwidth is the information carrying capacity.
A single sheet of A4 at 12 point font has a set bandwidth. There’s only so much text you can fit on one A4 sheet at that font size.

We can increase the bandwidth two ways:

Option 1 to Increase Bandwidth: Get a larger transmission medium.
By changing the size of the medium we’re working with, we can increase how much data we can transfer.

For this example we could get a bigger sheet of paper: an A3 sheet, or a billboard, will give us a lot more bandwidth (content carrying capability) than our sheet of A4.

Changing from an A4 sheet to an A3 sheet increases the number of characters we can fit on the page (slightly more than doubling the bandwidth).

Option 2 to Increase Bandwidth: Use more efficient encoding
As well as changing the size of the medium we are using, we can change how we store the data on the paper, for example, shrinking the font size to get more text in the same area, which also increases the bandwidth.

In communications networks this is also true: Bandwidth is determined by how much spectrum we have to work with (for example 10MHz), and how we encode the data on that spectrum, i.e. Morse code, Binary Phase Shift Keying (BPSK) or 16-QAM.
Each of the different encoding schemes has a different bandwidth for the same amount of spectrum used, and we’ll cover those in more detail in the future.

Now that we’ve covered increasing the bandwidth, let’s talk about the other factor:

Signal-to-Noise Ratio

Signal-to-Noise Ratio (SNR) is the ratio of good signal, to the background noise.

On the train, my headphones block out most of the other sounds.
In this scenario, the signal (the podcast I’m listening to on the headphones) is quite high, compared to the noise (unwanted sounds of other people on the train), so I have a good Signal-to-Noise ratio.

When we talk about the Signal-to-Noise Ratio, we’re talking about the ratio of the signal we want (podcast) to the noise (signal we don’t want).

When I’m on the train if 90% of what I hear is the podcast I’m listening to (the “signal”) and 10% is random background sounds (the “noise”) then my signal-to-noise-ratio is really good (high).

Capacity and SNR

Let’s continue with the listening to a podcast analogy.

The average human talks about 150 words per minute. So let’s imagine I’m listening to a podcast at 150 words per minute.

If I’m listening in an anechoic chamber, then I’ll be able to hear everything that’s being said, so my bandwidth will be 150 words per minute. As there is no background noise, my capacity will also be 150 words per minute.

But if I leave an anechoic chamber (much as I love spending time in anechoic chambers), and go back on the train, I won’t hear the full 150 words per minute (bandwidth) due to the noise on the train drowning out some of the signal (podcast).

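The Shannon-Hartley Theorem states that the capacity of a channel is C = B × log2(1 + S/N), where B is the bandwidth and S/N is the signal-to-noise ratio. To keep our analogy simple, we’ll treat capacity as the bandwidth scaled by how much of the signal makes it through the noise.

So on the train, hearing 90% of what’s said on the podcast with 10% drowned out means 0.9 of the signal gets through (pretty good). Strictly speaking, that’s a signal-to-noise ratio of 90:10, or 9, but the fraction that gets through is easier to picture.

So with our simplified take on the Shannon-Hartley Theorem, the capacity of me listening to a podcast on the train (150 words per minute of bandwidth multiplied by the 0.9 that gets through) would be 135 words per minute.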
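For reference, here is the actual Shannon-Hartley formula, C = B log2(1 + S/N), as a small Python sketch, with S/N as a linear ratio rather than dB. It gives a theoretical upper bound, not what a real LTE or 5G link achieves, and the function names are illustrative.

```python
import math

def shannon_capacity_bps(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley upper bound: C = B * log2(1 + S/N), with S/N as a linear ratio."""
    return bandwidth_hz * math.log2(1 + snr_linear)

def db_to_linear(snr_db: float) -> float:
    """Convert an SNR in dB to a linear power ratio."""
    return 10 ** (snr_db / 10)

# Train example: 90 parts signal to 10 parts noise is an SNR of 9 (about 9.5 dB)
train_snr = 90 / 10
print(math.log2(1 + train_snr))  # about 3.32 bits per second for every Hz of bandwidth

# The same formula for a 5 MHz channel at a few different SNRs
for snr_db in (0, 10, 20, 30):
    capacity = shannon_capacity_bps(5e6, db_to_linear(snr_db))
    print(f"{snr_db:>2} dB SNR -> {capacity / 1e6:.1f} Mbps upper bound")
```

Doubling B doubles C, but raising the SNR only helps logarithmically: at high SNR, each extra 10dB adds roughly 3.3 bits per second per Hz.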

Claude Shannon, one half of the Shannon-Hartley Theorem, with an electromechanical mouse maze (image via Time, “Claude Shannon: Juggling Unicyclist Who Changed the World”).

How this applies to RF Networks

In an RF context, our Bandwidth has a fixed information carrying capacity; for example, on LTE a 5MHz wide channel using 16QAM has 12.5Mbps of bandwidth available.

In a simple system, we have two levers we can pull to increase the bandwidth:

Increasing the size of the channel: if we went from a 5MHz wide channel to a 20MHz channel, this would give us 4x the available Bandwidth (actually slightly more in LTE, but whatever).
Changing the encoding to cram more data into the same size channel: going from 16QAM (4 bits per symbol) to 64QAM (6 bits per symbol) would give us 1.5x the available Bandwidth.
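A quick way to sanity check the modulation lever: an M-QAM constellation carries log2(M) bits per symbol, so the gain from changing modulation is the ratio of those values. This sketch ignores coding rates and overheads, and the function name is just for illustration.

```python
import math

def bits_per_symbol(qam_order: int) -> int:
    """Bits carried by each symbol of an M-QAM constellation: log2(M)."""
    return int(math.log2(qam_order))

for order in (4, 16, 64, 256):
    print(f"{order}QAM: {bits_per_symbol(order)} bits per symbol")

# Moving from 16QAM to 64QAM: 6 / 4 bits per symbol
print(bits_per_symbol(64) / bits_per_symbol(16))  # 1.5
```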

As we’ll see later in this post, there are some extra tricks (MIMO and Diversity) we can use to increase the bandwidth of the system.

Our Signal-to-Noise Ratio (SNR) is constantly changing, with a gazillion things that can influence it.
Some of the key factors that impact the SNR are the distance from the transmitter to the receiver and anything blocking the path between them (trees, buildings, mountains, etc), but there are many other factors that go into this. Atmospheric conditions, flat surfaces the signal can reflect off (leading to multipath interference), other nearby transmitters, etc, can all influence our SNR.

Our capacity is (roughly) our Bandwidth scaled by the Signal-to-Noise ratio.
Shannon-Hartley Theorem (ish)

As a goal we want capacity, and in an ideal world, our capacity would be equal to our bandwidth, but all that noise sneaks in and reduces our available capacity, based on the current SNR value.

So now we want to get more capacity out of the network, because everyone always wants to add capacity to networks.

One trick that we can use is to use multiple antennas with different polarizations.

If our transmitter sends the same data out of multiple antennas, then with some clever processing on the transmitter and the receiver, we can use this to maximize the received SNR. This is called Transmit Diversity and Receive Diversity, and it’s a form of black magic.

In closed-loop schemes, the Transmitter uses feedback from the receiver to determine what the channel conditions are like, and then, before transmitting the next block of data, compensates for the channel conditions experienced by the receiver. This increases the SNR and allows for higher MCS / encoding schemes, which in turn means higher throughput.

You’ll notice that most antennas in the wild today have at least two ports for each frequency, marked + and -, which are the two polarizations (+45° and -45°).

Modern mobile networks use ±45° slant polarization (aka X Polarization), which works better in the orientations end users hold their phones in.

These two polarizations, each connected to a distinct transmit/receive path on both the phone (UE) end and the base station end, allow multiple data streams to be sent at the same time (spatial multiplexing, the foundation of MIMO), which enables higher throughput, or can be configured to provide redundancy in the transmission to better pick up weak signals (Diversity).
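To make spatial multiplexing a little less like black magic, here’s a toy 2x2 example in plain Python. It assumes a real-valued, noise-free channel matrix H that the receiver already knows, where each polarization leaks 10% of its amplitude (-20dB of power) into the other branch; real channels are complex, noisy and constantly changing, and have to be estimated. Zero-forcing (multiplying by the inverse of H) is used here as one simple way to separate the streams, not necessarily what any given base station or phone does.

```python
# Hypothetical 2x2 channel: each polarization leaks 10% of its amplitude
# (-20 dB of power) into the other receive branch
H = [[1.0, 0.1],
     [0.1, 1.0]]

def transmit(h, symbols):
    """Each receive branch hears a mix of both transmitted streams: y = H x."""
    return [h[0][0] * symbols[0] + h[0][1] * symbols[1],
            h[1][0] * symbols[0] + h[1][1] * symbols[1]]

def zero_forcing(h, received):
    """Undo the mixing by applying the inverse of the 2x2 channel matrix."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    inv = [[ h[1][1] / det, -h[0][1] / det],
           [-h[1][0] / det,  h[0][0] / det]]
    return [inv[0][0] * received[0] + inv[0][1] * received[1],
            inv[1][0] * received[0] + inv[1][1] * received[1]]

# Two different data streams, one on +45 and one on -45, sent at the same time
streams = [1.0, -1.0]
received = transmit(H, streams)
recovered = zero_forcing(H, received)
print(received)   # each branch sees mostly its own stream plus a little of the other
print(recovered)  # very close to [1.0, -1.0]
```

Because the cross-polar leakage is small, H is far from singular and the streams separate cleanly. If both branches heard almost the same thing (leakage close to 1.0), the determinant would approach zero and the two streams could no longer be told apart, which is why two orthogonal polarizations make a handy pair of paths.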

SessionS in CGrateS

08/08/2025 | CGrateS, Mobile Networks, Software, VoIP | CGrates, OCS, Online Charging, SessionS | Nick

In a scenario where we don’t know how long an event will be (for example, at the start of a voice call we don’t know how long it’s going to go for, and at the start of a data session we don’t know how much data will be used) but we need to:
A) charge for it and
B) apply some credit control to make sure the subscriber doesn’t consume more than their allowed balance

That’s when we use CGrateS’ SessionS module.

SessionS is what powers online charging, and it’s done with Unit Reservation (I’ve written about this in painful detail here).

For a voice call, for example, we reserve talk time in advance, before the user actually consumes it. When the call starts, we reserve 30 seconds of credit from the user’s balance; then, when the user has consumed this first 30 seconds of credit, we go back and request another 30 seconds of credit.
If there’s credit available, we grant it and the call is allowed to continue for another 30 seconds, and then the process repeats, until either the call ends or we go back for more credit and there’s none available, at which point we terminate the call.
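The reservation loop can be sketched as a toy simulation. This is not CGrateS code: simulate_call and its parameters are hypothetical, and it assumes a partial grant is made when less than a full 30 seconds of credit remains.

```python
def simulate_call(balance_s: int, call_length_s: int, chunk_s: int = 30):
    """Toy unit-reservation loop.

    Returns (seconds_granted, seconds_talked, call_was_cut_off).
    """
    granted = 0
    while granted < call_length_s:
        available = balance_s - granted
        if available <= 0:
            # Went back for more credit and there was none: terminate the call
            return granted, granted, True
        # Grant up to another chunk and let the call continue
        granted += min(chunk_s, available)
    return granted, call_length_s, False

print(simulate_call(300, 70))  # (90, 70, False)
print(simulate_call(45, 70))   # (45, 45, True)
```

For a 70 second call with plenty of credit, 90 seconds end up reserved but only 70 are used, and that 20 second difference is what has to be handed back when the session terminates.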

Why is this important?
We may have multiple sources drawing down on an account at the same time, if you’re on a call while browsing, you’re doing two events that are charged, and may be charged from the same balance, and we don’t want to give you free calls or data just because you’re able to walk and chew gum at the same time.

CGrateS Agents such as the Asterisk, Kamailio, FreeSWITCH, RADIUS and Diameter Agents handle most of the heavy lifting for us, but understanding how SessionS works made working with these modules much easier, for me at least.

So let’s set the scene: we’re going to create an Account with 10 units of *generic balance (I’m using generic because if we use time the numbers end up kinda big and it gets confusing to look at) and then consume it over several transactions until all the balance is gone.

In the config we’ve disabled the debit_interval in the sessions section. Usually debiting is handled by the Agents, but for our demo we’re going to do it manually, so it’s off.

Let’s get set up: we’ll define a charger, create an account and allocate some balance to it.

# CGRateS_Obj is a JSON-RPC client object for the CGrateS API, set up earlier (not shown)
#Define default Charger
print(CGRateS_Obj.SendData({
    "method": "APIerSv1.SetChargerProfile",
    "params": [
        {
            "Tenant": "cgrates.org",
            "ID": "Charger_API_Default",
            "RunID": "*Charger_API_Default_RunID",  #Arbitrary String
            'FilterIDs': [],
            'AttributeIDs': ['*none'],
            'Weight': 999,
        }
    ]
}))

#Add a balance to the account with type *generic with 10 units of balance
Create_Voice_Balance_JSON = {
    "method": "ApierV1.SetBalance",
    "params": [
        {
            "Tenant": "cgrates.org",
            "Account": "Nick_Test_123",
            "BalanceType": "*generic",
            "Categories": "*any",
            "Balance": {
                "ID": "10_units_generic_balance",
                "Value": "10",
                "Weight": 25,
                "Blocker": "true",       #This stops the Monetary Balance from being used
            }
        }
    ]
}
print(CGRateS_Obj.SendData(Create_Voice_Balance_JSON))

Alright, with that out of the way, let’s start a session using SessionSv1.UpdateSession. We’re going to define a CGrateS event to pass to it, and we’ll call it multiple times, changing the usage as we go.

To make our demo easier, I’ve wrapped the call in a little while loop, so we can keep deducting balance:

import datetime
import pprint
import uuid

OriginID = str(uuid.uuid4())    # Used as the ID of each UpdateSession API request
call_event = {
                "RequestType": "*prepaid",
                "ToR": "*generic",
                "Tenant": "cgrates.org",
                "Account": "Nick_Test_123",
                "AnswerTime": "*now",
                "OriginID": str(uuid.uuid1()),  # Together with OriginHost this identifies the session
                "OriginHost": "ScratchPad",
            }

# Each pass through the loop sends one UpdateSession, debiting whatever Usage we type in
while input("Enter to continue or q to quit") != "q":
    call_event['Usage'] = input("Usage: ")
    result = CGRateS_Obj.SendData(
        {"method": "SessionSv1.UpdateSession", "params": [
            {
                "GetAttributes": False,
                "UpdateSession": True,
                "Subsystem" : "sessions",
                "Tenant": "cgrates.org",
                "ID": OriginID,
                "Context": None,
                "Time": datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
                "Event":
                    call_event}
        ]})
    pprint.pprint(result)
print("Quit")

So now, with all this in place: the first block of code defines the default charger and adds balance to an account (as the account doesn’t exist yet, this step creates the account too), and the second block of code defines the event.

By running these together, we can start our session.

When you run it, you’ll be prompted to press Enter to continue or input q to quit. Press Enter to continue, then you’ll be asked for the usage; I’ve put 1 in the example below.

Enter to continue or q to quit
Usage: 1
Sending Request with Body:
{'method': 'SessionSv1.UpdateSession', 'params': [{'GetAttributes': False, 'UpdateSession': True, 'Subsystem': 'sessions', 'Tenant': 'cgrates.org', 'ID': '8e43c5e4-0b9b-4aaf-8d01-5143677d6a8a', 'Context': None, 'Time': '2024-06-14T22:22:52.279155Z', 'Event': {'RequestType': '*prepaid', 'ToR': '*generic', 'Tenant': 'cgrates.org', 'Account': 'Nick_Test_123', 'AnswerTime': '*now', 'OriginID': 'c86e7f54-2a48-11ef-9862-072e6d04df9b', 'OriginHost': 'ScratchPad', 'Usage': '1'}}]}
{'error': None, 'id': None, 'result': {'MaxUsage': 1}}

Alright, now let’s take a quick sidebar and check in with cgr-console in a different tab. What do we think is going to show as our balance?

Well, if we run the accounts command from within cgr-console, we can see that our account, which had a balance of 10 before, now has a balance of 9, as we’ve deducted 1 from the balance by inputting it as our usage:

And if we run the active_sessions command in the same console, we can see the active sessions, including where that one unit of balance went.

A few things to call out here:

The DebitInterval is how often this balance will be deducted. For our test scenario we’ve turned off automatic debiting, but Agents like FreeSWITCH and Kamailio leave this on and automatically tick off time as it passes (obviously this doesn’t work for data, so there we’d leave it off)
The LoopIndex is how many UpdateSession events the API has handled for this session (the unique session is identified by the ID / CGRID field)
SetupTime is blank because we didn’t set it in our initial UpdateSession API call
The Usage in cgr-console is sometimes shown as nanoseconds; that’s because 1ns is equal to 1 generic unit.

So let’s go back to our Python script, go through the loop again but this time set the usage to 7.

Now if we flip back to cgr-console and check again, we’ll see, as expected, that our account balance is now 2.

That’s because we started with 10, deducted 1, then deducted 7, which leaves 2 remaining. If we run active_sessions again in cgr-console, we’ll see the Usage of the session is now 8.

And lastly let’s try and take another 7 of balance, knowing we’ve only got 2 units left.

No dice: 7 is greater than 2 of course, so CGrateS stops us there by rejecting the request with RALS_ERROR:INSUFFICIENT_CREDIT_BALANCE_BLOCKER. SessionS has done its job of making sure we didn’t allocate more credit than we were allowed, telling us we have insufficient credit and that this balance is a blocker.

In this little demo we had one service drawing on the balance, but imagine if you’d fired up two copies of the script: you’d have two sources consuming balance at the same time, and this is where CGrateS shines. CGrateS does all the heavy lifting to make sure that resources are never over-allocated and that we don’t end up with a negative balance.
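Why does concurrency matter? If two sessions both check the balance and then both debit it, they can each see enough credit and together overspend it. The toy class below is hypothetical (it is not how CGrateS is built internally) and just shows the idea of making "check, then debit" one atomic step with a lock, using the same 10 unit balance and two requests for 7 units.

```python
import threading

class ToyBalance:
    """Hypothetical in-memory balance: a lock makes check-and-debit atomic."""

    def __init__(self, units: int):
        self.units = units
        self._lock = threading.Lock()

    def reserve(self, requested: int) -> bool:
        with self._lock:
            if requested > self.units:
                return False  # refuse the request rather than go negative
            self.units -= requested
            return True

balance = ToyBalance(10)
results = []

def session(usage: int) -> None:
    results.append(balance.reserve(usage))

# Two "sessions" each asking for 7 units of a 10 unit balance at the same time
threads = [threading.Thread(target=session, args=(7,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results, balance.units)  # one True and one False, with 3 units left
```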

When it comes time to terminate the session, there’s a trick to this.

Unit reservation is all about allocating resources in advance, which means we’ve generally taken more from the balance than was actually consumed, so we have to give the difference back to the customer.

If we include the Usage field in the TerminateSession request, this must be the total usage for the entire session (start-to-finish), not just since the last UpdateSession API call.

For example, say we allocated 30 seconds of balance at the start of a call; as that 30 seconds was consumed, we allocated another 30 seconds, and when the call got 60 seconds in, we allocated another 30 seconds of balance. If the call ends at a total of 70 seconds, we’ve allocated 90 seconds (3x 30 seconds), so we’d be over-billing the customer. This is where we set Usage to 70, and CGrateS will refund the 20 seconds of balance we over-charged them (90 – 70 = 20).

That’s one way of doing it. The other option, if we’ve only tracked usage since the last update, is to use LastUsed: for the same 70 second call with 3x 30 second Session Updates, we set LastUsed to 10 seconds (as we only used 10 seconds of the 30 seconds allocated in the last Update), which will also refund the 20 seconds.
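The two refund approaches boil down to simple arithmetic. These helpers are hypothetical and just restate the 70 second example; the LastUsed version assumes every earlier grant was fully consumed.

```python
def refund_from_total_usage(granted_s: list[int], total_usage_s: int) -> int:
    """Usage covers the whole session: refund = everything granted - total used."""
    return sum(granted_s) - total_usage_s

def refund_from_last_used(granted_s: list[int], last_used_s: int) -> int:
    """LastUsed only covers the final grant: refund = last grant - what was used of it."""
    return granted_s[-1] - last_used_s

grants = [30, 30, 30]  # three 30 second reservations for a 70 second call
print(refund_from_total_usage(grants, 70))  # 20
print(refund_from_last_used(grants, 10))    # 20
```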

In practice, you’ll probably use CGrateS Agents like the FreeSWITCH Agent, Asterisk Agent or Kamailio Agent to handle the charging in those applications. The premade CGrateS Agents handle generating the UpdateSession calls and all of this logic under the hood, but it’s super useful to know how it all works.

I’ve put the example cgrates.json file I used and the script for debiting on the Github repo for this post.

Other posts in the CGrateS in Baby Steps series:

Importing building footprints into Forsk Atoll from OpenStreetMap data

01/08/2025 | 5G SA, EUTRAN, GSM, LTE, Mobile Networks, RF, Software | Atoll, Forsk, RF, RF Planning | Nick

Having building footprints inside Atoll is super-duper valuable, as it means you can calculate the percentage of homes / buildings covered. After all, geographic coverage and population coverage are two very different things.

Download the data from OSM: if you only need a small area you can use the Export OSM page, if you need a wider area Geofabrik provides country-level exports of the data, or if you’re really keen you can download all the OSM data.

Once you’ve got the export, we’ll load the .gpkg file (or files) into GlobalMapper.

Select one layer at a time that you want to export into Atoll. (This also works for roads, geographic boundaries, POIs, etc)

Export the selected layer from Export -> Export Vector / Lidar Format

Set output type to “Shapefile”

Set output filename in “Export Areas” (This will be the output file). If you want to limit the export to a given area you can do that in Export Bounds.

Now we can import this data into Atoll.

File -> Import

Select the exported Shapefile we just created.

Set the projection and import

Bingo, now we’ve got our building footprints.

We can change the style of the layer and the labels as needed.

Now we can use the buildings as the Focus Zone / Compute Zone and then run reports and predictions based on those areas.

For example, I can run Automatic Cell Planning with the building layers as the Focus Zones, to optimize azimuths, tilts and powers to provide coverage where people live, not just to vacant land.

Proxmox Insecure dependency in exec while running with -T switch

25/07/2025 | Notes, Security | Proxmox | Nick

Disclaimer: This is in a lab held together with duct tape and glue that doesn’t do anything important. Use at your own risk.

I hit this issue in Proxmox whenever trying to do anything with VMs:

TASK ERROR: clone failed: lvcreate ‘SDH/vm-202-disk-0’ error: Insecure dependency in exec while running with -T switch at /usr/lib/x86_64-linux-gnu/perl-base/IPC/Open3.pm line 176.

I’m not a Perl user, but Google tells me that it’s because Perl is set to operate in Taint mode thanks to the -T parameter.

The file it’s referencing doesn’t actually contain the -T parameter, so I went looking and sure enough a bunch of the Proxmox services contain the -T parameter.
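Rather than hunting through files by hand, a short Python sketch can list scripts whose shebang line turns on taint mode. It only checks the first line of each file for a Perl interpreter and a -T flag, and the function name is made up for this example; review the list before feeding anything to sed.

```python
import os

def find_taint_mode_scripts(directory: str = "/usr/bin") -> list[str]:
    """Return files in directory whose first line is a Perl shebang with the -T (taint) flag."""
    matches = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path, "rb") as f:
                first_line = f.readline(200)
        except OSError:
            continue  # unreadable file, skip it
        if first_line.startswith(b"#!") and b"perl" in first_line and b" -T" in first_line:
            matches.append(path)
    return matches
```

On a Proxmox host, calling find_taint_mode_scripts() with the default /usr/bin directory would list the candidates.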

Ah, access to source makes debugging so much easier!

So, a hacky one-liner to remove the -T from those files:

sed -i 's|#!/usr/bin/perl -T|#!/usr/bin/perl|' /usr/bin/pveam /usr/bin/pveceph /usr/bin/pveproxy /usr/bin/spiceproxy /usr/bin/pvesr /usr/bin/vzdump /usr/bin/pvenode /usr/bin/pvedaemon

And restart the services:

systemctl restart pveproxy pvedaemon

And presto, whatever important security feature and flag that never should be turned off is gone, and my Frankenstein lab lives to operate another day.

Profiling Kamailio Module Memory Usage

18/07/2025 | Kamailio, Notes, Python, VoIP | Kamailio, Prometheus, SIP, VoIP | Nick

Recently we were debugging a slow memory leak in a little-used Kamailio module.

For a long time we’ve used the Kamailio Prometheus exporter to export stats from Kamailio and then Grafana to visualize the metrics.

And one of the things we track per box is the shared memory (shm) usage inside Kamailio:

(kamailio_shmem_real_used_size / kamailio_shmem_total_size) * 100

Our dashboards showed this slow memory leak, with memory slowly ticking up on a certain group of machines. We knew there was a memory leak, but in which Kamailio module?

Shared memory usage per Kamailio instance

So we know there’s a memory leak on some boxes (different boxes run different modules) but which ones?

kamcmd will show you a breakdown of the shared memory usage per module, so I started with a little cronjob to dump the data every so often, with the idea I would diff the values and look for big increases.

*/5 * * * * kamcmd mod.stats all shm > /home/nickj/memory_stats_$(date +\%Y_\%m_\%d_\%H_\%M).log
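The diffing idea can be sketched in a few lines of Python once the per-module totals have been pulled out of two dumps. Parsing the kamcmd mod.stats output is left out here because its exact layout isn’t shown in this post; the module names and byte counts below are made up.

```python
def biggest_growers(before: dict[str, int], after: dict[str, int],
                    top: int = 5) -> list[tuple[str, int]]:
    """Rank modules by how much their shared memory usage (bytes) grew between snapshots."""
    deltas = {module: after[module] - before.get(module, 0) for module in after}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:top]

# Hypothetical per-module totals, already parsed out of two dumps
before = {"module_a": 1_000_000, "module_b": 2_000_000, "module_c": 500_000}
after = {"module_a": 1_000_500, "module_b": 12_000_000, "module_c": 500_000}
print(biggest_growers(before, after, top=2))
# → [('module_b', 10000000), ('module_a', 500)]
```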

Then the idea of processing the data from the files started to scare me.

So in the end I went overboard and wrote a little script to get the memory usage of each module and export it via Prometheus, so I can track down what module is at fault.

If I were a better C programmer I’d have added this into the main Kamailio Prometheus module, but I’m terrible at C.

I’ve put the code on Github here.

With it I was able to set up this:

Yep, that’s our problem. Here are the deltas: that module has grown by ~10MB in 24 hours.

Now to fix the problem…

If anyone else finds this useful I’ve also posted the source for the Grafana Dashboard here.

Importing Global Clutter data into Forsk Atoll

11/07/2025 | 5G SA, EUTRAN, GSM, LTE, Mobile Networks, Notes, RF, Software | Atoll, Forsk, RF, RF Planning | Nick

Clutter data describes real world things on the planet’s surface that attenuate signals, for example trees, shrubs, buildings, bodies of water, etc, etc. There are also different types of trees, and some types of trees attenuate signals more than others; the same goes for different types of buildings.

Getting clutter data used to be crazy expensive, and done on a per country or even per region basis, until the European Space Agency dropped a global dataset free of charge for anyone to use, that covered the entire planet in a single source of data.

So we can use this inside Forsk Atoll for making our predictions.

First things first we’ll need to create an account with the ESA (This is not where they take astronaut applications unfortunately, it just gives you access to the datasets).

Then you can select the areas (tiles) you want to download after clicking the “Download” tab on the right.

We get a confirmation of the tiles we’re downloading, and then we’ll get a ZIP file containing the data.

We can load the whole ZIP file (Without needing to extract anything) into GlobalMapper which loads all the layers.

I found the _Map.tif files to be the highest resolution, so I’m only exporting these.

Then we need to export the data to GeoTiff for use in Atoll (The specific GeoTiff format ESA produces them in is not compatible with Atoll hence the need to convert), so we export the layers as Raster / Image format.

Atoll requires square pixels, and we need them in meters, so we select “Calculate Spacing in Other Units”.

Then set the spacing to meters (I use 1m to match everything else, but the data is actually only 10m accurate, so you could set this to 10m).

You probably want to set the Export Bounds to just the areas you’re interested in, otherwise the data gets really big, really quickly and takes forever to crunch.

Now for the fancy part, we need to import it into Atoll.

When we import the data we import it as Raster data (Clutter Classes) with a pixel size of 1m.

Alas, when we exported the data we lost the positioning information, so while we’ve got the clutter data, it’s just sitting somewhere on the planet, which, with the planet being the size it is, is probably not where you need it.

So I cheat: I start by setting the West and North values to match the values from a Cell Site I’ve already got on the map (I put one in the top left and bottom right corners of the map) and use that as the initial value.

Then (and stick with me, this is very technical) I mess with the values until the maps line up in the correct position. Increase X, decrease Y, dialling it in until the clutter map lines up with the other maps I’ve got.
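If you can spot a known point (like a Cell Site) in the raster and read off its pixel column and row, the West and North values can be worked out rather than purely dialled in. A sketch, assuming a north-up raster in a projected coordinate system in metres with square pixels and rows counted down from the top; the function name and numbers are hypothetical, and it ignores whether the reference sits at a pixel corner or centre (half a pixel).

```python
def top_left_from_reference(ref_easting: float, ref_northing: float,
                            ref_col: int, ref_row: int,
                            pixel_size_m: float = 1.0) -> tuple[float, float]:
    """Work out the West (X) and North (Y) of a north-up raster's top-left corner."""
    west = ref_easting - ref_col * pixel_size_m    # move left from the reference point
    north = ref_northing + ref_row * pixel_size_m  # rows increase southwards, so add
    return west, north

# Hypothetical values: a site at E 500000, N 6000000 shows up at column 1200, row 800
print(top_left_from_reference(500_000.0, 6_000_000.0, 1200, 800))
# → (498800.0, 6000800.0)
```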

Right, now we’ve got the data but we don’t have any values.

Each color represents a clutter class, but we haven’t set any actual height or losses for that material.

To know what each colour means we need to RTFM – ESA WorldCover 2020 User Manual.

Which has a table:

Alas, the Map Code does not match the table in the manual, but the colours do. Here’s what mine map to:

Which means when hovering over a layer of clutter I can see the type:

Next we need to populate the heights, indoor and outdoor losses for each clutter class. This is a little more tricky as it’s going to vary from geography to geography, but there are indicative loss numbers readily available online.

Once you’ve got that plugged in you can run your predictions and off you go!

Fluke TN2100 Terminator

04/07/2025 | Notes | 990DSL, Fluke, TN2100 | Nick

Because I’m a sucker for accessories for tools, I bought a Fluke TN2100 “Enhanced Terminator Far End Device”.

This device was made to go with the Fluke CopperPro Series II (Aka Fluke 990DSL), and is required for doing certain qualifications on the line, such as profiling the behavior of the line at different frequencies, wideband noise levels, etc.

This is all to qualify a loop / pair (or pairs: it can do up to 2 pairs / 4 wires) as being capable of supporting a variety of services like A/V/DSL as well as E1s / ISDN services, EMF, etc, etc.

It’s a super useful feature: if your loop is rubbish, putting an E1 circuit on it will result in a high bit-error-rate, and putting a DSL circuit on it will see it constantly lose sync, so being able to qualify whether the loop is any good beforehand is a must.

Alas the unit I’ve got came from eBay, and didn’t come with the leads, there’s just an RJ45 port on the terminator, with no indication as to what’s what.

So here’s the answer (which isn’t in any manual I could find) which I found through trial-and-error and cracking the case open to work out what was what:

RJ45 Pin	Function	Colour (T568A)
1	Blue Pair 1 (A-Leg)	White-Green
2	Blue Pair 1 (B-Leg)	Green
3	Yellow Pair 1 (A-Leg for Exchange Passthrough)	White-Orange
4	Yellow Pair 1 (B-Leg for Exchange Passthrough)	Blue
5	Red Pair 2 (A-Leg)	White-Blue
6	Red Pair 2 (B-Leg)	Orange
7	Unused	White-Brown
8	Earth	Brown

Hopefully of some use to someone.

Legacy BTS Site manager on Linux

27/06/2025 | EUTRAN, LTE, Mobile Networks, Notes, Software | BTS Site Manager, EUTRAN, Java, LTE, Nokia, RAN | Nick

Another post in the “vendors thought Java would last forever but the web would be just a fad” series, this one on getting Nokia BTS Site Manager (which is used to administer the pre-Airscale Nokia base stations) running on a modern Linux distro.

For starters we get the installers (you’ll need to get these from Nokia), and install openjdk-8-jre using whichever package manager your distro supports.

Once that’s installed, then extract the installer folder (Like BTS Site Manager FL18_BTSSM_0000_000434_000000-20250323T000206Z-001.zip).

Inside the extracted folder we’ve got a path like:

BTS Site Manager FL18_BTSSM_0000_000434_000000-20250323T000206Z-001/BTS Site Manager FL18_BTSSM_0000_000434_000000/C_Element/SE_UICA/Setup

The Setup folder contains a bunch of binaries.

We make these executable:

chmod +x BTSSiteEM-FL18-0000_000434_000000*

Then run the binary:

sudo ./BTSSiteEM-FL18-0000_000434_000000_x64.bin

By default it installs to /opt/Nokia/Managers/BTS\ Site/BTS\ Site\ Manager

And we’re done. Your OS may or may not have built a link to the app in your “start menu” / launcher.

You can use one BTS manager to manage several different versions of software, but you need the definitions for those software loaded.

If you want to load the Releases for other versions (Like other FLF or FL releases) the simplest way is just to install the BTS site manager for those versions and just use the latest, then you’ll get the table of installed versions in the “About” section that you can administer.

Kamailio Bytes: KEMI and UAC Module – Event Route

20/06/2025 | IMS / VoLTE, Kamailio, Mobile Networks, Notes, Python, VoIP | Kamailio, Kamailio Bytes, KEMI, Python, UAC | Nick

The UAC module is super handy for creating and sending SIP requests from Kamailio. It could be triggered via HTTP requests using xHTTP, other SIP messages or on a scheduled basis like with Rtimer.

More and more I’ve been using KEMI to write Python-based Kamailio dialplans to do all sorts of funky stuff.

The UAC module can handle the replies to requests it originated; in the native Kamailio dialplan these are handled through an event route block:

event_route[uac:reply] {}

But that doesn’t exist inside KEMI.

Instead, inside our kamailio.cfg we can set the name of the KEMI function to call as the event callback:

loadmodule "uac"
modparam("uac", "event_callback", "ksr_uac_event")

Then in our KEMI code (mine is in Python) we can pick it up with:

    def ksr_uac_event(self, msg, evname):
        KSR.info("===== uac module triggered event: " + evname + "\n")
        return 1

And that’s it!

Reverse Engineering dead protocols – Strand ShowNet

13/06/2025 | History, Notes, Python, Software | ShowNet, Strand | Nick

As a kid I did lighting for stage shows.

Turns out it was great training for a career in telecom: I learned basic rigging, working at heights, electrical work, patching rats’ nests of cables, and the shared camaraderie that comes from having stayed up all night working on something no one would ever notice, unless you hadn’t done it.

At the time the Strand 520i lighting console was the coolest thing ever: it supported 4 DMX universes (2048 channels – who could ever need more than that!?) and cost more than a new car.

One late Friday night, browsing online, I found one for sale by a university in the state I live in, for $100 or best offer. You better believe I smashed the buy now button so hard my mouse almost broke.
I was going to own my very own Strand 520.

I spent the weekend reading through the old manuals and remembering how to use all the features, then dragged my partner along on a road trip the following Monday morning to pick it up and bring it home.

But before I could do anything fun I had to find a PS/2 keyboard and a VGA screen, which took me a few more days (a visit to the tip was the easiest way to get a hold of them).

Then I needed something to receive DMX. I found everything now uses ArtNet (DMX over IP), and while there are visualisers for simulating an arena / stage lighting setup, they all take ArtNet, so I ordered a DMX to ArtNet converter.

Inside the unit is pretty much a standard PC (An “OG” Pentium in the Strand 520) with an ISA card for all the lighting control stuff.

The clicking hard drive managed to boot, but I didn’t think it’d last long, having been made more than 25 years prior. So I created a disk image and copied the file system onto a CF card using a CF to IDE adapter. This trick meant it booted faster than ever before.

One thing I’d read about online was the VARTA battery had a tendency to leak battery acid all over the PCB.
This one had yet to fully spill its guts, but was looking a bit bulgey and had started to leak a little already.

The battery (I’d read) was only for storing info in a power loss scenario, and if the battery isn’t present it just slows the boot time as everything has to be read from disk, so I took the leap of faith and cut the battery out, and lo, it still all boots.

So now I was ready to get the desk online properly, but I was getting semi-regular lock-ups where DMX output would stop and inputs on the console weren’t read, even though the underlying PC was still working.

I spent a lot of time debugging this: BIOS settings, interrupts, diving into how ISA works, replacing the battery I’d removed with a brand new one, and at one point breaking out the oscilloscope, but nothing worked.

Around the same time I noticed the Ethernet port (BNC!) would work if I just ran plain DOS (I could ping from DOS, but when the application started the NIC would go dead), which made me think I might be facing a hardware fault with the “CS”, the show processor for the console, which is what the motherboard connects to via ISA.

The desk itself with the face off, and the CF adapter being mounted

Alas, being almost 30 years old (this unit was made in 1996), there aren’t a great number of them around, so finding a spare “CS” board to swap in and test with wasn’t easy. What I did find was another complete console on eBay for $50, but it was in the UK, and they weigh a lot.
Shipping this thing was not an option.

But for a bit of extra cash the seller was willing to crack the case open, strip out the two main boards and post me just those. This had the added bonus that the motherboard and CPU sent from the UK came from a 520i, meaning a Pentium II processor – this Strand 520 was now going to be a Strand 520i.

A month later a box appeared at my door containing the boards, but the battery on the CS board from the UK had well and truly spilled its guts, leaving some toxic sludge around all the components nearby. A can of PCB cleaner and a toothbrush (which I will not be using to brush teeth with anymore) and I’d cleaned it up as best I could, but the fan output from the board was well and truly dead, with some of the SMD components just eaten by the acid.

So I put everything back into the case and wired it up. The mounts for the Motherboard were slightly different, and the software that is used for the 520i is different from the 520 (without the i).

The HDD from the UK was unable to boot, but I was able to get it to spin up long enough to copy off the ~5MB of files I needed, then I did a fresh install of MS-DOS and copied over the StrandOS installer.

Finally I had a stable working console. Not just that but the Strand Networker application was now available to me. So I plugged into the 10Mbps connection and set the console to output to Network as well as DMX.

Enabling the “networker” for network DMX transmission

I cranked open Wireshark and there was a mystery signal sent to the broadcast address on UDP…

I patched a single DMX channel and changed the value and when I viewed the data in Wireshark I could see a hex representation of the DMX 0-255 value.

Easy I thought to myself, it’ll just be a grid of channels, each with their value as hex. Ha! I was wrong.

Turns out Strand ShowNet used a conditional form of “Run Length Encoding” (RLE) compression: if you’ve got channels 1 through 5 at 50%, rather than encoding this as 5 bytes each showing 0x80, it uses 2 bytes, one to indicate 5 sequential channels (the run length) and then the value (0x80). Then there’s another bit to denote how many places forward to move, and whether the next channel uses RLE or not.
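To make the idea concrete, here’s a toy conditional RLE in Python. It is not the ShowNet wire format: the token layout, the `min_run` threshold and the function names are all made up for illustration (PyShowNet has the real decoder). It just shows why 5 channels at 0x80 can shrink to a count plus a value, while short stretches are cheaper left as literals.

```python
from typing import List, Tuple

# Hypothetical token shapes for this toy (NOT the ShowNet wire format):
#   ("run", count, value) -> `count` consecutive channels all at `value`
#   ("lit", value)        -> a single channel value sent as-is
Token = Tuple


def rle_encode(levels: List[int], min_run: int = 3) -> List[Token]:
    """Conditionally run-length encode DMX channel levels (0-255).

    Runs of at least `min_run` identical levels collapse into one run token;
    shorter stretches are emitted as literals, since a run token would not
    save anything there.
    """
    tokens: List[Token] = []
    i = 0
    while i < len(levels):
        j = i
        while j < len(levels) and levels[j] == levels[i]:
            j += 1
        if j - i >= min_run:
            tokens.append(("run", j - i, levels[i]))
        else:
            tokens.extend(("lit", v) for v in levels[i:j])
        i = j
    return tokens


def rle_decode(tokens: List[Token]) -> List[int]:
    """Expand tokens back into a flat list of channel levels."""
    levels: List[int] = []
    for tok in tokens:
        if tok[0] == "run":
            levels.extend([tok[2]] * tok[1])
        else:
            levels.append(tok[1])
    return levels


# Channels 1-5 at 50% (0x80), then channel 6 at full and channel 7 at zero
frame = [0x80] * 5 + [0xFF, 0x00]
encoded = rle_encode(frame)
print(encoded)  # [('run', 5, 128), ('lit', 255), ('lit', 0)]
assert rle_decode(encoded) == frame
```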

The code got messy; it’s not the best thing I’ve ever written but it works for 2 full universes of DMX (I need to spend more time to understand where the channel encoding overflow happens as I end up a few channels ahead of where I should be on universe 3 and above).

The code is available on Github and I’d love to know if anyone’s using it with these old dinosaurs!

https://github.com/nickvsnetworking/PyShowNet

Presenting the caller Name in IMS

06/06/2025 | IMS / VoLTE, Mobile Networks, RFCs & Standards, VoIP | Caller ID, IMS, VoIP, VoLTE | Nick

SIP has a multitude of ways of showing Caller ID (PAI, R-PAI, From, even Contact), but the other day I got a tip (Thanks John!) that you can set a name as the Caller ID in the “display name” part of the P-Asserted-Identity header on the leg from the TAS to the UE, and it’ll show up on the phone. They were right.

For example I put:

P-Asserted-Identity: "Nick Jones" <sip:[email protected]>

And lo and behold when I called a test phone on my desk (A Samsung IMS debug phone) here’s what I saw:

There are no contacts defined in this phone, that name is just coming from the SIP INVITE that goes to the phone.

Support for this feature is hit-and-miss on different IMS stacks on different phones, and of course is Carrier Bundle dependent, but it does work.

One thing it doesn’t do is show the name in the call history, and if you go to “Add as Contact” it still makes you enter the name, so clearly that’s not linked in, but it’s a kinda neat feature.
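A minimal Python sketch of the header the TAS would need to emit, plus the reverse parse a UE (or a test script) might do. The function names and the example URI are hypothetical. It follows the SIP quoted-string rules (escape backslashes and double quotes) and only handles quoted display names, not the unquoted token form.

```python
import re
from typing import Optional


def quote_display_name(name: str) -> str:
    """Wrap a name as a SIP quoted-string, escaping backslashes and quotes."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_pai(display_name: str, uri: str) -> str:
    """Build a P-Asserted-Identity header carrying a display name."""
    return f"P-Asserted-Identity: {quote_display_name(display_name)} <{uri}>"


# Matches a quoted display name at the start of a header value
_QUOTED_NAME = re.compile(r'^\s*"((?:[^"\\]|\\.)*)"')


def display_name_from_header(value: str) -> Optional[str]:
    """Return the unescaped display name from a name-addr, if one is quoted."""
    m = _QUOTED_NAME.match(value)
    if not m:
        return None
    return re.sub(r"\\(.)", r"\1", m.group(1))


# Hypothetical URI for illustration
print(build_pai("Nick Jones", "sip:+15551234567@ims.example.net"))
```

Names containing quotes survive the round trip because of the escaping, which matters if the name comes from a CRM or subscriber database rather than being typed by hand.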

Dell Server I/O Latency

30/05/2025 | Notes | Dell, Proxmox, RAID, Server | Nick

For the past few years I’ve run a Dell R630 as one of our lab / testing boxes. It’s hosted down the road from me, and with 32 cores and 256 GB of RAM it’s got enough grunt to run what we need for testing stuff on the East coast and messing around. We’ve got a proper DC with compute in Sydney and Perth, but for breaking stuff, I wanted my own lab.

This box started on VMware, but over a long period of time I saw really odd disk IO behavior that I couldn’t get to the bottom of.

Things would hang: for example, you’d go to edit a file on a VM in vi and have to wait 20 seconds for it to open, yet I could cat the same file instantly, and other files opened instantly in vi.

I initially thought it was that dreaded issue with Ubuntu boxes being unable to resolve their own hostname and waiting for DNS to time out every. single. time. it. did. anything, but I ruled that out when I got the same behavior with live CDs and non-Linux OSes.

In the end I narrowed it down to being related to Disk IO.
I read Matt Liebowitz’s book on VMware vSphere Performance, assuming there was a setting somewhere inside VMware I had wrong.

Around the same time all the unpleasantness was going down with VMware and licencing changes, and so I moved to Proxmox (while keeping a virtualized copy of VMware running inside Proxmox).

But switching hypervisors didn’t fix the issue, so I could rule that out.
So I splashed out and swapped the 16k magnetic SAS drives in the RAID for new SSDs, but the problem persisted: it wasn’t the drives, and I wasn’t seeing a marked increase in performance either.

I did a bunch of tuning on the PERC card (disk caching, write ahead, etc.), but still the problem persisted.

At this stage I was looking at the PERC card or (less likely) the CPU/motherboard/RAM combo.

So over a quiet period, I moved some workloads back onto one of the old 16k magnetic SAS drives that I had pulled out to replace with the SSDs, and benchmarked the disk performance on the standalone SAS drive to compare against the RAID SSD performance.

I used iozone3 to benchmark the performance with:

iozone -t1 -i0 -i2 -r1k -s1g /tmp

Here’s how the SSDs in the RAID compare to a standalone SAS drive (not in RAID):

Metric	LXC on SSD RAID	Standalone SAS Drive	Difference (Standalone vs. RAID)
Initial Write (Child)	314,682.91 kB/sec	382,771.88 kB/sec	+21.6%
Initial Write (Parent)	177,522.16 kB/sec	119,112.43 kB/sec	-32.9%
Rewrite (Child)	428,456.94 kB/sec	470,486.44 kB/sec	+9.8%
Rewrite (Parent)	180,007.46 kB/sec	73,721.11 kB/sec	-59.0%
Random Read (Child)	404,707.62 kB/sec	406,057.00 kB/sec	+0.3%
Random Read (Parent)	404,410.90 kB/sec	397,718.31 kB/sec	-1.7%
Random Write (Child)	126,042.59 kB/sec	355,304.22 kB/sec	+181.9%
Random Write (Parent)	4,497.75 kB/sec	68,971.35 kB/sec	+1,434%
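The Difference column is just the relative change of the standalone figure against the RAID figure. A tiny Python helper (the name is mine) to reproduce it from the table values:

```python
def pct_change(baseline: float, candidate: float) -> float:
    """Percentage change of `candidate` relative to `baseline`."""
    return (candidate - baseline) / baseline * 100.0


# Random Write (Parent), kB/sec: LXC on SSD RAID vs standalone SAS drive
raid, sas = 4497.75, 68971.35
print(f"{pct_change(raid, sas):+,.1f}%")  # +1,433.5%
```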

That Random Write (Parent) figure at the bottom would explain the “weird” behavior I’d been seeing on guest OSes.
When you edit a file, vi creates a lock/swap file, which is a small random write and so took a long time, while cat only reads the file and wasn’t affected.

Okay – So now I know it’s the PERC at fault or the RAID config on it.

Next I added another SSD, the same type as those in the RAID, but as a standalone drive (not in the RAID), and here are the results:

Metric	LXC on SSD RAID	SSD Standalone	Difference (Standalone vs. RAID)
Sequential Writes (Child)	314,682.91 kB/sec	511,280.50 kB/sec	+62.5%
Sequential Writes (Parent)	177,522.16 kB/sec	128,016.83 kB/sec	-27.9%
Sequential Rewrites (Child)	428,456.94 kB/sec	467,547.38 kB/sec	+9.1%
Sequential Rewrites (Parent)	180,007.46 kB/sec	79,698.26 kB/sec	-55.7%
Random Reads (Child)	404,707.62 kB/sec	439,705.72 kB/sec	+8.6%
Random Reads (Parent)	404,410.90 kB/sec	437,549.83 kB/sec	+8.2%
Random Writes (Child)	126,042.59 kB/sec	319,127.09 kB/sec	+153.2%
Random Writes (Parent)	4,497.75 kB/sec	125,458.00 kB/sec	+2,689.3% (!)

So the parent figures for sequential writes and rewrites were down on the standalone disk, but every other figure looks way better on the standalone SSD.

So that’s my problem. I figure it’s something to do with how the RAID is configured, but after a few hours of messing around with every permutation of settings I could think of, I couldn’t get these figures to improve markedly.

As this is a lab box I’ll just dismantle the RAID and run each LXC container / VM on a local (non-RAID) SSD, as data loss from a dying disk is not a concern in my use case, but hopefully this might be of use to someone else seeing the same.

MBR & GBR Values in Bearer Level QoS

23/05/2025 | 5G SA, EUTRAN, LTE, Mobile Networks, Uncategorized | EPC, EUTRAN, GTP, GTP-C, GTPv2, GTPv2C, LTE | Nick

The other day I had a query about a roaming network that was sending the Bearer Level QoS parameters in the Create Session Request with the MBR and GBR set to 0 kbps, uplink and downlink, rather than populating the MBR values.

I knew that for Guaranteed Bit Rate bearers these would of course be set, but for non-GBR bearers (QCI 5 to 9) I figured the MBR would still be populated, but that’s not the case.

So what gives?

Well, according to TS 29.274:

For non-GBR bearers, both the UL/DL MBR and GBR should be set to zero.

So there you have it: if it’s not a GBR bearer (QCI 1-4 among the standard QCI 1-9 set), these values should be 0.
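A minimal Python check for the rule above, handy when eyeballing decoded Create Session Requests. The class and function names are mine, and it only knows about the classic GBR QCIs 1-4 (later releases define more GBR QCIs, such as 65 and 66, so extend the set if you see them). Zero is fine for non-GBR bearers because they aren’t rate-limited per bearer; they’re policed by the aggregate limits (APN-AMBR / UE-AMBR) instead.

```python
from dataclasses import dataclass
from typing import List

# Classic standardised GBR QCIs only; extend for later-release GBR QCIs
GBR_QCIS = {1, 2, 3, 4}


@dataclass
class BearerQoS:
    qci: int
    mbr_ul_kbps: int
    mbr_dl_kbps: int
    gbr_ul_kbps: int
    gbr_dl_kbps: int


def non_gbr_violations(qos: BearerQoS) -> List[str]:
    """Return the MBR/GBR fields that should be zero on a non-GBR bearer but aren't.

    GBR bearers are out of scope here and always return an empty list.
    """
    if qos.qci in GBR_QCIS:
        return []
    fields = ("mbr_ul_kbps", "mbr_dl_kbps", "gbr_ul_kbps", "gbr_dl_kbps")
    return [f for f in fields if getattr(qos, f) != 0]


# A QCI 9 bearer with everything zeroed is what the spec expects
print(non_gbr_violations(BearerQoS(9, 0, 0, 0, 0)))            # []
print(non_gbr_violations(BearerQoS(9, 50000, 100000, 0, 0)))   # ['mbr_ul_kbps', 'mbr_dl_kbps']
```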

CGrates – Multiple Rates & Derived Charging with ChargerS

16/05/2025 | CGrateS, VoIP | CGrates | Nick

I’ve always been kinda intrigued by the idea of parallel universes, the idea that there are infinite copies of the universe, with myself and all the people I care about, but each with slight differences to the universe I inhabit.

The ChargerS module provides the Butterfly Effect needed to create infinite instances of our CGrateS events, each with subtle differences.

Typically if you’re charging subscribers for calls, someone else (or multiple someones) may charge you for those calls, for example you charge your subscribers for an outbound call, but other carriers you interconnect with will charge you for terminating those calls to their subscribers, and for incoming calls you may want to charge the other carriers that terminate calls into your network.

Defining in CGrateS what your suppliers charge you, what you charge suppliers, reseller rates, commissions, or any other call charge allows us to:

See profit on each call
Understand supplier costs
Enable reselling at different rates
Open the door to Least-Cost Routing (without knowing the cost, we can’t find the cheapest)
Ensure you don’t have calls where you make a loss (Supplier charged you more than you charged the customer)

So how do we do this?

Well, we do this with ChargerS.

When I first looked at CGrateS, the ChargerS module seemed like an extra step that did nothing.

In ngrep you’d see the ChargerSv1.ProcessEvent request and the response, but it didn’t seem to do anything, and it was a PITA when you didn’t have a Charger defined and everything stopped working.

I’ve spoken a lot about SIP on this blog, and I’m going to assume some level of familiarity with telephony since we’re talking about CGrateS (which is mostly used for telephony), but the best concept I can relate ChargerS to is Serial Forking in SIP, but for the CGrateS event.

A single CGrateS “event” (JSON RPC) comes into CGrateS from wherever, but with ChargerS, we can fork that single event into multiple CGrateS events, which are all treated as unique events.

This is where it starts to get interesting. Let’s say we want to calculate a supplier cost and a retail cost: with ChargerS we define a rule for supplier and a rule for retail, so when one single event comes into CGrateS there are now two events inside CGrateS, one for the supplier and one for the retail.

First we’ll define a default boring charger:

{
    "method": "APIerSv1.SetChargerProfile",
    "params": [
        {
            "ID": "CHARGER_Default",
            "FilterIDs": [],
            "AttributeIDs": ["*none"],
            "RunID": "default",
            "Weight": 0
        }
    ]
}

Alright, so far so good, but now we’ll define a second charger, and this one will be for calculating the retail rate for a call.

{
    "method": "APIerSv1.SetChargerProfile",
    "params": [
        {
            "ID": "CHARGER_Retail",
            "FilterIDs": [],
            "AttributeIDs": ["*constant:*req.Category:RetailCharge"],
            "RunID": "charger_retail",
            "Weight": 0
        }
    ]
}

So what did we just do?

Well, now when the ChargerSv1.ProcessEvent request hits ChargerS, two events will come out and get processed by the rest of CGrateS as if they’re unique events / calls to be rated.

We’ve cloned our event, now we’ve got two copies of the same event.

The first copy (the original event), will be treated exactly as it is now, the other will see a new event generated inside CGrateS, it’ll be a copy of the original event, except for a few minor changes.
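To make the forking concrete, here’s a toy model in Python of what ChargerS does with one event. It is not CGrateS code: the function names are mine, it only understands `*string` / `*notstring` filters and inline `*constant` attributes, and it ignores Weight and simply walks the chargers in list order. It just shows one event going in and one copy per matching charger coming out, each with its own RunID and tweaked fields.

```python
import copy
from typing import Dict, List


def _field(path: str) -> str:
    # "*req.Category" or "~*req.Category" -> "Category"
    return path.split(".", 1)[1]


def passes_filters(event: Dict, filter_ids: List[str]) -> bool:
    """Evaluate a tiny subset of inline filters (*string / *notstring)."""
    for flt in filter_ids:
        kind, path, value = flt.split(":", 2)
        actual = str(event.get(_field(path), ""))
        if kind == "*string":
            if actual != value:
                return False
        elif kind == "*notstring":
            if actual == value:
                return False
        else:
            raise ValueError(f"filter type not modelled here: {kind}")
    return True


def fork_event(event: Dict, chargers: List[Dict]) -> List[Dict]:
    """Produce one copy of `event` per matching charger."""
    forks = []
    for charger in chargers:
        if not passes_filters(event, charger.get("FilterIDs", [])):
            continue
        forked = copy.deepcopy(event)  # never touch the original event
        forked["RunID"] = charger["RunID"]
        for attr in charger.get("AttributeIDs", []):
            if attr == "*none":
                continue
            for part in attr.split(";"):
                kind, path, value = part.split(":", 2)
                if kind != "*constant":
                    raise ValueError(f"attribute type not modelled here: {kind}")
                forked[_field(path)] = value
        forks.append(forked)
    return forks


chargers = [
    {"ID": "CHARGER_Default", "FilterIDs": [], "AttributeIDs": ["*none"], "RunID": "default"},
    {"ID": "CHARGER_Retail", "FilterIDs": [],
     "AttributeIDs": ["*constant:*req.Category:RetailCharge"], "RunID": "charger_retail"},
]
event = {"Account": "Nick_Test_123", "Category": "call", "RunID": "*default"}
for fork in fork_event(event, chargers):
    print(fork["RunID"], fork["Category"])
# default call
# charger_retail RetailCharge
```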

Let’s take a look at what happens to our event going through ChargerS when we generate a CDRsV2.ProcessExternalCDR API request:

{
    "method": "ChargerSv1.ProcessEvent",
    "params": [
        {
            "Tenant": "cgrates.org",
            "ID": "2645818",
            "Time": null,
            "Event": {
                "Account": "Nick_Test_123",
                "AnswerTime": "2024-12-26T12:34:44+11:00",
                "CGRID": "18d3e23ac3727474539f29cc11694cac11fb5e32",
                "OriginID": "95fff282-c329-11ef-8e4e-98fa9b127b52",
                "RunID": "*default",
                ...
                "Subject": "Nick_Test_123",
                "Tenant": "cgrates.org",
                "ToR": "*voice",
                "Usage": 150000000000
            }
        }
    ],
    "id": 20
}

But now let’s look at what comes out of this request to ChargerS:

{
    "id": 20,
    "result": [
        {
            "ChargerSProfile": "DEFAULT",
            "AttributeSProfiles": null,
            "AlteredFields": [
                "*req.RunID"
            ],
            "CGREvent": {
                "ID": "5fd2d6a",
                "Event": {
                    "Account": "Nick_Test_123",
                    "AnswerTime": "2024-12-26T12:44:40+11:00",
                    "CGRID": "3c01050a3f49fb215e318523dcd4255797d50145",
                    "Category": "call",
                    "RunID": "default"
                }
            }
        },
        {
            "ChargerSProfile": "CHARGER_Retail",
            "AttributeSProfiles": [
                "*constant:*req.Category:RetailCharge"
            ],
            "AlteredFields": [
                "*req.RunID",
                "*req.Category"
            ],
            "CGREvent": {
                "ID": "5fd2d6a",
                "Event": {
                    "Account": "Nick_Test_123",
                    "AnswerTime": "2024-12-26T12:44:40+11:00",
                    "CGRID": "3c01050a3f49fb215e318523dcd4255797d50145",
                    "Category": "RetailCharge",
                    "RunID": "charger_retail"
                }
            }
        }
    ],
    "error": null
}

I’ve tried to keep the above example as minimal as possible, but if we have a look we can now see two events, the first is our default charger, where nothing is changed; it’s got the same category as we set on the ProcessExternalCDR request (call) and the RunID is “default” per the default charger.

But look below and we’ve got another copy, this time the RunID is set to charger_retail, because that’s what we’ve set it to inside the RunID parameter for the charger named CHARGER_Retail, this means when filtering CDRs we’ll be able to spot these ones really easily, and know it’s a fork of a different event.

But importantly we’ve changed some of the values in the CGrateS Event, the same way AttributeS changes stuff.

So what have we changed? Well the Category of the new request is now RetailCharge.

Now if we cast our mind back to setting the RatingProfile back in Tutorial 3, you may remember we set the Category on the RatingProfile.

Now is when this matters. By setting different Categories in our Rating Profile, we can create a new RatingProfile, with the category set to RetailCharge, but referencing a whole different RatingPlan, with different destinations and rates, and this second event that was forked by ChargerS, will match that RatingProfile, and the RatingPlans that go with it.

For everything matching we’ll get two CDRs (if we’re calling *cdrs that is) and they’re treated as totally separate records.

Think about it; by defining a new RatingProfile with category Wholesale with your wholesale rate, and then creating a Charger for that category, you’ll have a retail CDR and a wholesale CDR. Same for reseller rates, commissions, anything!

We’re using this in one of our networks to handle rating for all the SMS traffic, we’ve got various suppliers and sources for A2P and P2P traffic, and having additional chargers to calculate different rates in a different currency for billing our suppliers is super useful.

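# Second charger, used for calculating the A2P charge for SMS in USD.
# CGRateS_Obj_local is assumed to be an already-initialised CGrateS
# JSON-RPC helper object (set up elsewhere, not shown here).
print(CGRateS_Obj_local.SendData({
    "method": "APIerSv1.SetChargerProfile",
    "params": [
        {
            "ID": "CHARGER_SMS_A2P",
            # Only fork SMS events, and skip events from the gsm_0340 account
            "FilterIDs": ["*string:~*req.Category:sms", "*notstring:~*req.Account:gsm_0340"],
            # On the forked event, set RequestType to *rated and Category to sms_a2p
            "AttributeIDs": ["*constant:*req.RequestType:*rated;*constant:*req.Category:sms_a2p"],
            "RunID": "charger_a2p",
            "Weight": 0,
        }
    ]
}))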

I’ve put the code examples on the Github repo.

RAN Builds – Can we just get the same connectors thanks?

09/05/2025 | EUTRAN, LTE, Mobile Networks, RF | Build, RAN, RF | Nick

Concrete, steel and labor are some of the biggest costs in building a cell site, yet cost-saving efforts for cell sites tend to focus on the RAN, even though the RAN equipment itself isn’t all that expensive when you put it into context.

I think this is mostly because there aren’t folks at MWC promoting concrete each year.

But while I can’t provide any fancy tricks to make towers stronger or need less concrete for foundations, there’s some potential low-hanging fruit in how sites are installed that could save time (and therefore cost) during network refreshes.

I don’t think many folks managing the RAN roll-outs for MNOs have actually spent a week with a tower crew rolling this stuff out. It’s hard work but a lot of it could be done more efficiently if those writing the MOPs and deciding on the processes had more experience in the field.

Disclaimer: I’m primarily a core networks person; that’s a job done from a comfy chair. These are just some observations from the bits of work I’ve done in the field building RAN.

Standardize Power Connectors

Currently radio units from the biggest RAN vendors (Ericsson, Nokia, Huawei, ZTE & Samsung) each use different DC power connectors.

This means if you’re swapping from one of these vendors to another as part of a refresh, you need new power connectors.

If you’re lucky you’re able to reuse the existing DC power cables on the tower, but that means you’re up on a tower trying to re-terminate a cable, which is a fiddly job on the ground and far worse in the air. If you’re unlucky and don’t have enough slack on the DC cables to do the job, then you’re hauling new DC cables up the tower (and using more cable too).

The Nokia and Ericsson connectors are very similar, and with a pair of side cutters you can mangle an Ericsson RRU connector to work on a Nokia RRU and vice versa.

Huawei and ZTE, meanwhile, have opted for push connectors with the raw cables behind a little waterproof door.

If we could just settle on one approach (either is fine) this could save hours of install time on each cell site, extrapolate that across thousands of cell sites for each network, and this is a potentially large saving.

Standardize Fiber Cables

The same goes for waterproofing fibre: Ericsson has a boot kit that gets assembled inline over the connectors, and Nokia has this too, as well as a rubber slide-over boot on pre-terminated cables.

Again, the cost is fairly minimal, but the time to swap is not. If we could standardize a breakout box format at the top of the tower and an LC waterproofing standard, we could save significant time during installs, and as long as you over-provision the breakout (the cost difference between a 6 core and a 48 core fibre is a few dollars), you can avoid having to rerun cables.

Yes, we’ve all got horror stories about someone over-bending fiber, and reusing fiber between hardware refresh cycles carries some risk, but modern fiber is crazy tough, so the chances of damaging reused fiber are pretty slim, and spare pairs are always a good thing.

Preterm DC Cables

Every cell site install features some poor person squatting on the floor (if they’re savvy they’ve got a camping stool or gardening kneeling mat) with a “gut buster” crimping tool swaging on connectors for the DC lugs.

If we just used the same lugs / connectors for all the DC kit inside the cell sites, we could have premade DC cables in various lengths (like everyone does with Ethernet cables now), rather than making each and every cable off a spool (even if it is a good ab workout).

I dunno, I’m just some Core network person who looks at how long all this takes and wonders if there’s a way it could be done better, am I crazy?

What’s the point of Subscribe in IMS – Does it do anything useful?

02/05/2025 | 5G SA, IMS / VoLTE, Kamailio, LTE, Mobile Networks, VoIP | IMS, LTE, SIP, Subscribe, VoLTE | Nick

Nope – it doesn’t do anything useful. So why is it there?

The SUBSCRIBE method in SIP allows a SIP UAC to subscribe to events, and then get NOTIFY messages when that event happens.

In a plain SIP scenario (RFC 3261), we can imagine an IP Phone and a PBX scenario. I might have “Busy Lamp Field” aka BLF buttons on the screen of my phone, that change colour when the people I call often are themselves on calls or on DND, so I know not to transfer calls to them – This is often called the “presence” scenario as it allows us to monitor the presence of another user.

At a SIP level, this is done by sending a SUBSCRIBE to the PBX with the information about what I’m interested in being told about (State changes for specific users) and then the PBX will send NOTIFY messages when the state changes.

But in IMS you’ll see SUBSCRIBE messages every time the subscriber registers, so what are they subscribing for?

Well, you’re just subscribing to your own registration status, but your phone knows your own registration status, because it’s, well, the registration status of the phone.

So what does it achieve? Nothing.

The idea was that in a fixed-mobile convergence scenario (keeping in mind that’s one of the key goals of the 2008 IMS spec) you could have BLF / presence functionality for fixed subscribers, but this rarely happens.

For the past few years we’ve just been sending a 200 OK to SUBSCRIBE messages to the IMS, with a super long expiry, just to avoid wasting clock cycles.
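A minimal Python sketch of that “just say yes” reply, assuming you have the raw SUBSCRIBE text in hand (in IMS the UE’s SUBSCRIBE is for the `reg` event package). The function name, the default Expires value and the sample request are all illustrative. It copies the headers a UAS must echo back (Via, From, To, Call-ID, CSeq, per RFC 3261 section 8.2.6.2), adds a To-tag if missing, and sets a long Expires. It only handles long-form header names (no compact forms like `v:` or `i:`).

```python
import uuid

# Headers a UAS copies from the request into its response (RFC 3261 8.2.6.2)
_COPIED = {"via", "from", "to", "call-id", "cseq"}


def ok_for_subscribe(request: str, expires: int = 600000) -> str:
    """Build a bare 200 OK for a SUBSCRIBE, with a long Expires."""
    lines = request.split("\r\n")
    if not lines[0].startswith("SUBSCRIBE "):
        raise ValueError("not a SUBSCRIBE request")
    response = ["SIP/2.0 200 OK"]
    for line in lines[1:]:
        if not line:  # blank line marks the end of the headers
            break
        name = line.split(":", 1)[0].strip().lower()
        if name in _COPIED:
            if name == "to" and ";tag=" not in line:
                line += ";tag=" + uuid.uuid4().hex[:10]  # To-tag creates the dialog
            response.append(line)
    response.append(f"Expires: {expires}")
    response.append("Content-Length: 0")
    return "\r\n".join(response) + "\r\n\r\n"


# Hypothetical reg-event SUBSCRIBE from a UE
sub = "\r\n".join([
    "SUBSCRIBE sip:+15551234567@ims.example.net SIP/2.0",
    "Via: SIP/2.0/UDP 10.0.0.1:5060;branch=z9hG4bK776asdhds",
    "From: <sip:+15551234567@ims.example.net>;tag=1928301774",
    "To: <sip:+15551234567@ims.example.net>",
    "Call-ID: a84b4c76e66710@10.0.0.1",
    "CSeq: 1 SUBSCRIBE",
    "Event: reg",
    "Expires: 600000",
    "Content-Length: 0",
]) + "\r\n\r\n"
print(ok_for_subscribe(sub, expires=86400))
```

Strictly, RFC 6665 expects a notifier to follow an accepted SUBSCRIBE with a NOTIFY; this sketch only covers the reply.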

Before there was Grafana – How Telcos did metrics & observability at scale before computers

25/04/2025 | Australian Telco, History, Software | Australian Telecommunications History, CGrates, History, Metering, Telecom | Nick

I love Grafana. I love metrics and observability. Nothing is more powerful than being able to see what’s going on inside your network/application/solar setup/weather station – you name it.

It’s never been easier to see what’s going on.

If I wanted to monitor my web app as I onboard more customers, Grafana is the go-to tool, but how was it done before the computer age? Let’s go back to the 1940s and look at how the telephone network handled observability and metrics…

This starts with introducing the “Call Meter”, “Subscriber Meter” or “Subs Meter” for short.

Detail of mechanical call meter Strathfield South Exchange
Source – field, field, field and chang’s brilliantly beautiful “That Exchange Project“

The concept is pretty simple. Each telephone service (“subscriber” in telecom parlance) provided by the local telephone exchange gets a subscriber meter or “subs meter”.

When the subscriber (customer) makes a call, and the call is answered, a reversal of polarity on the line ticks the subscriber meter over by one digit.

Each of the meters on the left is a single telephone subscriber, each time they make a call, the meter ticks up by one position. Source – field, field, field and chang’s brilliantly beautiful “That Exchange Project“

As you can imagine if you’ve got a telephone exchange that serves 10,000 customers, well you need 10,000 subscriber meters…

You need a lot of meters… Source – field, field, field and chang’s brilliantly beautiful “That Exchange Project“

At the end of the month, someone takes a photo of all the meters on a film camera, sends it off to a billing center where they develop the photo, then calculate the difference in values from last month’s meter reading photo and this month’s meter reading photo, and bingo – there’s the number of calls the person made. You tabulate the cost on an adding machine and send off the invoice.

Each of the little blocks is a single subscriber to meter and the weird cone thing held is a hood for the camera to photograph the values – Source The Communications Museum Trust

Today we’d just use PromQL:

increase(subscriber_meter{phone_number="123456"}[30d])

Optional sidebar for those asking “but what about long distance calls where you pay per minute?”: in a world where you pay per local call, regardless of length, this works just fine, but more complicated scenarios like long distance calling presented a challenge. This could be solved by reversing the line polarity at predefined intervals, to keep ticking up the subscriber meter during the call. Exchange clocks provided a number of pulse outputs (1 pulse per second, 1 pulse per minute, etc.), and the 1 pulse per minute signal could be hooked up to the line reversal circuit for long distance calls, triggering a line reversal every minute. So if a local call was $0.40 untimed and long distance calls were $0.40 per minute, you just needed the exchange to reverse the line every minute to pulse the meter. 10 increments on the meter could mean 10 x $0.40 local calls or 10 minutes of $0.40 per minute long distance.
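The monthly billing step from the photos boils down to a subtraction and a multiplication. A tiny Python version, with made-up readings and integer cents to sidestep floating-point rounding; whether a unit was one untimed local call or one minute of long distance doesn’t change the maths:

```python
def bill_from_readings(last_month: int, this_month: int, cents_per_unit: int) -> int:
    """Meter units since the last reading, times the unit price, in cents."""
    if this_month < last_month:
        raise ValueError("meter reading went backwards; check the photos")
    return (this_month - last_month) * cents_per_unit


# Hypothetical photographed readings, at $0.40 (40 cents) per meter unit
cents = bill_from_readings(last_month=10234, this_month=10244, cents_per_unit=40)
print(f"${cents / 100:.2f}")  # $4.00
```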

These meters were originally just for metering traffic, but engineers in the telephone network realised they could be used as generic “counters” for just about anything in the telephone network.

Let’s imagine you want to know how often a trunk line to another exchange runs out of capacity, well, you simply wire a meter to get triggered each time that condition happens, now you’ve got a counter for each time that event occurs.

Now let’s say you want to know how often you run out of final selectors: well, throw another counter on it.

These same meters can be wired to count fault conditions.

Mechanical fault meters on old step-by-step test desk, Queanbeyan Exchange
Source – field, field, field and chang’s brilliantly beautiful “That Exchange Project“

A pencil and a logbook is how you keep track of frequency of the event being triggered, and if you want to graph it out, graph paper, not Grafana.

As telephone systems increased in complexity more and more meters were used to track what’s going on, up until the time that computers could start to handle that process, when “Electronic Customer Metering” came into play with the early Stored Program Control exchanges.

Metering and charging equipment in Blakehurst Exchange
Source – field, field, field and chang’s brilliantly beautiful “That Exchange Project“

Observability and Metrics are so important for making software, but every time I define a “counter” in software for an event, I’m always reminded of clicking meters in a telephone exchange, knowing this is how it used to be done.

GTPv2 Instance IDs

18/04/2025 | 5G SA, EPC, LTE, Mobile Networks, Notes, RFCs & Standards | EPC, GTP, GTP-C, GTPv2, GTPv2C, LTE | Nick

I was diffing two PCAPs the other day trying to work out what’s up, and noticed the Instance ID on a GTPv2 IE was different between the working and failing examples.

So what does it denote? Well, from 3GPP TS 29.274:

If more than one grouped information elements of the same type, but for a different purpose are sent with a message, these IEs shall have different Instance values.

So if we’ve got two IEs of the same IE type (as we often do; F-TEIDs with IE Type 87 may have multiple instances in the same message, each with a different F-TEID interface type), then we differentiate between them by Instance ID.

The only exception to this rule is where the IEs serve the same purpose: if the same IE appears more than once inside the message for the exact same purpose, the repeated copies share the same Instance value.
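To make the Instance field concrete, here’s a minimal Python sketch that walks GTPv2-C IEs and keys them on (type, instance). It follows the IE header layout in TS 29.274 (1 octet type, 2 octets length, then an octet whose low nibble is the Instance); the helper names are mine, and it skips the GTPv2 message header and nested grouped IEs.

```python
import struct
from collections import defaultdict
from typing import Dict, Iterator, List, NamedTuple, Tuple

IE_TYPE_FTEID = 87


class IE(NamedTuple):
    ie_type: int
    instance: int
    value: bytes


def parse_ies(body: bytes) -> Iterator[IE]:
    """Walk the IEs in a GTPv2-C message body (everything after the GTPv2 header)."""
    offset = 0
    while offset + 4 <= len(body):
        # Octet 1: type, octets 2-3: value length, octet 4: spare/CR bits + Instance.
        ie_type, length, spare_instance = struct.unpack_from("!BHB", body, offset)
        value = body[offset + 4 : offset + 4 + length]
        yield IE(ie_type, spare_instance & 0x0F, value)  # Instance = low 4 bits
        offset += 4 + length


def group_by_type_and_instance(body: bytes) -> Dict[Tuple[int, int], List[IE]]:
    """Key IEs on (type, instance); repeated same-purpose IEs land in one list."""
    grouped: Dict[Tuple[int, int], List[IE]] = defaultdict(list)
    for ie in parse_ies(body):
        grouped[(ie.ie_type, ie.instance)].append(ie)
    return grouped


def build_fteid(instance: int, interface_type: int, teid: int, ipv4: bytes) -> bytes:
    """Encode an IPv4 F-TEID IE (V4 flag set, interface type in the low 6 bits)."""
    value = struct.pack("!BI", 0x80 | interface_type, teid) + ipv4
    return struct.pack("!BHB", IE_TYPE_FTEID, len(value), instance & 0x0F) + value


# Two F-TEIDs of the same IE type in one message, told apart only by Instance:
# instance 0 with interface type 10 (S11 MME GTP-C) and instance 1 with
# interface type 7 (S5/S8 PGW GTP-C), as in a Create Session Request.
body = build_fteid(0, 10, 0x1234, bytes([10, 0, 0, 1])) + build_fteid(1, 7, 0x5678, bytes([10, 0, 0, 2]))
ies = group_by_type_and_instance(body)
print(sorted(ies))  # [(87, 0), (87, 1)]
```

Diffing two captures then comes down to comparing these (type, instance) keys rather than just the IE types.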

It’s not Rocket Science – Tracking performance of OneWeb terminals

11/04/2025 | Mobile Networks, Notes, Python, RF, Software | Eutelsat, HL1120W, Hughes, OneWeb, satellite | Nick

Last year we deployed some Hughes HL1120W OneWeb terminals in one of the remote cellular networks we support.

Unfortunately, it was failing to meet our expectations for performance and reliability: we were seeing multiple dropouts every few hours, lasting between 30 seconds and ~3 minutes at a time, and while our reseller was great, we weren’t really getting anywhere with Eutelsat in understanding why it wasn’t working.

Luckily for us, Hughes (who manufacture the OneWeb terminals) have an unprotected API (*facepalm*) from which we can scrape all the information about what the terminal sees.

As that data sits behind an API we have to query, I knocked up a quick Python script to poll the API and convert the responses into Prometheus metrics, so we could put them into Grafana and visualise what’s going on with the terminals and the constellation.
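For anyone who hasn’t built an exporter before, the “convert into Prometheus data” step is mostly string formatting into the Prometheus text exposition format. This is a rough sketch, not the exporter from the repo: the field names in `SAMPLE_STATUS` are hypothetical placeholders, not the real Hughes API fields.

```python
# Hypothetical status payload. The real Hughes HL1120W API fields are NOT
# reproduced here; these keys are placeholders for illustration only.
SAMPLE_STATUS = {
    "terminal_id": "hl1120w-1",
    "sinr_db": 7.5,
    "rx_bytes": 123456,
    "serving_satellite": "sat-0042",
}


def to_prometheus(status: dict) -> str:
    """Render a status dict in the Prometheus text exposition format."""
    term = f'terminal="{status["terminal_id"]}"'
    lines = [
        "# HELP terminal_sinr_db Downlink SINR reported by the terminal.",
        "# TYPE terminal_sinr_db gauge",
        f"terminal_sinr_db{{{term}}} {status['sinr_db']}",
        "# HELP terminal_rx_bytes_total Bytes received by the terminal.",
        "# TYPE terminal_rx_bytes_total counter",
        f"terminal_rx_bytes_total{{{term}}} {status['rx_bytes']}",
        # "Info"-style gauge: the value is always 1 and the satellite rides in a
        # label, which makes it easy to line up loss against the bird overhead.
        "# HELP terminal_serving_satellite Satellite currently serving the terminal.",
        "# TYPE terminal_serving_satellite gauge",
        f'terminal_serving_satellite{{{term},satellite="{status["serving_satellite"]}"}} 1',
    ]
    return "\n".join(lines) + "\n"


print(to_prometheus(SAMPLE_STATUS))
```

A real exporter would poll the terminal on a timer and serve this text on `/metrics` (the `prometheus_client` library can handle that part); keeping the serving satellite in a label on its own gauge, rather than on the counter, avoids starting a new counter series on every handover.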

After getting all this into Grafana and combining it with the Blackbox exporter (we configured Blackbox to send HTTP requests and ICMP pings out of each of the satellite terminals we had, a mix of OneWeb and others), we could see a pattern emerging where certain “birds” (satellites) passing overhead would bring packet loss and dropouts.

It was the same satellites each time that led to the drops, which let us say with confidence: when we see this satellite coming over the horizon, we know there’s going to be some packet loss.
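The “same satellites each time” observation boils down to a group-by: bucket loss samples by whichever satellite was serving the terminal at the time, then look for outliers. This is a simplified sketch, assuming you’ve already joined the Blackbox results with the serving-satellite data into (timestamp, satellite, loss %) tuples; the sample numbers are synthetic, not our measurements.

```python
from collections import defaultdict
from statistics import mean


def loss_by_satellite(samples):
    """Average packet loss per serving satellite.

    samples: iterable of (unix_ts, satellite_id, packet_loss_percent), e.g. a
    join of Blackbox probe results with the terminal's serving-satellite metric.
    """
    buckets = defaultdict(list)
    for _ts, satellite, loss in samples:
        buckets[satellite].append(loss)
    return {sat: mean(values) for sat, values in buckets.items()}


def suspect_satellites(samples, threshold_percent=5.0):
    """Satellites whose passes average at or above the loss threshold."""
    averages = loss_by_satellite(samples)
    return sorted(sat for sat, avg in averages.items() if avg >= threshold_percent)


# Synthetic samples for illustration only, not measurements from our network.
samples = [
    (0, "sat-A", 0.0), (60, "sat-A", 0.0),
    (120, "sat-B", 40.0), (180, "sat-B", 100.0),
    (240, "sat-C", 0.5), (300, "sat-A", 1.0),
]
print(suspect_satellites(samples))  # ['sat-B']
```

In practice the same thing can be done in a Grafana panel; the point is that the satellite ID, not the time of day, is the variable that explains the loss.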

In the end, Eutelsat acknowledged they had two faulty satellites in the orbit we are using, hence the dropouts, and they are currently working on resolving this (but that actually does require rockets, so we’re left without a usable service for the time being). Still, it was a fun problem to diagnose and a good chance to learn more about space.

Packet loss on the two OneWeb terminals (not seen on the other constellation) correlated with a given satellite pass

I’ve put the source code for the Hughes terminal Prometheus Exporter onto Github for anyone to use.

The repo has instructions for use and the Grafana templates we used.

At one point I started playing with the OneWeb ephemeris data so I could calculate the azimuth and elevation of each of the birds from our position, and work out distances and angles from the terminal. The maths was kinda fun, but oddly the datetimes in the OneWeb ephemeris data set seem to be about 10 years and 10 days behind the current datetime. Possibly this gives an insight into OneWeb’s two day outage at the start of the year due to their software not handling leap years.
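For the azimuth/elevation maths, the standard approach is: convert the observer’s latitude/longitude/altitude to Earth-Centred Earth-Fixed (ECEF) coordinates on the WGS-84 ellipsoid, take the difference to the satellite, and rotate that vector into local East-North-Up. This sketch assumes the ephemeris has already been propagated to an ECEF position for the time of interest; parsing and propagating the actual OneWeb data set isn’t shown.

```python
import math

# WGS-84 ellipsoid constants.
WGS84_A = 6378137.0                  # semi-major axis, metres
WGS84_F = 1 / 298.257223563          # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)   # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Latitude/longitude/altitude to Earth-Centred Earth-Fixed x, y, z (metres)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z


def look_angles(obs_lat_deg, obs_lon_deg, obs_alt_m, sat_ecef):
    """Azimuth (deg clockwise from north), elevation (deg) and slant range (m)."""
    ox, oy, oz = geodetic_to_ecef(obs_lat_deg, obs_lon_deg, obs_alt_m)
    dx, dy, dz = sat_ecef[0] - ox, sat_ecef[1] - oy, sat_ecef[2] - oz
    lat, lon = math.radians(obs_lat_deg), math.radians(obs_lon_deg)
    # Rotate the ECEF difference vector into the observer's East-North-Up frame.
    east = -math.sin(lon) * dx + math.cos(lon) * dy
    north = (-math.sin(lat) * math.cos(lon) * dx
             - math.sin(lat) * math.sin(lon) * dy
             + math.cos(lat) * dz)
    up = (math.cos(lat) * math.cos(lon) * dx
          + math.cos(lat) * math.sin(lon) * dy
          + math.sin(lat) * dz)
    slant_range = math.sqrt(east ** 2 + north ** 2 + up ** 2)
    elevation = math.degrees(math.atan2(up, math.hypot(east, north)))
    azimuth = math.degrees(math.atan2(east, north)) % 360
    return azimuth, elevation, slant_range


# Sanity check: a satellite 1200 km straight above an observer at 0°N 0°E
# should be at 90° elevation and 1200 km slant range.
print(look_angles(0, 0, 0, geodetic_to_ecef(0, 0, 1_200_000)))
```

Run that for every bird at each timestamp and you get the sky view from the terminal; anything below your minimum elevation angle can be filtered out.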

Despite all these teething issues I’m still optimistic about OneWeb, Kuiper and Qianfan (Thousand Sails) opening up the LEO market and covering more people in more places.

Update: Thanks to Scott, who sent this via email:
One note, there’s a difference between GPS time and Unix time of about 10 years 5 days. This is due to a) the Unix epoch starting 1970-01-01 and the gps epoch starting 1980-01-05 and b) gps time is not adjusted for leap seconds, and ends up being offset by an integer number of seconds.
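Scott’s offset is easy to check with Python’s `datetime`. The GPS epoch is midnight at the end of 5 January 1980, i.e. 1980-01-06 00:00:00 UTC, and GPS has run 18 leap seconds ahead of UTC since 1 January 2017. The leap second count is hard-coded below, so treat this as a sketch that’s only valid for recent timestamps.

```python
from datetime import datetime, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# GPS epoch: midnight between 5 and 6 January 1980, i.e. 1980-01-06 00:00:00 UTC.
GPS_EPOCH = datetime(1980, 1, 6, tzinfo=timezone.utc)
GPS_EPOCH_AS_UNIX = int((GPS_EPOCH - UNIX_EPOCH).total_seconds())  # 315964800

# GPS time ignores leap seconds, so it runs ahead of UTC. The offset has been
# 18 s since 2017-01-01; it is hard-coded here, so this conversion is only
# right for timestamps after that date (and until any future leap second).
GPS_MINUS_UTC_SECONDS = 18


def gps_to_utc(gps_seconds: float) -> datetime:
    """Convert seconds since the GPS epoch into a UTC datetime."""
    unix_seconds = gps_seconds + GPS_EPOCH_AS_UNIX - GPS_MINUS_UTC_SECONDS
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


# 10 years (1972 and 1976 were leap years) plus 5 days.
print((GPS_EPOCH - UNIX_EPOCH).days)  # 3657
```

Feed a GPS second count into something expecting Unix time and every date comes out 3657 days (plus the leap seconds) too early, which is the flavour of shift seen in the ephemeris data.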

Update: clarkzjw has published an open source tool for visualizing the pass data https://github.com/clarkzjw/LEOViz

Demystifying SS7 & Sigtran – Part 8 – M3UA

04/04/2025 | Mobile Networks, RFCs & Standards | M3UA, SIGTRAN, SS7 | Nick

This is part of a series of posts looking into SS7 and Sigtran networks. We cover some basic theory and then get into the weeds with GNS3 based labs where we will build real SS7/Sigtran based networks and use them to carry traffic.

In our last post we talked about moving MTP2 onto IP and the options available.

When we move the SS7 stack onto IP, we don’t need to do the split at the Data Link Layer; we can instead do it higher up the stack. This is where we introduce M3UA.

MTP Level 3 User Adaptation Layer – M3UA replaces MTP3 with an IP based equivalent.

This is different to how we’d handle it with M2UA or M2PA, where MTP3 remained unchanged. When you deploy M3UA links, there is no MTP3 anymore; it’s replaced with an IP based protocol, transported over SCTP, designed to do the same role as MTP3 but over IP. That protocol is M3UA.

This means the roles handled by MTP3, such as managing which point codes are reachable over which linksets, failover, load sharing and reporting, are now all handled by the M3UA protocol, because we lose the ability to just rely on MTP3 to do those things like we did when using lower layer protocols like M2PA or MTP2.

So what do you need to know to use M3UA?

Well, the first concept we need to wrap our heads around is that we no longer have linksets or point code routes (we do, but they’re different); instead we have Application Servers, Application Server Processes and Routing Contexts.

If you’re following along at home and you want to hook your M3UA compatible AS into the Cisco ITP STP, I’ll be including the commands as we go along. The first step on the Cisco (assuming you’ve already defined the basic SS7 config) is to create a local M3UA instance:

cs7 m3ua 2905
 local-ip 10.179.2.154

With that out of the way, let’s cover ASPs & ASs (hehe – Ass).

You can think of the Application Server Process (ASP) as the client end of the “link set” of our virtual SS7 stack. It handles bringing up the SCTP association (which IPs, ports and SCTP parameters are needed), and listens and communicates based on that. Here’s an example on the Cisco ITP:

cs7 asp NickLab_ASP 2905 2905 m3ua
 remote-ip 10.0.1.252
 remote-ip 172.30.1.12

The ASP connects to a Signaling Gateway (In practical terms this is an STP).

That’s simple enough, and now we can do our SCTP handshake, but nothing is going to get routed without introducing the Application Server (AS) itself. The AS is where we configure the routing, link to one or more ASPs, and set how we want to share traffic among them.

Point codes are still used in M3UA for sending traffic from an M3UA AS but it’s not what controls the routing to an AS.

That probably sounds confusing: I send traffic based on point code, but the traffic doesn’t get to the M3UA AS via point code? What gives?

Well, first we’ve got to introduce the Routing Context in M3UA.

Routing Contexts define which destinations are served by this AS.
As an example, on our STP we’ll define a Routing Context inside the AS section of the ITP config. Here we’re creating Routing Key 1, which will handle traffic to the point code 5.123.2, but we could equally define a routing key for a given Global Title address too.

cs7 instance 0 as NickPC m3ua
 routing-key 1 5.123.2 
 asp NickLab_ASP
 traffic-mode broadcast

Notice we didn’t define Routing Key X -> Point Code Y -> ASP Z? That’s because we may have one or more ASPs associated with this AS (remember, ASPs are kinda like linksets).

For example, the point code for an HLR might have multiple ASPs behind it, with traffic-mode loadshare to load balance the requests among all the HLRs.
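Here’s a toy Python model of that indirection: routing keys map a destination point code to an AS, and the AS’s traffic mode decides which of its ASPs gets each message. It’s a sketch of the concept, not how the ITP implements it; loadshare is shown as plain round robin (real implementations typically share on SLS), override just picks the first ASP, and the HLR entry at 5.123.7 is hypothetical.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Dict, Iterator, List, Optional


def pc_383_to_int(point_code: str) -> int:
    """ITU 3-8-3 dotted point code (e.g. '5.123.2') to its 14-bit integer value."""
    zone, area, sp = (int(part) for part in point_code.split("."))
    return (zone << 11) | (area << 3) | sp


@dataclass
class ApplicationServer:
    name: str
    routing_context: int
    asps: List[str]
    traffic_mode: str = "loadshare"  # "broadcast", "loadshare" or "override"
    _rotation: Optional[Iterator[str]] = field(default=None, init=False, repr=False)

    def pick_asps(self) -> List[str]:
        if self.traffic_mode == "broadcast":
            return list(self.asps)        # every ASP gets a copy of each message
        if self.traffic_mode == "override":
            return self.asps[:1]          # simplification: first ASP is the active one
        if self._rotation is None:        # loadshare, simplified to round robin
            self._rotation = cycle(self.asps)
        return [next(self._rotation)]


# Routing keys: destination point code -> AS. NickPC mirrors the ITP example;
# the HLR entry is hypothetical.
routing_table: Dict[int, ApplicationServer] = {
    pc_383_to_int("5.123.2"): ApplicationServer("NickPC", 1, ["NickLab_ASP"], "broadcast"),
    pc_383_to_int("5.123.7"): ApplicationServer("HLR", 2, ["HLR_ASP_1", "HLR_ASP_2"], "loadshare"),
}


def route(dpc: str) -> List[str]:
    """Which ASP(s) should carry a message for this destination point code."""
    return routing_table[pc_383_to_int(dpc)].pick_asps()
```

The point code only selects the AS; which SCTP association the message actually leaves on is decided by the AS and its traffic mode.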

So what does it look like to bring this up? Let’s take a look at a link coming up.

Under the hood we’ve got the SCTP connection / handshake like normal, then our ASP sends an ASPUP (ASP is in state “up”) message to the Signaling Gateway (STP).

Now our ASP has told the Signaling Gateway it’s there, so our Signaling Gateway returns an ASPUP_ACK to confirm it’s got the message and the current AS state is inactive.

And with that our ASP is up, but in the “inactive” state; it’s connected to the STP, but without any ASes associated with our ASP, it’s akin to having a link layer but nothing else.

State in the STP showing an ASP without an active AS

So next our ASP will send an ASPAC (ASP Active) message for the given routing contexts the AS serves, in this case, Routing Context 1.

And with that, the Signaling Gateway (STP) sends back an ASPAC_ACK (ASP Active Ack) to confirm it’s got it, and the state changes.

ASP Active Ack Message from SG (STP) to ASP
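On the wire these messages are tiny. This Python sketch encodes ASPUP and ASPAC using the common header, message classes/types and parameter tags from RFC 4666; the function names are mine, and it leaves out optional extras like the ASP Identifier and Info String.

```python
import struct
from typing import Iterable, Optional, Tuple

M3UA_VERSION = 1
# Message classes and types (RFC 4666, section 3.1).
CLASS_ASPSM, CLASS_ASPTM = 3, 4
ASPSM_ASPUP, ASPSM_ASPUP_ACK = 1, 4
ASPTM_ASPAC, ASPTM_ASPAC_ACK = 1, 3
# Parameter tags and Traffic Mode Type values.
TAG_ROUTING_CONTEXT, TAG_TRAFFIC_MODE = 0x0006, 0x000B
TRAFFIC_MODES = {"override": 1, "loadshare": 2, "broadcast": 3}


def tlv(tag: int, value: bytes) -> bytes:
    """Encode a parameter; Length covers tag + length + value, not the padding."""
    length = 4 + len(value)
    return struct.pack("!HH", tag, length) + value + b"\x00" * (-length % 4)


def message(msg_class: int, msg_type: int, params: bytes = b"") -> bytes:
    """Prepend the 8-byte common header (Message Length includes the header)."""
    return struct.pack("!BBBBI", M3UA_VERSION, 0, msg_class, msg_type, 8 + len(params)) + params


def aspup() -> bytes:
    return message(CLASS_ASPSM, ASPSM_ASPUP)


def aspac(routing_contexts: Iterable[int], traffic_mode: Optional[str] = None) -> bytes:
    params = b""
    if traffic_mode is not None:  # Traffic Mode Type is optional in ASPAC
        params += tlv(TAG_TRAFFIC_MODE, struct.pack("!I", TRAFFIC_MODES[traffic_mode]))
    rcs = b"".join(struct.pack("!I", rc) for rc in routing_contexts)
    params += tlv(TAG_ROUTING_CONTEXT, rcs)
    return message(CLASS_ASPTM, ASPTM_ASPAC, params)


def header(data: bytes) -> Tuple[int, int, int]:
    """(class, type, length) of a received message, e.g. to spot an *_ACK."""
    _version, _reserved, msg_class, msg_type, length = struct.unpack_from("!BBBBI", data)
    return msg_class, msg_type, length


print(aspup().hex())     # 0100030100000008
print(aspac([1]).hex())  # 01000401000000100006000800000001
```

An ASP that gets back a message where `header()` returns class 3 / type 4 has its ASPUP_ACK; class 4 / type 3 is the ASPAC_ACK.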

Because MTP3 used to advertise which point codes were available, the SG (STP) needs to tell the AS/ASP how it sees the world and the state of the connection.

This is done with a NTFY (Notify) message from the STP/SG to indicate the state has changed to active, and what destinations are reachable, and at this point, we’re good to start handling traffic for that Routing Context.

And with that, we can start handling M3UA traffic.
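On the ASP side, spotting that NTFY means decoding the Status parameter. This minimal Python sketch handles only the AS-State_Change status type and the Routing Context parameter (codes per RFC 4666); the sample message is hand-built, not a capture.

```python
import struct
from typing import Dict, Iterator, Tuple

MGMT_CLASS, NTFY_TYPE = 0, 1
TAG_ROUTING_CONTEXT, TAG_STATUS = 0x0006, 0x000D
STATUS_TYPE_AS_STATE_CHANGE = 1
AS_STATES = {2: "AS-INACTIVE", 3: "AS-ACTIVE", 4: "AS-PENDING"}


def iter_params(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (tag, value) for each parameter after the 8-byte common header."""
    offset = 8
    while offset + 4 <= len(data):
        tag, length = struct.unpack_from("!HH", data, offset)
        if length < 4:
            raise ValueError("malformed parameter length")
        yield tag, data[offset + 4 : offset + length]
        offset += length + (-length % 4)  # parameters are padded to 4 octets


def decode_ntfy(data: bytes) -> Dict[str, object]:
    _ver, _res, msg_class, msg_type, _len = struct.unpack_from("!BBBBI", data)
    if (msg_class, msg_type) != (MGMT_CLASS, NTFY_TYPE):
        raise ValueError("not an M3UA NTFY message")
    result: Dict[str, object] = {}
    for tag, value in iter_params(data):
        if tag == TAG_STATUS:
            status_type, status_info = struct.unpack("!HH", value)
            if status_type == STATUS_TYPE_AS_STATE_CHANGE:
                result["as_state"] = AS_STATES.get(status_info, f"unknown({status_info})")
        elif tag == TAG_ROUTING_CONTEXT:
            result["routing_contexts"] = [rc for (rc,) in struct.iter_unpack("!I", value)]
    return result


# Hand-built NTFY: Status = AS-State_Change / AS-ACTIVE, Routing Context = 1.
SAMPLE = bytes.fromhex("0100000100000018" "000d000800010003" "0006000800000001")
print(decode_ntfy(SAMPLE))  # {'as_state': 'AS-ACTIVE', 'routing_contexts': [1]}
```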

There’s only one more key dialog to wrap your heads around: the DAVA and DUNA messages.

DAVA is Destination Available, and DUNA is Destination Unavailable. The SG (STP) will send these messages to ASP/AS every time the reachability of a neighboring point code changes.
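As a rough Python sketch of what an ASP does with these: decode the Affected Point Code parameter (a 1-octet mask plus a 3-octet point code per entry, per RFC 4666) and update a set of reachable destinations. It deliberately refuses wildcarded (non-zero mask) entries and formats point codes as ITU 3-8-3; both are simplifications.

```python
import struct
from typing import List, Set, Tuple

SSNM_CLASS = 2
SSNM_TYPES = {1: "DUNA", 2: "DAVA"}
TAG_AFFECTED_POINT_CODE = 0x0012


def int_to_pc_383(pc: int) -> str:
    """14-bit ITU point code to dotted 3-8-3 notation."""
    return f"{(pc >> 11) & 0x7}.{(pc >> 3) & 0xFF}.{pc & 0x7}"


def decode_ssnm(data: bytes) -> Tuple[str, List[str]]:
    """Return ('DUNA' | 'DAVA', affected point codes) from an M3UA SSNM message."""
    _ver, _res, msg_class, msg_type, msg_len = struct.unpack_from("!BBBBI", data)
    if msg_class != SSNM_CLASS or msg_type not in SSNM_TYPES:
        raise ValueError("not a DUNA/DAVA message")
    affected: List[str] = []
    offset, end = 8, min(msg_len, len(data))
    while offset + 4 <= end:
        tag, length = struct.unpack_from("!HH", data, offset)
        if length < 4:
            raise ValueError("malformed parameter length")
        if tag == TAG_AFFECTED_POINT_CODE:
            for entry in range(offset + 4, offset + length, 4):
                if data[entry]:  # non-zero mask = wildcarded range of point codes
                    raise NotImplementedError("masked point codes not handled in this sketch")
                affected.append(int_to_pc_383(int.from_bytes(data[entry + 1 : entry + 4], "big")))
        offset += length + (-length % 4)
    return SSNM_TYPES[msg_type], affected


def apply_ssnm(reachable: Set[str], data: bytes) -> None:
    """Update our view of which destinations the SG says are reachable."""
    kind, point_codes = decode_ssnm(data)
    if kind == "DUNA":
        reachable.difference_update(point_codes)
    else:
        reachable.update(point_codes)


# Hand-built DUNA and DAVA for 5.123.2 (0x2BDA as a 3-8-3 point code).
DUNA = bytes.fromhex("0100020100000010" "0012000800002bda")
DAVA = bytes.fromhex("0100020200000010" "0012000800002bda")

reachable = {"5.123.2"}
apply_ssnm(reachable, DUNA)
print(reachable)  # set()
apply_ssnm(reachable, DAVA)
print(reachable)  # {'5.123.2'}
```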

That’s the basics covered. I’m in the process of developing an HLR (running MAP/TCAP/SCCP/M3UA) extension for PyHSS, which in the future will allow us to experiment with more M3UA endpoints.

Nick vs Networking

Telco Network Engineering