One of the new features of 5GC is the introduction of Service Based Interfaces (SBI) which is part of 5GC’s Service Based Architecture (SBA).
Let’s start with the description from the specs:
3GPP TS 23.501 [3] defines the 5G System Architecture as a Service Based Architecture, i.e. a system architecture in which the system functionality is achieved by a set of NFs providing services to other authorized NFs to access their services.
3GPP TS 29.500 – 4.1 NF Services
For that we have two key concepts, service discovery, and service consumption
Services Consumer / Producers
That’s some nice words, but let’s break down what this actually means, for starters, let’s talk about services.
In previous generations of core network we had interfaces instead of services. Interfaces were the reference point between two network elements, describing how the two would talk. The interfaces were the protocols the two interfaces used to communicate.
For example, in EPC / LTE S6a is the interface between the MME and the HSS, S5 is the interface between the S-GW and P-GW. You could lookup the 3GPP spec for each interface to understand exactly how it works, or decode it in Wireshark to see it in action.
5GC moves from interfaces to services. Interfaces are strictly between two network elements, the S6a interface is only used between the MME and the HSS, while a service is designed to be reusable.
This means the Service Based Interface N5g-eir can be used by the AMF, but it could equally be used by anyone else who wants access to that information.
3GPP defines the service in the form of service producer (The EIR produces the N5g-eir service) and the service consumer (The client connecting to the N5g-eir service), but doens’t restrict which network elements can
This gets away from the soup of interfaces available, and instead just defines the services being offered, rather than locking the
“service consumers” (which can be thought of clients in a client/server model) can discover “service producers” (like servers in a client/server model).
Our AMF, which acts as a “service consumer” consuming services from the UDM/UDR and SMF.
Service Discovery – Automated Discovery of NF Services
Service-Based Architecture enables 5G Core Network Function service discovery.
In simple terms, this means rather that your MME being told about your SGW, the nodes all talk to a “Network Repository Function” that returns a list of available nodes.
The mobility management and connection management process in 5GC focuses on Connection Management (CM) and Registration Management (RM).
Registration Management (RM)
The Registration Management state (RM) of a UE can either be RM-Registered or RM-Deregistered. This is akin to the EMM state used in LTE.
RM-Deregistered Mode
From the Core Network’s perspective (Our AMF) a UE that is in RM-Deregistered state has no valid location information in the AMF for that UE. The AMF can’t page it, it doesn’t know where the UE is or if it’s even turned on.
From the UE’s perspective, being in RM-Deregistered state could mean one of a few things:
UE is in an area without coverage
UE is turned off
SIM Card in the UE is not permitted to access the network
In short, RM-Deregistered means the UE cannot be reached, and cannot get any services.
RM-Registered Mode
From the Core Network’s perspective (the AMF) a UE in RM-Registered state has sucesfully registered onto the network.
The UE can perform tracking area updates, period registration updates and registration updates.
There is a location stored in the AMF for the UE (The AMF knows at least down to a Tracking Area Code/List level where the UE is).
The UE can request services.
Connection Management (CM)
Connection Management (CM) focuses on the NAS signaling connection between the UE and the AMF.
To have a Connection Management state, the Registration Management procedure must have successfully completed (the UE being in RM-Registered) state.
A UE in CM-Connected state has an active signaling connection on the N1 interface between the UE and the AMF.
CM-Idle Mode
In CM-Idle mode the UE has no active NAS connection to the AMF.
UEs typically enter this state when they have no data to send / recieve for a period of time, this conserves battery on the UE and saves network resources.
If the UE wants to send some data, it performs a Service Request procedure to bring itself back into CM-Connected mode.
If the Network wants to send some data to the UE, the AMF sends a paging request for the UE, and upon hearing it’s identifier (5G-S-TMSI) on the paging channel, the UE performs the Service Request procedure to bring itself back into CM-Connected mode.
CM-Connected Mode
In CM-Connected mode the UE has an active NAS connection with the AMF over the N1 interface from the UE to the AMF.
When the access network (The gNodeB) determines this state should change (typically based on the UE being idle for longer than a set period of time) the gNodeB releases the connection and the UE transitions to CM-Idle Mode.
The other side of the SCTP connection didn’t like my SCTP parameters.
My SCTP INIT looked like this:
By default, Linux includes support for the ECN and SupportedAddressTypes parameters in the SCTP INIT.
But the other side did not like this, it sent back an ERROR stating that:
Okay, apparently it doesn’t like the fact that we support Forward Transmission Sequence Numbers – So how to turn it off?
I’m using P1Sec’s PySCTP library to interact with the SCTP stack, and I couldn’t find any referneces to this in the code, but then I rememberd that PySCTP is just a wrapper for libsctp so I should be able to control it from there.
Inside `/proc/sys/net/sctp` we can see all the parameters we can control.
To disable the Forward TSN I need to disable the feature that controls it – Forward Transmission Sequence numbers are introduced in the Partial Reliability Extension (RFC 3758). So it was just a matter of disabling that with:
So let’s roll up our sleeves and get a Lab scenario happening,
To keep things (relatively) simple, I’ve put the eNodeB on the same subnet as the MME and Serving/Packet-Gateway.
So the traffic will flow from the eNodeB to the S/P-GW, via a simple Network Switch (I’m using a Mikrotik).
While life is complicated, I’ll try and keep this lab easy.
Experiment 1: MTU of 1500 everywhere
Network Element
MTU
Advertised MTU in PCO
1500
eNodeB
1500
Switch
1500
Core Network (S/P-GW)
1500
So everything attaches and traffic flows fine. There is no problem right?
Well, not a problem that is immediately visible.
While the PCO advertises the MTU value at 1500 if we look at the maximum payload we can actually get through the network, we find that’s not the case.
This means if our end user on a mobile device tried to send a 1500 byte payload, it’d never get through.
While DNS would work, most TCP traffic would flow fine, certain UDP applications would start to fail if they were sending payloads nearing 1500 bytes.
So why is this?
Well GTP adds overhead.
8 bytes for the GTP header
8 bytes for the transport UDP header
20 bytes for the transport IPv4 header
14 bytes if our transport is using Ethernet
For a total of 50 bytes of overhead, assuming we’re not using MPLS, QinQ or anything else funky on our transport network and just using Ethernet.
So we have two options here – We can either lower the MTU advertised in our Protocol Configuration Options, or we can increase the MTU across our transport network. Let’s look at each.
Experiment 2: Lower Advertised MTU in PCO to 1300
Well this works, and looks the same as our previous example, except now we know we can’t handle payloads larger than 1300 without fragmentation.
Experiment 3: Increase MTU across transmission Network
While we need to account for the 50 bytes of overhead added by GTP, I’ve gone the safer option and upped the MTU across the transport to 1600 bytes.
With this, we can transport a full 1500 byte MTU on the UE layer, and we’ve got the extra space by enabling jumbo frames.
Obviously this requires a change on all of the transmission layer – And if you have any hops without support for this, you’ll loose packets.
Conclusions?
Well, fragmentation is bad, and we want to avoid it.
For this we up the MTU across the transmission network to support jumbo frames (greater than 1500 bytes) so we can handle the 1500 byte payloads that users want.
I generally do this with Python or via the Swagger UI for the Web UI, but here’s how we can create a fixed-line IMS subscriber in PyHSS, so we can register it with a softphone, without using EAP-AKA.
Firstly we create the AuC object for this password combo.
If you’re working with the larger SIM vendors, there’s a good chance they key material they send you won’t actually contain the raw Ki values for each card – If it fell into the wrong hands you’d be in big trouble.
Instead, what is more likely is that the SIM vendor shares the Ki generated when mixed with a transport key – So what you receive is not the plaintext version of the Ki data, but rather a ciphered version of it.
But as long as you and the SIM vendor have agreed on the ciphering to use, an the secret to protect it with beforehand, you can read the data as needed.
This is a tricky topic to broach, as transport key implementation, is not covered by the 3GPP, instead it’s a quasi-standard, that is commonly used by SIM vendors and HSS vendors alike – the A4 / K4 Transport Encryption Algorithm.
It’s made up of a few components:
K2 is our plaintext key data (Ki or OP)
K4 is the secret key used to cipher the Ki value.
K7 is the algorithm used (Usually AES128 or AES256).
It’s important when defining your electrical profile and the reuqired parameters, to make sure the operator, HSS vendor and SIM vendor are all on the same page regarding if transport keys will be used, what the cipher used will be, and the keys for each batch of SIMs.
Here’s an example from a Huawei HSS with SIMs from G&D:
We’re using AES128, and any SIMs produced by G&D for this batch will use that transport key (transport key ID 1 in the HSS), so when adding new SIMs we’ll need to specify what transport key to use.
In our last post we covered the basics of NB-IoT Non-IP Data Deliver (NIDD), and if that acronym soup wasn’t enough for you, we’re going to take a deep dive into the flows for attaching, sending, receiving and closing a NIDD session.
The attach for NIDD is very similar to the standard attach for wideband LTE, except the MME establishes a connection on the T6a Diameter interface toward the SCEF, to indicate the sub is online and available.
The NIDD Attach
The SCEF is now able to send/receive NIDD traffic from the subscriber on the T6a interface, but in reality developers don’t / won’t interact with Diameter, so the SCEF exposes the T8 API that developers can interact with to access an abstraction layer to interact with the SCEF, and then through onto the UE.
If you’re wondering what the status of Open Source SCEF implementations are, then you may have already guessed we’re working on one! PyHSS should have support for NB-IoT SCEF features in the future.
NB-IoT provides support for Non-IP Data Delivery (NIDD) over 3GPP Networks, but to handle this, some new network elements are introduced, in a home network scenario that’s the SCEF and the SCF/AS.
On the 3GPP side the SCEF it communicates to the MME via the T6a Interface, which is based upon Diameter.
On the side towards our IoT Service Consumers (in the standards referred to as “SCS/AS” or “Service Capabilities Server Application Servers” (catchy names as always), via the RESTful HTTP based T8 interface.
The start of the S1 Attach procedure is very similar to a regular S1 attach.
The initial S1 PDU Connectivity Request indicates in the ESM Message Container that the PDN Type is Non IP.
Other than that, the initial attach procedure looks very similar to the regular S1 attach procedure.
On the S6a interface the Update Location Request from the MME to the HSS indicates that this is an EUTRAN-NB-IoT Radio Access Type.
And the Update Location Answer APN Configuration contains some additional AVPs on the APN to indicate that the APN supports Non-IP-PDN-Type and that the SCEF is used for Data Delivery.
The SCEF-ID (Diameter Host) and SCEF-Realm (Diameter realm) to serve this user is also specified in the APN Configuration in the Update Location Answer.
This is how our MME determines where to send the T6a traffic.
With this, the MME sends a Connection Management Request (CMR) towards the SCEF specified in the SCEF-ID returned by the HSS.
The Connection Management Request / Response
The MME now sends a Diameter T6a Connection Management Request to the SCEF in the Update Location Answer,
In it we have a Session-Id, which continues for the life of our NIDD session, the service-selection which contains our APN (In our case “non-ip”) and the User-Identifier AVP which contains the MSISDN and/or IMSI of the subscriber.
To accept this, the SCEF sends back a Connection-Management-Answer to confirm we’re all good to go:
At this point our SCEF now knows about the subscriber who’s just attached to our network, and correlates it with the APN and the session-ID.
On the S1 side the connection is confirmed and we’re ready to roll.
Mobile Originated Data Request / Response
When the UE wants to send NIDD it’s carried in NAS messaging, so we see an Uplink NAS transport from the UE and inside the NAS payload itself is our HEX data.
Our MME grabs this out and sends it in the form of of a Mobile-Originated-Data-Request (MODR) to the SCEF, along with the same Session-ID that was setup earlier:
At this stage our Non-IP Data is exposed over the T8 RESTful API, which we won’t cover in this post.
Note: I’m lazily posting this as its been in my drafts folder for an exceedingly long time – Before going too much further, it’s worth pointing out that eMBMS never really made it anywhere – no production networks of note use eMBMS. I started researching it and my interest petered out once I discovered I couldn’t get any UEs or hardware that supported eMBMS.
Mobile networks are designed as point to point, all traffic is unicast.
But multicast and broadcast traffic is real, and becoming more common in some applications.
In areas where users stream the same radio program, or TV show, live, each of them is consuming the same data stream, but each one gets sent a unique copy of the data, on a resource block allocated to them for reception of the data.
For Mission Critical Push to Talk applications, the lack of broadcast/multicast support was highlighted again. For a PTT app with 10 users in a talk group, you’d need to schedule resource blocks for 10 users, and allocate 10 radio resources 10 times, send GTP packets 10 times, all to send the same data to 10 people.
So enter eMBMS – The Evolved Multimedia Broadcast and Multicast Service, providing multicast service for LTE.
Overall Architecture
eMBMS introduces a few changes to the RAN side to handle support for a shared data channel, which is sent by the eNodeB and that UEs can listen on to get data. (More on admission control later)
From a core perspective two new network elements are introduced, the Broadcast/Multicast Service Center (BM-SC) and Multimedia Broadcast Multicast Services Gateway (MBMS GW), these elements function in much the same was the P-GW and S-GW retrospectively, but in regards to Multicast services.
Like so many 3GPP specs before it, MBMS relies on GTP for transporting the data to be distributed, and relies on GTPv2-C for control plane data.
BM-SC – Broadcast Media Service Centre
The Broadcast Multicast Service Centre acts as the gateway between content providers (providing streams of data to be distributed) and the EPC.
The BM-SC sets up eMBMS sessions and pulls broadcast data from the content providers and collects receipts from subscribers of some streams to charge / track consumption of the services.
In this regard the BM-SC is akin to the P-GW, which as the border for the EPC and external networks, except it’s largely unidirectional.
MBMS Gateway
The MBMS Gateway (MBMS-GW) encapsulates the broadcast data stream from the BM-SC and encapsulates it into GTP packets to be distributed to eNBs across the network.
The MBMS-GW allocates a multicast transport address for each broadcast data stream?
MME Interaction
For this a new interface is introduced on the MME – the Sm interface, which interconnects the MME and the MBMS-Gateways assigned to it.
I started seeing this error the other day when running CDRsv1.GetCDRs on the CGrateS API:
SERVER_ERROR: unexpected end of JSON input
It seemed related to certain CDRs in the cdrs table of StoreDB.
After some digging, I found the stupid simple problem:
I’d written too much data to extra_fields, leading MySQL to cut off the data mid way through, meaning it couldn’t be reconstructed as JSON by CGrateS again.
Like the rounding issue I had, this wasn’t an issue with CGrateS but with MySQL.
Quick fix:
sudo mysql cgrates -e "ALTER TABLE cdrs MODIFY extra_fields LONGTEXT;"
And new fields can exceed this length without being cut off.
This post follows on from Part 1 and Part 2 of this 3 part series.
We are forced to move to 5G-SA
Claim: We must use 5G-SA with this spectrum (It’s a condition of the license)
I’ll concede that if it is a requirement for a license or funding, that 5G-SA be used, then that’s a pretty ironclad reason to introduce 5G-SA.
Claim: Users will Leave if you don’t have 5G-SA
We could argue the opposite effect will happen; Shifting to SA will reduce your user base. Here’s why:
Users experiencing 5G-NSA (Non-Standalone) today, are already getting the speed boost from “5G”.
From a user perspective, while 5G-NSA support has been becoming common on mid-to-high priced handsets, handsets supporting 5G-SA are far less common.
Dish’ Project Genesis is one of the only examples of a 5G SA network deployed on a large scale. It launched with only a single supported phone (A Motorola branded handset) and today the supported phone list is very short, limited to expensive flagships. This lack of handset support means users must purchase a handset through Dish rather than being able to bring their own phones, as the only way that compatibility can be guaranteed is by controlling the whole ecosystem.
Unless you are in a highly developed market with 2G and 3G turned off, where the majority of your user base has recent generation flagship phones capable of supporting these features, you’re shrinking your addressable market with 5G-SA, rather than expanding it.
Conclusion – 5G-SA doesn’t stack up, what do I do?
SA doesn’t make sense for a lot of operators and markets – for now. I’m sure this post will look pretty dated in a few years time as many of these factors change and as operators sunset 2G and 3G networks.
I’m not advocating for 5G-SA never, I’m advocating not 5G-SA today.
There are simply better options out there for spending that operations budget to make network improvements.
Off the bat some ideas to expore:
Optimize your existing network.
Roll out NSA to an even larger area.
Shutdown 2G/3G layers.
Simplify your operations.
Cut down the number of vendors and moving parts.
Simplify again.
Automate.
Simplify more.
Doing this will mean you can enjoy cost savings from reduced headcount thanks to a simpler network. Simpler networks have better up-time, thanks to operating a network that’s less frankensteiny – less cobbled together from disparate legacy parts. You’ll also Enjoy reduced opex from all the systems you’ve shut down and cheaper roaming from all the bilaterals you moved to VoLTE.
All of these tasks will keep project teams busy for years and put the MNO in a stronger position moving forward, without getting distracted by slick marketing and shiny brochures.
A totally complete history and not just something I learned from a fellow phone nerd who’s been around a lot longer than me...
In the early days of telephony voice calls were made by signaling to an operator who would connect your call.
Around the turn of the century the first “automatic” exchanges began to open. This meant that a subscriber could complete their own call, by directly dialing the digits of the party the want to speak to, and getting through, without a human operator “plugging up” the call.
The first type of switches used to provide “automatic” exchange capability were Strowger type switches, they translated the pulses from a rotary dial phone into physical movements on a switch to find and select the line you want.
People who were born before touch tone phones can tell you about how you can dial a phone number without using the dial at all. By mashing the hook switch really quickly. If you want to dial a 3 you mash the hook switch 3 times, then wait a second, then to dial a 5 smash the hook switch 5 times, etc, etc.
A quirk of this is that higher numbers, are harder to dial, you just need one pulse of the hook switch in a second to dial a 1, but you need 10 pulses to dial a 0. This means a phone dial that’s running too slow can dial the lower digits, but not the higher digits, as it can’t pulse out the required number of pulses is the time allotted (a smidge over 1 second per digit).
Initially exchanges only connected local calls, but with the introduction of Subscriber Trunk Dialing (STD), subscribers could call from one exchange to another without an operator.
This led to national dialing plans being developed, ensuring uniqueness of numbers across the whole of the phone network, and where possible, lower numbers were used, Australia for example has area codes 02, 03, 07 and 08 (with the majority of the population living in the 02 and 03 area codes).
Now imagine you’re the government owned phone company, tasked with creating a single number for emergency services, 123, 111, etc, etc, are all taken up, as these are the most reliable numbers to dial and were used long ago.
Instead you go to the other end, the UK with 999 and Australia with 000 (911 is a different kettle of fish).
Except in New Zealand.
111 was specifically chosen to be similar to Britain’s 999 service, but NZ has some odd peculiarities.
The NZ dials are identical to the standard dial except for the finger plate label.
With pulse dialing, New Zealand telephones pulse “in reverse” to the rest of the world. Dialing 0 on a phone in the rest of the world, sent ten pulses down the line. But dialing a 0 on a phone in NZ sent one pulse down the line. The same for all the other numbers. The phones weren’t different – Just the labels.
Hence the reason why ‘Emergency’ services were on 111 in NZ (but actually pulsed 999) as the exchanges originated in those days from the UK where 999 was (and still is 999).
In the early years of 111, the telephone equipment was based on British Post Office equipment, except for this unusual orientation. Therefore, dialing 111 on a New Zealand telephone sent three sets of nine pulses to the exchange, exactly the same as the UK’s 999.
PyHSS is our open source Home Subscriber Server, it’s written in Python, has a variety of different backends, and is highly perforate (We benchmark to 10K transactions per second) and infinitely scaleable.
In this post I’ll cover the basics of setting up PyHSS in your enviroment and getting some Diameter peers connected.
For starters, we’ll need a database (We’ll use MySQL for this demo) and an account on that database for a MySQL user.
So let’s get that rolling (I’m using Ubuntu 24.04):
sudo apt update sudo apt install mysql-server
Next we’ll create the MySQL user for PyHSS to use:
CREATE USER 'pyhss_user'@'%' IDENTIFIED BY 'pyhss_password';
GRANT ALL PRIVILEGES ON *.* TO 'pyhss_user'@'%' WITH GRANT OPTION;
FLUSH PRIVILEGES;
We’ll also need Redis as well (PyHSS uses Redis for inter-service communications and for caching), so go ahead an install that for your distro:
sudo apt install redis-server
So that’s our prerequisites sorted, let’s clone the PyHSS repo:
And install the requirements with pip from the PyHSS repo:
pip3 install -r requirements.txt
Next we’ll need to configure PyHSS, for that we update the config file (config.yaml) with the settings we want to use.
We’ll start by setting the bind_ip to a list of IPs you want to listen on, and your transport – We can use either TCP or SCTP.
For Diameter, we will set OriginHost and OriginRealm to match the Diameter hostname you want to use for this peer, and the Realm of your Diameter network.
Lastly we’ll need to set the database parameters, updating the database: section to populate your credentials, setting your username and password and the database to match your SQL installation we setup at the start.
With that done, we can start PyHSS, which we do using systemctl.
Because there’s multiple microservices that make up PyHSS, there’s multiple systemctl files use to run PyHSS as a service, they’re all in the /systemd folder.
Like a lot of companies, we’re moving away from VMware, and in our case, shifting to Proxmox.
But that doesn’t mean we can get entirely away from VMware, but more that it’s not our hypervisor of choice anymore, and this means shifting our dev environments and lab off VMware to Proxmox first.
So today I sat down to try and shift everything to Proxmox, while keeping the VMware based VMs accessible until they can slowly die of bitrot.
A sane person would probably utilize Proxmox’s fancy new tool for migrating VMs from VMware to Proxmox, and it’s great, but in our case at least, it required logging into each VM and remapping NICs, etc, which is tricky on boxes I don’t have access to – Plus we need to keep some VMware capability for testing / labbing stuff up.
So I decided into install Proxmox onto the bare metal servers, and then create a VMware virtual machine inside the Proxmox stack, to host a VMware ESXi instance.
I started off inside VMware (Before installing any Proxmox) by moving all the VMs onto a single physical disk, which I then removed from the server, so as to not accidentally format the one disk I didn’t want to format.
Next I nuked the server and setup the new stack with Proxmox, which is a doddle, and not something I’ll cover.
Then I loaded a VMware ISO into Proxmox and started setting up the VM.
Now, nested virtualization is a real pain in the behind.
VMware doesn’t like not being run on bare metal, and it took me a good long amount of time to find the hardware config that I could setup in Proxmox that VMware would accept.
Create the VM in the Web UI; I found using a SATA drive worked while SCSI failed, so create a SATA based LVM image to use, and mount the datastore ISO.
Then edit /etc/pve/qemu-server/your_id.conf and replace the netX, args, boot and ostype to match the below:
Now you can go and start the VM, but once you’ve got the VMware splash screen, you’ll need to press Shift + O to enter the boot options.
At the runweasle cdromBoot after it add allowLegacyCPU=true– This will allow ESXi to use our (virtual) CPU.
Next up you’ll install VMware ESXi just like you’ve probably done 100 times before (is this the last time?), and once it’s done installing, power off, we’ll have to make few changes to the VM definition file.
Then after install we need to change the boot order, by updating:
boot: order=sata0
And unmount the ISO:
ide2: none,media=cdrom
Now remember how I’d pulled the hard disk containing all the VMware VMs out so I couldn’t break it? Well, don’t drop that, because now we’re going to map that physical drive into the VM for VMware, so I can boot all those VMs.
I plugged in the drive and I used this to find the drive I’d just inserted:
fdisk -l
Which showed the drive I’d just added last, with it’s VMware file system.
So next we need to map this through the VM we just created inside Proxmox, so VMware inside Proxmox can access the VMware file system on the disk filled with all our old VMware VMs.
VM. VM. VM. The word has lost all meaning to me at this stage.
We can see the mount point of our physical disk; in our case is /dev/sdc so that’s what we’ll pass through to the VM.
And now, if everything has gone well, after logging into the Web UI, you’ll see this:
Then the last step is going to be re-registering all the VMs, you can do this by hand, by selecting the .vmx file and adding it.
Alternately, if you’re lazy like me, I wrote a little script to do the same thing:
[root@localhost:~] cat load_vms3.sh
#!/bin/bash
# Datastore name DATASTORE="FatBoi/"
# Log file to store the output LOG_FILE="/var/log/register_vms.log"
# Clear the log file > $LOG_FILE
echo "Starting VM registration process on datastore: $DATASTORE" | tee -a $LOG_FILE
# Check if datastore directory exists if [ ! -d "/vmfs/volumes/$DATASTORE" ]; then echo "Datastore $DATASTORE does not exist!" | tee -a $LOG_FILE exit 1 fi
# Find all .vmx files in the datastore and register them find /vmfs/volumes/$DATASTORE -type f -name "*.vmx" | while read VMX_PATH; do echo "Registering VM: $VMX_PATH" | tee -a $LOG_FILE vim-cmd solo/registervm "$VMX_PATH" | tee -a $LOG_FILE done
echo "VM registration process completed." | tee -a $LOG_FILE
[root@localhost:~] sh load_vms3.sh
Now with all your VMs loaded, you should almost be ready to roll and power them all back on.
But before we reboot the Hypervisor (Proxmox) we’ll have to reboot the VMware hypervisor too, because here’s something else to make you punch the screen:
Luckily we can fix this one globaly.
SSH into the VMware box, edit /etc/vmware/config.xml file and add:
vhv.enable = "FALSE"
Which will disable the performance counters.
Now power off the VMware VM, and reboot the Proxmox hypervisor, when it powers on again, Proxmox will allow nested virtualization, and when you power back on the VMware VM, you’ll have performance counters disabled, and then, you will be done.
Yeah, not a great use of my Saturday, but here we are…
The Equipment Identity Register (EIR) is a pretty handy function in 3GPP networks.
Via the Diameter based S13 interface, the MME, is able to query the EIR to ask if a given IMEI & IMSI combination should be allowed to attach.
This allows stolen / grey market / unauthorized devices (IMEIs) to be rejected from the network, the EIR can have a list of “bad” IMEIs that if seen will reject the request.
It also allows us to lock a SIM (IMSI) to a given device (IMEI) or type of device – We can use this for say a Fixed Wireless service, to lock the SIMs (IMSIs) to a range of modems (IMEI Prefixes).
Lastly it gives us insight and analytics into the devices used on the network, by mapping the IMEI to a device, we can say that IMEI 1234567890 is an Apple iPhone 12 Pro Max, or a Nokia Fastmile 5G-24W-A.
PyHSS supports all these capabilities, so let’s have a look at how we’d manage / access them.
Setting up EIR Rules
These rules are set via the RESTful API in PyHSS.
The Equipment Identity Register built into PyHSS supports matching in one of two modes, set by regex_mode.
In Exact Mode (regex_mode: 0) matches are based on an exact matching IMEI, and matching the IMSI if set (If IMSI is set to nothing (”), then only the IMEI is evaluated).
Exact Mode is suited for IMEI/IMSI locking, to ensure a SIM is locked to a particular device, or to blacklist stolen devices.
Regex Mode (regex_mode: 1) matches based on Regex, this is suited for whitelisting IMEI prefixes for say, specific validated vendors.
The match_response_code maps to the Equipment-Status AVP output, so specified values are:
0 : ‘Whitelist’
1: ‘Blacklist’
2: ‘Greylist’
Some end to end examples of this provisioned into the API:
If the IMEI starts with 777 and the IMSI is 1234123412341234 then return 2 (Greylist).
No Match Behaviour
If there is no match from the backend, then the config parameter no_match_response dictates the response code returned (Blacklist/Whitelist/Greylist).
Mapping Type Allocation Codes (TACs) to IMEIs
There are several data feeds of the Type Allocation Codes (TACs) which map a given IMEI prefix to a model number.
Unfortunately, this data is not freely available, so we can’t bundle it with PyHSS, but if you have the IMEI Database, you can load it into PyHSS using Redis, to allow us to report on this data.
In your config.yaml you’ll just need to set the tac_database parameter, which will read the data on startup.
Triggering on SIM Swap
If we keep track of the current IMSI/IMEI combination used for each SIM/Device, we can get notified every time it changes.
You might want to use this to trigger OTA provisioning or clear old data in your IMS.
For that we can use the sim_swap_notify_webhook in the config to send a HTTP POST to a given endpoint to inform it that a SIM is now in a different device.
We also have to have imsi_imei_logging set to true in the Config in order to log the history.
Reporting on IMEIs
We can also log/capture historical data about IMSI/IMEI combinations.
We use this from a customer support perspective to be able to see if a customer has recently changed phones, so if they call support, our staff can ask the customer about it to help troubleshoot.
“I can see you were connected previously on a Samsung Galaxy S22, but now you’re using a Nokia 3310, did the issues happen before you moved phones?”
This is super handy.
We can get a general log of IMSI vs IMEI like this:
But what’s more useful is searching for a IMSI or an IMEI and then getting back a full list of devices / SIMs that have been used.
Lastly via Grafana we export all this data, which allows us to visualize this data and build dashboards showing the devices on the network.
Visualizing EIR Data in Grafana
PyHSS includes a Promethues exporter, when it comes to prom_eir_devices_total it lists each seen Type Allocation Code / UE in the network, along with the number we’ve seen of each.
Raw it looks like this:
But visualized in Grafana we can get a dashboard to give us a breakdown per vendor:
Hello Nick, thank you for the article. What is the use of the OPc key to be derived from OP key ? Why can’t it just be a random key like Ki ?
It’s a super good question, and something I see a lot of operators get “wrong” from a security best practices perspective.
Refresher on OP vs OPc Keys
The “OP Key” is the “operator” key, and was (historically) common for an operator.
This meant all SIMs in the network had a common OP Key, and each SIM had a unique Ki/K key.
The SIM knew both, and the HSS only needed to know what the Ki was for the SIM, as they shared a common OP Key (Generally you associate an index which translates to the OP Key for that batch of SIMs but you get the idea).
But having common key material is probably not the best idea – I’m sure there was probably some reason why using a common key across all the SIMs seemed like a good option, and the K / Ki key has always been unique, so there was one unique key per SIM, but previously, OP was common.
Over time, the issues with this became clear, so the OPc key was introduced. OPc is derived from mushing the K & OP key together. This means we don’t need to expose / store the original OP key in the SIM or the HSS just the derived OPc key output.
This adds additional security, if the Ki for a SIM were to be exposed along with the OP for that operator, that’s half the entropy lost. Whereas by storing the Ki and OPc you limit the blast radius if say a single SIMs data was exposed, to only the data for that particular SIM.
This is how most operators achieve this today; there is still a common OP Key, locked away in a vault alongside the recipe for Coca-cola and the moon landing set.
But his OP Key is no longer written to the SIMs or stored in the HSS.
Instead, during the personalization process (The bit in manufacturing where SIMs get the unique data written to them (The IMSI & keys)) a derived OPc key is written to the card itself, and to the output files the operator then loads into their HSS/HLR/AuC.
This is not my preferred method for handling key material however, today we get our SIM manufacturers to randomize the OP key for every card and then derive an OPc from that.
This means we have two unique keys for each SIM, and even if the Ki and OP were to become exposed for a SIM, there is nothing common between that SIM, and the other SIMs in the network.
Do we want our Ki to leak? No. Do we want an OP Key to leak? No. But if we’ve got unique keys for everything we minimize the blast radius if something were to happen – Just minimizes the risk.
After we setup CgrateS the next thing we’d generally want to do would be to rate some traffic.
Of course, that could be realtime traffic, from Diameter, Radius, Kamailio, FreeSWITCH, Asterisk or whatever your case may be, but it could just as easily be CSV files, records from a database or a text file.
We’re going to be rating CDRs from simple CSV files with the date of the event, calling party, called party, and talk time, but of course your CDR exports will have a different format, and that’s to be expected – we tailor the Event Reader Service to match the format of the files we need.
The Event Reader Service, like everything inside CgrateS, is modular. ERS is a module we load that parses files using the rules we define, and creates Events that CgrateS can process and charge for, based on the rules we define.
But before I can tell you that story, I have to tell you this story…
Nick’s imaginary CSV factory
In the repo I’ve added a DummyCSV.csv, it’s (as you might have guessed) a CSV file.
This CSV file is like a million other CSV formats out there – We’ve got a CSV file with Start Time, End Time, Customer, Talk Time, Calling Party, Called Party, Animal (for reasons) and CallID to uniquely identify this CDR.
Protip: The Rainbow CSV VScode extension makes viewing/editing/querying CSV files in VScode much easier.
Call Start Time
Row 0
Call End Time
Row 1
Customer
Row 2
Talk Time
Row 3
Calling Party
Row 4
Called Party
Row 5
Animal
Row 6
CallID
Row 7
File Format
Next we need to feed this into CGrateS, and for that we’ll be using the Event Reporter Service.
JSON config files don’t make for riveting blog posts, but you’ve made it this far, so let’s power through.
ERS is setup in CGrateS’ JSON config file, where we’ll need to define one or more readers which are the the logic we define inside CGrateS to tell it what fields are what, where to find the files we need to import, and set all the parameters for the imports.
This means if we have a CSV file type we get from one of our suppliers with CDRs in it, we’d define a reader to parse that type of file. Likewise, if we’ve got a CSV of SMS traffic out of our SMSc, we’d need to define another reader to parse the CDRs in that format – Generally we’ll do a Reader for each file type we want to parse.
So let’s define a reader for this CSV spec we’ve just defined:
"ers": {
"enabled": true,
"readers": [
{
"id": "blog_example_csv_parser",
"enabled": true,
"run_delay": "-1",
"type": "*file_csv",
"opts": {
"csvFieldSeparator":",",
"csvLazyQuotes": true,
//csvLazyQuotes Counts the row length and if does not match this value declares an error
//-1 means to look at the first row and use that as the row length
"csvRowLength": -1
},
"source_path": "/var/spool/cgrates/blog_example_csv_parser/in",
"processed_path": "/var/spool/cgrates/blog_example_csv_parser/out",
"concurrent_requests": 1024, //How many files to process at the same time
"flags": [
"*cdrs",
"*log"
],
"tenant": "cgrates.org",
"filters": [
"*string:~*req.2:Nick", //Only process CDRs where Customer column == "Nick"
],
"fields":[]
}]}
This should hopefully be relatively simple (I’ve commented it as best I can).
The ID of the ERS object is just the name of this reader – you can name it anything you like, keeping in mind we can have multiple readers defined for different file formats we may want to read, and setting the ID just helps to differentiate them.
The run_delay of -1 means ERS will run as soon as a file is moved into the source_path directory, and the type is a CSV file – Note that’s moved not copied. We’ve got to move the file, not just copy it, as CGrateS waits for the inode notify.
In the opts section we set the specifics for the CSV we’re reading, field separator if how we’re separating the values in our CSV, and in our case, we’re using commas to delineate the fields, but if you were using a file using semicolons or another delineator, you’d adjust this.
Lastly we’ve got the paths, the source path is where we’ll need to move the files to get processed into, and the processed_path is where the processed files will end up.
For now I’ve set the flags to *log and *cdrs – By calling log we’ll make our lives a bit easier for debugging, and CDRs will send the event to the CDRs module to generate a rated CDR in CGrateS, which we could then use to bill a customer, a supplier, etc, and access via the API or exporting using Event Exporter Service.
Lastly under FilterS we’re able to define the filters that should define if we should process a row or not. You don’t know how much you need this feature until you need this feature. The filter rule I’ve included will only process lines where the Customer field in the CSV (row #2) is equal to “Nick”. You could use this to also filter only calls that have been answered, only calls to off-net, etc, etc – FilterS needs a blog post all on it’s own (and if you’re reading this in the future I may have already written one).
Alright, so far so good, we’ve just defined the metadata we need to do to read the file, but now how do we actually get down to parsing the lines in the file?
Well, that’s where the data in Fields: [] comes in.
If you’ve been following along the CgrateS in baby steps series, you’ll have rated a CDR using the API, that looked something like this:
ERS is going to use the same API to rate a CDR, calling more-or-less the same API, so we’re going to set the parameters that go into this from the CSV contents inside the fields:
"fields":[
//Type of Record (Voice)
{"tag": "ToR", "path": "*cgreq.ToR", "type": "*constant", "value": "*voice"},
//Category set to "call" to match RatingProfile_VoiceCalls from our RatingProfile
{"tag": "Category", "path": "*cgreq.Category", "type": "*constant", "value": "call"},
//RequestType is *rated as we won't be deducting from an account balance
{"tag": "RequestType", "path": "*cgreq.RequestType", "type": "*constant", "value": "*rated"},
]
That’s the static values out of the way, next up we’ll define our values we pluck from the CSV. We can get the value of each row from “~*req.ColumnNumber” where ColumnNumber is the column number starting from 0.
//Unique ID for this call - We get this from the CallID field in the CSV
{"tag": "OriginID", "path": "*cgreq.OriginID", "type": "*variable","value":"~*req.7"},
//Account is the Source of the call
{"tag": "Account", "path": "*cgreq.Account", "type": "*variable", "value": "~*req.4"},
//Destination is B Party Number - We use 'Called Party Number'
{"tag": "Destination", "path": "*cgreq.Destination", "type": "*variable", "value": "~*req.5"},
{"tag": "Subject", "path": "*cgreq.Subject", "type": "*variable", "value": "~*req.5"},
//Call Setup Time (In this case, CGrateS can already process this as a datetime object)
{"tag": "SetupTime", "path": "*cgreq.SetupTime", "type": "*variable", "value": "~*req.0"},
//Usage in seconds - We use 'Call duration'
{"tag": "Usage", "path": "*cgreq.Usage", "type": "*variable", "value": "~*req.3"},
//We can include extra columns with extra data - Like this one:
{"tag": "Animal", "path": "*cgreq.Animal", "type": "*variable", "value": "~*req.6"},
]
You’ll need to restart CGrateS after putting the config changes in, but your instance will probably fail to start as we’ll need to create the directories we specified CGrateS should monitor for incoming CSV files:
But before we can put this all into play, we’ll need to setup some rates. My previous posts have covered how to do this, so for that I’ve included a Python script to setup all the rates, which you can run once you’ve restarted CGrateS.
Alright, with that out of the way, we can test it out, move our Dummy.csv file to /var/spool/cgrates/blog_example_csv_parser/in and see what happens.
I run Ubuntu on my desktop and I mess with Kamailio a lot.
Recently I was doing some work with KEMI using Python, but installing the latest versions of Kamailio from the Debian Repos for Kamailio wasn’t working.
The following packages have unmet dependencies:
kamailio-python3-modules : Depends: libpython3.11 (>= 3.11.0) but 3.11.0~rc1-1~22.04 is to be installed
Kamailio’s Python modules expect libpython3.11 or higher, but Ubuntu 22.04 repos only contain the release candidate – not the final version:
If you’re setting up Kamailio for support for WebSocket and need to bind to TCP port 80 or TCP port 443, you may run into the issue that permission is denied to bind to these ports when you try and start the service.
S8 Home Routing is a really simple concept, the traffic goes from the SGW in the visited PLMN to the PGW in the home PLMN, so the PCRF, OCS/OFCS, IMS, IP Addresses, etc, etc, are all in the home network, and this avoids huge amounts of complexity.
But in order for this to work, the visited network MME needs to find the PGW of the home network, and with over 700 roaming networks in commercial use, each one with potentially hundreds of unique APNs each routing to a different PGW, this is a tricky proposition.
If you’ve configured your PGW peers statically on your MME, that’s fine, but it doesn’t scale very well – And if you add an MVNO who wants their own PGW for serving their APN, well you’ll be adding some complexity there to, so what to do?
Well, the answer is DNS.
By taking the APN to be served, the home PLMN and the interface type desired, with some funky DNS queries, our MME can determine which PGW should be selected for a request.
Let’s take a look, for a UE from MNC XXX MCC YYY roaming into our network, trying to access the “IMS” APN.
Our MME knows the network code of the roaming subscriber from the IMSI is MNC XXX, MCC YYY, and that the UE is requesting the IMS APN.
So our MME crafts a DNS request for the NAPTR query for ims.apn.epc.mncXXX.mccYYY.3gppnetwork.org:
Because the domain is epc.mncXXX.mccYYY.3gppnetwork.org it’s routed to the authoritative DNS server in the home network, which sends back the response:
We’ve got a few peers to pick from, so we need to filter this list of Answers to only those that are relevant to us.
First we filter by the Service tag, whihc for each listed peer shows what services that peer supports.
But since we’re looking for S8, we need to find a peer who’s “Service” tag string contains:
x-3gpp-pgw:x-s8-gtp
We’re looking for two bits of info here, the presence of x-3gpp-pgw in the Service to indicate that this peer is a PGW and x-s8-gtp to indicate that this peer supports the S8 interface.
A service string like this:
x-3gpp-pgw:x-s5-gtp
Would be excluded as it only supports S5 not S8 (Even though they are largely the same interface, S8 is used in roaming).
It’s also not uncommon to see both services indicated as supported, in which case that peer could be selected too:
x-3gpp-pgw:x-s5-gtp:x-s8-gtp
(The answers in the screenshot include :x-gp which means the PGWs advertised are also co-located with a GGSN)
So with our answers whittled down to only those that meet our needs, we next use the Order and the Preference to pick our best candidate, this is the same as regular DNS selection logic.
From our candidate, we’ve also got the Regex Replacement, which allows our original DNS request to be re-written, which allows us to point at a single peer.
In our answer, we see the original request ims.apn.epc.mncXXX.mccYYY.3gppnetwork.org is to be re-written to topon.lb1.pgw01.epc.mncXXX.mccYYY.3gppnetwork.org.
This is the FQDN of the PGW we should use.
Now we know the FQND we should use, we just do an A-Record lookup (Or AAAA record lookup if it is IPv6) for that peer we are targeting, to turn that FQDN into an IP address we can use.
And then in comes the response:
So now our MME knows the IP of the PGW, it can craft a Create Session request where the F-TEID for the S8 interface has the PGW IP set on it that we selected.
For more info on this TS 129.303 (Domain Name System Procedures) is the definitive doc, but the GSMA’s IR.88 “LTE and EPC Roaming Guidelines” provides a handy reference.
Want more telecom goodness?
I have a good old fashioned RSS feed you can subscribe to.