Stumbled across these the other day, while messing around with some values on our SMSc.
Setting the Data Coding Scheme to 16 with GSM7 encoding flags the SMS as “Flash message”, which means it pops up on the screen of the phone on top of whatever the user is doing.
Oddly, while there’s plenty of info online about Flash SMS, the name itself does not appear in the 3GPP specifications for SMS, which describe message classes rather than “Flash” messages.
Turns out they still work, move over RCS and A2P, it’s all about Flash messages!
There’s no real secret to this other than to set the Data Coding Scheme to 16, which is GSM7 with Flash class set. That’s it.
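To see why 16 is the magic number, here is a minimal sketch of how the DCS octet is put together in the "general data coding" group described in 3GPP TS 23.038. The function and constant names are my own for illustration and are not from any SMSc or library.

```python
from typing import Optional

# Alphabet values for bits 3..2 of the DCS (general data coding group).
GSM7 = 0b00       # default 7-bit alphabet
DATA_8BIT = 0b01  # 8-bit data
UCS2 = 0b10       # UCS-2

def build_dcs(alphabet: int = GSM7, message_class: Optional[int] = None) -> int:
    """Build a DCS octet in the general data coding group (bits 7..6 = 00)."""
    dcs = (alphabet & 0b11) << 2      # bits 3..2 select the alphabet
    if message_class is not None:
        dcs |= 0b0001_0000            # bit 4 says the class bits are meaningful
        dcs |= message_class & 0b11   # bits 1..0 carry the class (0 = Flash)
    return dcs

flash_gsm7 = build_dcs(GSM7, message_class=0)
print(flash_gsm7, hex(flash_gsm7))
```

Under the same layout, a Class 0 message in UCS-2 would come out as 24 rather than 16, and a plain GSM7 message with no class stays at 0.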
Obviously to take advantage of this you’d need to be a network operator, or have access to the network you wish to deliver to. Recently more A2P providers have started filtering non-vanilla SMS traffic to weed out things like SMS OTA messages or SIM-specific messages, so there’s a good chance this may not work through A2P providers.
I’ve been writing a fair bit recently about the “VoLTE Mess” – It’s something that’s been around for a long time, mostly impacting greenfield players rolling out LTE only, but now the big carriers are starting to feel it as they shut off their 2G and 3G networks, so I figured a brief history was in order to understand how we got here.
Note: I use the terms 4G or LTE interchangeably
The Introduction of LTE
LTE (4G) is more “spectrally efficient” than the technologies that came before it. In simple terms, 1 “chunk” of spectrum will get you more speed (capacity) on LTE than the same size chunk of spectrum would on 2G or 3G.
So imagine it’s 2008 and you’re the CTO of a mobile network operator. Your network is congested thanks to carrying more data traffic than it was ever designed for (the first iPhone had launched the year before) and the network is struggling under the weight of all this new data traffic. You have two options here: build more cell sites for more density (very expensive) or buy more spectrum (extremely expensive) – Either way you’re going cap in hand to the finance team and asking for eye-wateringly large amounts of capital.
But then the answer to your prayers arrives in the form of 3GPP’s Release 8 specification with the introduction of LTE. Now by taking some 2G or 3G spectrum, and by using it on 4G, you can get ~5x more capacity from the same spectrum. So just by changing spectrum you own from 2G or 3G to 4G, you’ve got 5x more capacity. Hallelujah!
So you go to Nortel and buy a packet core, and Alcatel and Siemens provide 4G RAN (eNodeBs) which you selectively deploy on the cell sites that are the most congested. The finance team and the board are happy and your marketing team runs amok with claims of 4G data speeds. You’ve dodged the crisis, phew.
This is the path that all established mobile operators took; throw LTE at the congested cell sites, to cheaply and easily free up capacity, and as the natural hardware replacement cycle kicked in, or cell sites reached capacity, swap out the hardware to kit that supports LTE in addition to the 2G and 3G tech.
Circuit Switched Fallback
But it’s hard to talk about the machinations of late 2000s telecom executives, without at least mentioning Hitler.
This video below from 15 years ago is pretty obscure and fairly technical, but the crux of it is that Hitler is livid because LTE does not have a “CS Domain” aka circuit switched voice (the way 2G and 3G had handled voice calls).
It was optional to include support for voice calls in the LTE network (Voice over LTE) when you launched LTE services. So if you already had a 2G or 3G network (CS Network) you could just keep using 2G and 3G for your voice calls, while getting that sweet capacity relief.
So our hypothetical CTO, strapped for cash and data capacity, just didn’t bother to support VoLTE when they launched LTE – Doing so would have taken more time to launch, during which time the capacity problem would become worse, so “don’t worry about VoLTE for now” was the mantra.
All the operators who still had 2G and 3G networks, opted to just “Fallback” to using the 2G / 3G network for calling. This is called “Circuit Switched Fallback” aka CSFB.
Operators loved this as they got the capacity relief provided by shifting to 4G/LTE (more capacity in the network is always good) and could all rant about how their network was the fastest and had 4G first. This, however, was what could be described as a “foot gun” – something you can shoot yourself in the foot with in the future.
Operators eventually introduce VoLTE
Time ticked on and operators built out their 4G networks, and many in the past 10 years or so have launched VoLTE in their own networks.
For phones that support it, in areas with blanket 4G coverage, they can use VoLTE for all their calls.
But that’s the sticking point right there – If the phones support it.
But if the phones don’t support it, or they’re roaming or making emergency calls, there has always been the safety blanket of 2G or 3G and Circuit Switched Fallback to, well, fall back to.
There’s no driver for operators who plan to (or are required to) operate a 2G or 3G network for the foreseeable future, to ensure a high level of VoLTE support in their devices.
For an operator today with 2G or 3G, Voice over LTE is still optional. Many operators still rely exclusively on Circuit Switched Fallback, and there are only a handful of countries that have turned off 2G and 3G and rely solely on VoLTE.
VoLTE Handset Support
For the past 16 years phone manufacturers have been making LTE capable phones.
But that does not mean they’ve been making phones that support Voice over LTE.
But it’s never been an issue up until this point, as there’s always been a circuit switched (2G/3G) network to fall back to, so the fact that these chips may not support VoLTE was not a big problem.
Many of the cheaper chipsets that power phones simply don’t support VoLTE – These chips do support LTE for data connections but rely on Circuit Switched Fallback for voice calls. This is in part due to the increased complexity, but also because some of the technologies for VoLTE (like AMR) required intellectual property deals to licence to use, so would add to the component cost to manufacture, and in the chips game, keeping down component cost is critical.
Even for chips that do support Voice over LTE, it’s “special”. Unlike calling in 2G or 3G, which worked the same for every operator, phone manufacturers require a “Carrier Bundle” for each operator, containing the special flavor of VoLTE that specific operator uses in their network.
This is because while VoLTE is standardized (despite some claims to the contrary) a lot of “optional” bits have existed, and different operators built networks with subtle differences in the “flavor” of the Voice over LTE (IMS) stack they used. The OEMs (phone / chip manufacturers) had to handle these differences in the devices they made in order to sell their phones through that operator.
This means I can have a phone from vendor X that works with VoLTE on Network Y, but does not support VoLTE on Network Z.
Worse still, knowing which phones are supported is a bit of a guessing game.
Most operators sell phones directly to their customer base, so buying a Network Y branded phone from Vendor X, you know it’s going to support Network Y’s VoLTE settings, but if you change carriers, who knows if it’ll still support it?
When you’ve still got a Circuit Switched network it’s not the end of the world, you’ll just use CSFB and probably not realize it, until operators go to shut down 2G / 3G networks…
Navigating the Maze of VoLTE Compatibility
Here is a simple checklist of questions you can ask your elderly family members if they ask if their phone is VoLTE compatible:
Does the underlying chipset the phone is based on support VoLTE? (you can find this out by disassembling the phone and checking the datasheets for the components from the OEMs after signing NDAs for each)
Does the underlying chipset require a “carrier bundle” of settings to have been loaded for this operator in order to support VoLTE (See Qualcomm MBN as an example)?
What version of this list am I currently on (generally set in the factory) and does it support this operator? (You can check by decapping the ICs and dumping their NVRAM and then running it through a decompiler)
Does my phone’s OS (Android / iOS) require a “carrier bundle” of its own to enable VoLTE? Is my operator in the version of the database on the phone? (See Android’s Carrier Database for example) (You can find the answer by rooting the phone and running some privileged commands to poke around the internal file system)
Does my operator / MNO support VoLTE – Does my plan / package support VoLTE? (You can easily find the answer by visiting the store and asking questions that don’t appear on the script)
If you managed to answer yes to all of the above, congratulations! You have conditional VoLTE support on your phone, although you probably don’t have a working phone anymore.
Wait, conditional VoLTE support?
That’s right folks, VoLTE will work in some scenarios with your operator!
If you plan on traveling, well your phone may support VoLTE at home, but does the phone have VoLTE roaming enabled? Many phones support VoLTE in the home network, but resort to CSFB when roaming.
If it does support VoLTE roaming, does the network you’re visiting support VoLTE roaming? Has the roaming agreement (IRA) between the operator you’re using while traveling and your home operator been updated to include VoLTE Roaming? These IRAs (AA.12 / AA.13 docs) also indicate if the network must turn off IPsec encryption for the VoLTE traffic when roaming, which is controlled by the phone anyway.
Phew, all this talk of VoLTE roaming while traveling scares me, I think I’ll stay home in the safety of the Australian bush with all these great friendly animals around, and a phone that supports VoLTE on my home network.
Ah – After spending some time in the Australian bush one of our many deadly animals bit me. Time to call for help! Wait, what about emergency calls over VoLTE? Again, many phones support VoLTE for normal calls, but fall back to 2G or 3G for the emergency call, so if you have one of those phones (you’ll only find out if you try to make an emergency call and it fails) and try to make an emergency call in a country without 2G or 3G, you’d better find a payphone.
Sarcasm aside, there’s no dataset or compatibility matrix here – No simple way to see if your phone will work for VoLTE on a given operator, even if the underlying chip does support VoLTE.
Operators in Australia, which recently shut down their 3G networks, were mandated to block devices that didn’t support VoLTE for emergency calling. They did this using an Equipment Identity Register, and blocking devices based on the Type Allocation Code, but this scattergun approach just blocked non-carrier issued devices, regardless of whether they supported VoLTE or VoLTE emergency calling.
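To make the "scattergun" point concrete, here is a rough sketch of what a TAC based check boils down to. The TAC is the first 8 digits of the IMEI and identifies the device model, so the check can't see which firmware or carrier bundle a given handset is running. The blocklist values and function names are made up for illustration.

```python
# Hypothetical blocklist of Type Allocation Codes (values are invented).
BLOCKED_TACS = {"35123456", "86000001"}

def tac_of(imei: str) -> str:
    """Return the Type Allocation Code: the first 8 digits of the IMEI."""
    digits = "".join(ch for ch in imei if ch.isdigit())
    if len(digits) < 14:
        raise ValueError("an IMEI needs at least 14 digits")
    return digits[:8]

def is_blocked(imei: str) -> bool:
    """An EIR-style lookup keyed purely on the TAC."""
    return tac_of(imei) in BLOCKED_TACS

# Two handsets of the same model share a TAC, so both are blocked,
# whatever VoLTE configuration each one actually has loaded.
print(is_blocked("351234560000017"), is_blocked("351234569999990"))
```

The lookup treats every device of a model the same way, which is exactly why it can't distinguish a handset that handles VoLTE emergency calls from one that doesn't.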
Blame Game
So who’s to blame here?
There’s no one group to blame here, the industry has created a shitty cycle here:
Standards orgs for having too many “flavors” available
Operators deploying their own “Flavors” of VoLTE then mandating OEMs / Chip manufacturers comply with their “flavor”.
OEMs / Chip manufacturers respond by adding “Carrier Bundles” to account for this per-operator customization
I’ve got some ideas on a way to unscramble this egg, and it’s going to take a push from the industry.
If you’re in the industry and keen to push for a fix, get in touch!
It’s time to get a long term solution to this problem, and we as an industry need to lead the change.
Oh boy this has been a pain in the backside with IMS / VoLTE devices using TCP and how they handle the underlying TCP sockets.
A mobile phone from manufacturer A wants every SIP dialog to be in its own TCP session, while a phone from manufacturer B wants a unique TCP session per transaction, while manufacturer C thinks that every SIP message should reuse the same TCP session.
So an MT call to manufacturer A, who wants every SIP dialog in its own TCP session, would look something like this:
PCSCF:44738 -> UE:5060; TCP SYN
UE:5060 -> PCSCF:44738; TCP SYN/ACK
PCSCF:44738 -> UE:5060; TCP ACK
--- TCP connection is now open to UE from P-CSCF ---
--- Start of new SIP Transaction 1 & Dialog ---
PCSCF:44738 -> UE:5060; TCP PSH - SIP INVITE....
UE:5060 -> PCSCF:44738; TCP ACK

--- Start of SIP Transaction 2 ---
PCSCF:44738 -> UE:5060; TCP PSH - SIP BYE....
UE:5060 -> PCSCF:44738; TCP ACK, PSH - SIP 200....
--- End of SIP Transaction 2 & SIP Dialog ---
PCSCF:44738 -> UE:5060; TCP FIN
UE:5060 -> PCSCF:44738; TCP ACK
--- End of TCP Connection ---
Where UE:5060 – is the IP & port of the UE, as advertised in the Contact: header, while PCSCF:44738 is the PCSCF IP and a random TCP port used for this connection.
But for manufacturer B, who wants a unique TCP session per transaction, they want it to look like this:
PCSCF:44738 -> UE:5060; TCP SYN
UE:5060 -> PCSCF:44738; TCP SYN/ACK
PCSCF:44738 -> UE:5060; TCP ACK
--- TCP connection is now open to UE from P-CSCF ---
--- Start of new SIP Transaction 1 & Dialog ---
PCSCF:44738 -> UE:5060; TCP PSH - SIP INVITE....
UE:5060 -> PCSCF:44738; TCP ACK

--- Start of TCP Session 2 ---
PCSCF:32627 -> UE:5060; TCP SYN
UE:5060 -> PCSCF:32627; TCP SYN/ACK
PCSCF:32627 -> UE:5060; TCP ACK
--- Start of SIP Transaction 2 ---
PCSCF:32627 -> UE:5060; TCP PSH - SIP BYE....
UE:5060 -> PCSCF:32627; TCP ACK, PSH - SIP 200....
--- End of SIP Transaction 2 & SIP Dialog ---
PCSCF:32627 -> UE:5060; TCP FIN
UE:5060 -> PCSCF:32627; TCP ACK
--- End of TCP Connection 2 ---
And then manufacturer C wants just the one TCP session to be used for everything, so they open the TCP connection when they register, and that’s all we use for everything.
Is there any logic to this? Nope, seems to be tied to the underlying chipset (Qualcomm vs Mediatek vs Unisoc) and the SIP stack used (Qualcomm, MTK, Unisoc, Samsung, Apple).
We’ve profiled devices into one of 3 behaviors, and then we tag them based on user agent as to what “persona” they demand from the network.
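As a rough sketch of what that tagging can look like (this is not our production code, and the User-Agent patterns and the default are invented for illustration), a lookup from User-Agent to persona might be as simple as:

```python
import re
from enum import Enum

class TcpPersona(Enum):
    PER_DIALOG = "new TCP session per SIP dialog"
    PER_TRANSACTION = "new TCP session per SIP transaction"
    SHARED = "reuse the TCP session opened at registration"

# Hypothetical User-Agent patterns; real ones come from profiling devices.
PERSONA_RULES = [
    (re.compile(r"VendorA", re.IGNORECASE), TcpPersona.PER_DIALOG),
    (re.compile(r"VendorB", re.IGNORECASE), TcpPersona.PER_TRANSACTION),
    (re.compile(r"VendorC", re.IGNORECASE), TcpPersona.SHARED),
]

# Assumption for this sketch: unknown devices get the shared-session persona.
DEFAULT_PERSONA = TcpPersona.SHARED

def persona_for(user_agent: str) -> TcpPersona:
    """Return the TCP behaviour a device expects, based on its User-Agent."""
    for pattern, persona in PERSONA_RULES:
        if pattern.search(user_agent):
            return persona
    return DEFAULT_PERSONA

print(persona_for("VendorB-IMS/2.1 modem-fw-7").name)
```

First match wins, so more specific patterns would need to sit above the generic ones.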
I can’t believe I’m still talking about VoLTE / IMS handset support and it’s almost 2025…. For context IMS was “standardized” 17 years ago.
This is the next post in my series on SS7, and today we’re taking a look at SCCP, the Signalling Connection Control Part.
High Level
Global Title uses the routing features from SCCP, which is another layer on top of MTP3.
SCCP allows us to route on more than just point code, instead we can route based on two new fields, Subsystem Number and Global Title.
Subsystem Number is the type of system we are looking to reach, ie an HLR, MSC, CAMEL Gateway, etc.
The Global Title generally looks like an E.164 formatted phone number, and often it is just that.
Somewhere along the chain (typically at the end of it) an STP somewhere needs to perform Global Title Translation to analyse the SCCP header (Subsystem Number, Point Code & Global Title) and finally turn that into a single point code to route the MTP3 message to.
The advantage of this is we are no longer just limited to routing messages based on Point Code.
This is how the international SS7 Network used for roaming is structured and addressed – All using Global Title rather than Point Codes.
The need for SCCP
For starters, after all this talk of MTP3 and Point Codes, why the need to add SCCP?
Let’s go back in time and look at the motivators…
1. Address space is finite
Point codes are great, and when we’ve spoken about them before, I’ve compared them to IPv4 addresses, but rather than ranging from 0.0.0.0 to 255.255.255.255 (32 bits on IPv4) international signaling point codes range from 0.0.0 to 7.255.7 (14 bits).
The problem with IPv4’s 32 bit addresses is they run out. The problem with the ITU International Signaling Point Codes is that they too, are a limited resource with only 16,384 possible ISPCs.
~700 operators worldwide each with ~100 network elements would be 70k point codes to address them all – That’s not going to fit into our 16k possible Point Codes.
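The arithmetic is easy to check. Here is a small sketch that converts between the dotted 3-8-3 ISPC notation and the underlying 14-bit value, then compares the address space against the back-of-the-envelope demand above. The function names are just for illustration.

```python
def ispc_to_int(ispc: str) -> int:
    """Convert a dotted 3-8-3 ISPC such as '7.255.7' to its 14-bit integer value."""
    zone, area, point = (int(part) for part in ispc.split("."))
    if not (0 <= zone <= 7 and 0 <= area <= 255 and 0 <= point <= 7):
        raise ValueError(f"{ispc!r} does not fit the 3-8-3 bit format")
    return (zone << 11) | (area << 3) | point

def int_to_ispc(value: int) -> str:
    """Convert a 14-bit integer back to dotted 3-8-3 notation."""
    if not 0 <= value < 2 ** 14:
        raise ValueError("point code must fit in 14 bits")
    return f"{value >> 11}.{(value >> 3) & 0xFF}.{value & 0x7}"

total_point_codes = 2 ** 14   # every value from 0.0.0 to 7.255.7
needed = 700 * 100            # ~700 operators x ~100 network elements

print(ispc_to_int("7.255.7"), total_point_codes, needed)
```

Even before anything is reserved, the demand is more than four times the size of the address space.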
Global Title fixes this, because we’re able to use E.164 phone number ranges (which are plentiful) for addressing, we’re still not at IPv6 levels of address space, but pretty hefty.
2. Service Discovery by Subsystem
Now imagine you’re a VLR looking to find an HLR. The VLR and the HLR are both connected to an STP, but how does the VLR know where to reach the HLR?
One option would be to statically set every route for the Point Code of every HLR into every possible VLR and vice versa, but that gets messy fast.
What if the VLR could just send a request to the STP and indicate that the request needs to be routed to any HLR, and the STP takes care of finding an SS7 node capable of handling the request, much like a Diameter Routing Agent routes based on Application ID?
SCCP’s “Subsystem Number” routing can handle this as we can route based on SSN.
3. Service Discovery by MSISDN
Having an SMS destined to a given MSISDN requires the SMSc to know where to route it.
Likewise an MSC wanting to call a given number.
There’s a lot of MSISDN ranges. Like a lot. Like every mobile phone number.
Having a table on every SSP/SCP in the network that knows where every MSISDN range in the world is, and what point code to go through to reach it, is not practical.
Instead, being able to have the SCP/SSPs (like our MSC or SMSc) send all off-net traffic to an STP frees the individual SCP/SSPs from this role; they just forward it to their connected STP.
Our STP can analyse the destination MSISDN and make these routing decisions for us, using Global Title Translation based on rules in the Global Title Table on the STP.
For example by adding each of the domestic / national MSISDN ranges/prefixes into the Global Title Table on the STP (along with the corresponding point code to route each one to), the STP can look at the destination MSISDN in the message and forward to the STP for the correct operator.
Likewise a route can match anything where the Global Title address is outside of the local country and send it to an international signaling provider.
Global title takes care of this as we can route based on a phone number.
4. Tokenistic Security
By “Hiding” network elements behind Global Titles, you don’t expose as much information about your internal network, and the only way people can “find” your network elements would be scanning through all the possible addresses in your (publicly advertised) Global Title range (wardialing is back baby!).
But the phrases “Security” and “SS7” don’t really belong together…
The SCCP Header
The SCCP header has a Called Party and a Calling Party, and this is where the magic happens.
These can be made up of any combination of 3 parts:
Global Title Address
Subsystem Number
Point Code
We can route on any combination of these.
To indicate we’re using SCCP, we set the Service Indicator field in the M3UA / MTP3 message to SCCP:
Great, now we can look at our SCCP header.
It looks like there’s a lot going on, but we can see the calling and called party (888888888 is called by 999999999) with the Subsystem number set (888888888 is called for subsystem HLR, from 999999999 which is a VLR).
The closest TCP/IP analogy I can think of here is that of port numbers, there’s still an IP (Point code) but the port number allows us to specify multiple applications that run at a higher layer. This analogy falls down when we consider that the Point Code is generally set to that of your STP, not the final destination.
For this to work, we’ve got to have at least one Signaling Transfer Point in the flow, where we send the request to.
Somewhere (generally at the end of the chain of STPs), an STP is going to perform Global Title Translation.
What does this look like? Well let’s have a look at my GT table for the example above, in my lab network, I’ve got two nodes attached (via M3UA but could equally be on MTP3 links), my test MAP client where I’m originating this traffic, and an SMS Firewall, I can see they’re both up here:
Now knowing this I need to setup my SCCP routing for Global Title. In the screenshot above, the Called Party was 888888888 with Subsystem Number 7. Inside the SCCP request, there’s a few other fields, the Translation Type we have set to 0, Global Title Indicator is 4 (route on Global Title), while Numbering Plan Indicator is 1 (ISDN) and Nature of Address Indicator is 4 (International).
So on my Cisco ITP I define a GTT Selector to target traffic with these values, Translation Type is 0, Global Title Indicator is 4, Number Plan is 1 and the Nature of Address Indicator is 4.
So we’d define a Global Title Translation selector like the one below to match this traffic:
But that’s only matching the group of traffic, it’s not going to match based on the actual SCCP Called Party. So now I need to define a translation for each Global Title address (Called /Calling party) or prefix I want to route, I’ve setup anything starting with 888 to route to the `SMSFirewall` ASP endpoint.
I could stop here and my request addressed to 888888888 would make it to the SMSFirewall ASP, but the response never would, like in all SS7 routing, we need to define the return route translation too, which is what I’ve done for 999999 to route to the TestClient.
Lastly I’ve added a wildcard route, this means if this STP doesn’t know how to resolve a GT address matching the rules in the top line, it’ll forward the request to the STP at point code 1.2.3 – This is how you’d do your connection to an IPX / Signaling exchange.
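To make the logic of the selector, the prefix translations and the wildcard route easier to follow, here is a toy model in Python. It is not Cisco ITP configuration and doesn't mirror its syntax; the class names and the destination strings are made up, and I've assumed the longest matching prefix wins.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class Selector:
    tt: int   # Translation Type
    gti: int  # Global Title Indicator
    np: int   # Numbering Plan
    nai: int  # Nature of Address Indicator

@dataclass
class GttTable:
    selector: Selector
    prefixes: Dict[str, str]       # GT prefix -> destination
    default: Optional[str] = None  # wildcard route, if any

    def translate(self, gt: str, tt: int, gti: int, np: int, nai: int) -> Optional[str]:
        """Return the destination for a called party GT, or None if unmatched."""
        if Selector(tt, gti, np, nai) != self.selector:
            return None  # this table isn't for that class of traffic
        # Longest-prefix match: try the full GT first, then shorter prefixes.
        for length in range(len(gt), 0, -1):
            destination = self.prefixes.get(gt[:length])
            if destination is not None:
                return destination
        return self.default

table = GttTable(
    selector=Selector(tt=0, gti=4, np=1, nai=4),
    prefixes={"888": "asp:SMSFirewall", "999999": "asp:TestClient"},
    default="pc:1.2.3",
)

print(table.translate("888888888", 0, 4, 1, 4))
```

The request to 888888888 lands on the SMS Firewall, the response towards 999999... finds its way back to the test client, and anything else falls through to the upstream STP.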
Debugging this can be a massive pain in the backside, but if you enable logging you can see when GT rules are not matched, like in the example below.
If your network is quiet enough, it’s sometimes easier to just make your rules based on what you observe failing to route.
So with those routes in place, when we send a request with the Global Title called party starting with 8888888 it’s routed to M3UA ASP SMSFirewall, which handles the request, and then sends the response back to the MAPClient M3UA ASP.
The Data Coding Scheme (DCS or TP-DCS) header in an SMS body indicates what encoding is used in that message.
It means if we’re using UCS-2 (UTF-16) for special characters like emojis in our message, the phone knows to decode the data in the message body as UCS-2, because the Data Coding Scheme (DCS) header indicates the contents are encoded that way.
Likewise, if we’re not using any fancy characters in our message and the message is encoded as plain old GSM7, we set the DCS to 0 to indicate this is using GSM7.
From my experience, I’d always assumed that DCS0 (Default) == GSM7, but today I learned, that’s not always the case. Some SMSc entities treat DCS0 as Latin.
Let me explain why this is stupid and why I wasted a lot of time on this.
We can indicate that a message is encoded as Latin by setting the DCS to 0x03:
We cannot indicate that the message is encoded as GSM7 through anything other than the default alphabet (DCS 0).
Latin has its own encoding flag, if I wanted the message treated as Latin, I’d indicate the message encoding is Latin in the DCS field!
I spent a bunch of time trying to work out why a customer was having issues getting messages to subscribers on another operator, and it turned out the other operator treats messages we send to them on SMPP with DCS0 as Latin encoding, and then cracks the sads when trying to deliver it.
The above diff shows the message we send (right), and the message they try to deliver (left).
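To illustrate the kind of damage this causes, here is a small sketch that encodes a message with the GSM7 basic character set and then decodes the same bytes as Latin-1, the way the other operator did. I'm assuming the unpacked form (one septet per octet) that is commonly seen on SMPP, and only the basic table with no extension characters; the function names are my own.

```python
# GSM 03.38 basic character set, indexed by septet value (0x00-0x7F).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
assert len(GSM7_BASIC) == 128

def gsm7_encode_unpacked(text: str) -> bytes:
    """Encode text as unpacked GSM7 (raises ValueError for unsupported characters)."""
    return bytes(GSM7_BASIC.index(ch) for ch in text)

def gsm7_decode_unpacked(data: bytes) -> str:
    """Decode unpacked GSM7 septets back to text."""
    return "".join(GSM7_BASIC[b] for b in data)

payload = gsm7_encode_unpacked("Pay £5 to nick@example.com")

as_gsm7 = gsm7_decode_unpacked(payload)   # what we meant
as_latin = payload.decode("latin-1")      # what a Latin-1 reading produces

print(repr(as_gsm7))
print(repr(as_latin))
```

Plain letters and digits survive because they sit at the same positions in both tables, which is what makes this so hard to spot; it's characters like @ and £ that come out as control characters.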
One of the new features of 5GC is the introduction of Service Based Interfaces (SBI) which is part of 5GC’s Service Based Architecture (SBA).
Let’s start with the description from the specs:
3GPP TS 23.501 [3] defines the 5G System Architecture as a Service Based Architecture, i.e. a system architecture in which the system functionality is achieved by a set of NFs providing services to other authorized NFs to access their services.
3GPP TS 29.500 – 4.1 NF Services
For that we have two key concepts, service discovery, and service consumption
Services Consumer / Producers
That’s some nice words, but let’s break down what this actually means, for starters, let’s talk about services.
In previous generations of core network we had interfaces instead of services. Interfaces were the reference point between two network elements, describing how the two would talk. The interfaces defined the protocols the two elements used to communicate.
For example, in EPC / LTE S6a is the interface between the MME and the HSS, S5 is the interface between the S-GW and P-GW. You could lookup the 3GPP spec for each interface to understand exactly how it works, or decode it in Wireshark to see it in action.
5GC moves from interfaces to services. Interfaces are strictly between two network elements, the S6a interface is only used between the MME and the HSS, while a service is designed to be reusable.
This means the Service Based Interface N5g-eir can be used by the AMF, but it could equally be used by anyone else who wants access to that information.
3GPP defines the service in the form of service producer (The EIR produces the N5g-eir service) and the service consumer (The client connecting to the N5g-eir service), but doesn’t restrict which network elements can consume it.
This gets away from the soup of interfaces available, and instead just defines the services being offered, rather than locking the
“Service consumers” (which can be thought of as clients in a client/server model) can discover “service producers” (like servers in a client/server model).
Our AMF acts as a “service consumer”, consuming services from the UDM/UDR and SMF.
Service Discovery – Automated Discovery of NF Services
Service-Based Architecture enables 5G Core Network Function service discovery.
In simple terms, this means rather than your MME being told about your SGW, the nodes all talk to a “Network Repository Function” that returns a list of available nodes.
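As a toy model of that idea (and nothing more: this is an in-memory sketch, not the real NRF API, and every name and address in it is invented), producers register a profile and consumers ask for a type of network function rather than a specific node:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List

@dataclass(frozen=True)
class NfProfile:
    nf_instance_id: str
    nf_type: str    # e.g. "SMF" or "UDM"
    api_root: str   # where consumers can reach this producer

class ToyNrf:
    """In-memory stand-in for a Network Repository Function."""

    def __init__(self) -> None:
        self._by_type: DefaultDict[str, List[NfProfile]] = defaultdict(list)

    def register(self, profile: NfProfile) -> None:
        """A service producer announces itself."""
        self._by_type[profile.nf_type].append(profile)

    def discover(self, target_nf_type: str) -> List[NfProfile]:
        """A service consumer asks for every known producer of a given type."""
        return list(self._by_type.get(target_nf_type, []))

nrf = ToyNrf()
nrf.register(NfProfile("smf-1", "SMF", "http://smf1.example.internal"))
nrf.register(NfProfile("smf-2", "SMF", "http://smf2.example.internal"))
nrf.register(NfProfile("udm-1", "UDM", "http://udm1.example.internal"))

# The AMF doesn't need an SMF configured; it just asks for one.
print([profile.nf_instance_id for profile in nrf.discover("SMF")])
```

The consumer never has a peer statically configured; adding another SMF is just one more registration.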
The other side of the SCTP connection didn’t like my SCTP parameters.
My SCTP INIT looked like this:
By default, Linux includes support for the ECN and SupportedAddressTypes parameters in the SCTP INIT.
But the other side did not like this, it sent back an ERROR stating that:
Okay, apparently it doesn’t like the fact that we support Forward Transmission Sequence Numbers – So how to turn it off?
I’m using P1Sec’s PySCTP library to interact with the SCTP stack, and I couldn’t find any references to this in the code, but then I remembered that PySCTP is just a wrapper for libsctp so I should be able to control it from there.
Inside `/proc/sys/net/sctp` we can see all the parameters we can control.
To disable the Forward TSN I need to disable the feature that controls it – Forward Transmission Sequence numbers are introduced in the Partial Reliability Extension (RFC 3758). So it was just a matter of disabling that with:
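If you'd rather script it, here is a minimal sketch. On Linux the Partial Reliability extension is controlled by the `prsctp_enable` entry under `/proc/sys/net/sctp` (the same knob as `sysctl net.sctp.prsctp_enable`). The helper names are my own, and the `base` parameter only exists so the functions can be tried against a scratch directory without root.

```python
from pathlib import Path

SCTP_SYSCTL_DIR = Path("/proc/sys/net/sctp")

def read_sctp_sysctl(name: str, base: Path = SCTP_SYSCTL_DIR) -> str:
    """Read the current value of an SCTP sysctl entry."""
    return (base / name).read_text().strip()

def write_sctp_sysctl(name: str, value: int, base: Path = SCTP_SYSCTL_DIR) -> None:
    """Write a new value to an SCTP sysctl entry (needs root on a real system)."""
    (base / name).write_text(f"{value}\n")

def disable_prsctp(base: Path = SCTP_SYSCTL_DIR) -> None:
    """Turn off PR-SCTP (RFC 3758) so Forward TSN is no longer advertised."""
    write_sctp_sysctl("prsctp_enable", 0, base)

# On a real host, as root:
#   disable_prsctp()
#   print(read_sctp_sysctl("prsctp_enable"))
```

A value written this way does not persist across reboots; for that you'd normally put the setting in your sysctl configuration instead.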
If you’re working with the larger SIM vendors, there’s a good chance the key material they send you won’t actually contain the raw Ki values for each card – If it fell into the wrong hands you’d be in big trouble.
Instead, what is more likely is that the SIM vendor shares the Ki generated when mixed with a transport key – So what you receive is not the plaintext version of the Ki data, but rather a ciphered version of it.
But as long as you and the SIM vendor have agreed on the ciphering to use, and the secret to protect it with beforehand, you can read the data as needed.
This is a tricky topic to broach, as transport key implementation is not covered by the 3GPP; instead it’s a quasi-standard that is commonly used by SIM vendors and HSS vendors alike – the A4 / K4 Transport Encryption Algorithm.
It’s made up of a few components:
K2 is our plaintext key data (Ki or OP)
K4 is the secret key used to cipher the Ki value.
K7 is the algorithm used (Usually AES128 or AES256).
It’s important when defining your electrical profile and the required parameters, to make sure the operator, HSS vendor and SIM vendor are all on the same page regarding if transport keys will be used, what the cipher used will be, and the keys for each batch of SIMs.
Here’s an example from a Huawei HSS with SIMs from G&D:
We’re using AES128, and any SIMs produced by G&D for this batch will use that transport key (transport key ID 1 in the HSS), so when adding new SIMs we’ll need to specify what transport key to use.
In our last post we covered the basics of NB-IoT Non-IP Data Delivery (NIDD), and if that acronym soup wasn’t enough for you, we’re going to take a deep dive into the flows for attaching, sending, receiving and closing a NIDD session.
The attach for NIDD is very similar to the standard attach for wideband LTE, except the MME establishes a connection on the T6a Diameter interface toward the SCEF, to indicate the sub is online and available.
The NIDD Attach
The SCEF is now able to send/receive NIDD traffic from the subscriber on the T6a interface, but in reality developers don’t / won’t interact with Diameter, so the SCEF exposes the T8 API that developers can interact with to access an abstraction layer to interact with the SCEF, and then through onto the UE.
If you’re wondering what the status of Open Source SCEF implementations is, then you may have already guessed we’re working on one! PyHSS should have support for NB-IoT SCEF features in the future.
NB-IoT provides support for Non-IP Data Delivery (NIDD) over 3GPP Networks, but to handle this, some new network elements are introduced, in a home network scenario that’s the SCEF and the SCS/AS.
On the 3GPP side the SCEF communicates with the MME via the T6a Interface, which is based upon Diameter.
On the side towards our IoT Service Consumers (in the standards referred to as “SCS/AS” or “Services Capability Server / Application Server”, catchy names as always), it communicates via the RESTful HTTP based T8 interface.
The start of the S1 Attach procedure is very similar to a regular S1 attach.
The initial S1 PDN Connectivity Request indicates in the ESM Message Container that the PDN Type is Non IP.
Other than that, the initial attach procedure looks very similar to the regular S1 attach procedure.
On the S6a interface the Update Location Request from the MME to the HSS indicates that this is an EUTRAN-NB-IoT Radio Access Type.
And the Update Location Answer APN Configuration contains some additional AVPs on the APN to indicate that the APN supports Non-IP-PDN-Type and that the SCEF is used for Data Delivery.
The SCEF-ID (Diameter Host) and SCEF-Realm (Diameter realm) to serve this user is also specified in the APN Configuration in the Update Location Answer.
This is how our MME determines where to send the T6a traffic.
With this, the MME sends a Connection Management Request (CMR) towards the SCEF specified in the SCEF-ID returned by the HSS.
The Connection Management Request / Response
The MME now sends a Diameter T6a Connection Management Request to the SCEF identified in the Update Location Answer.
In it we have a Session-Id, which continues for the life of our NIDD session, the service-selection which contains our APN (In our case “non-ip”) and the User-Identifier AVP which contains the MSISDN and/or IMSI of the subscriber.
To accept this, the SCEF sends back a Connection-Management-Answer to confirm we’re all good to go:
At this point our SCEF now knows about the subscriber who’s just attached to our network, and correlates it with the APN and the session-ID.
On the S1 side the connection is confirmed and we’re ready to roll.
Mobile Originated Data Request / Response
When the UE wants to send NIDD it’s carried in NAS messaging, so we see an Uplink NAS transport from the UE and inside the NAS payload itself is our HEX data.
Our MME grabs this out and sends it in the form of a Mobile-Originated-Data-Request (MODR) to the SCEF, along with the same Session-ID that was setup earlier:
At this stage our Non-IP Data is exposed over the T8 RESTful API, which we won’t cover in this post.
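To tie the T6a flows above together, here is a toy model of the bookkeeping on the SCEF side: the Connection Management Request creates a session keyed on the Session-Id, and each Mobile-Originated-Data-Request is matched back to it. This is not a Diameter implementation; the class and method names are invented, and answering an unknown session with DIAMETER_UNKNOWN_SESSION_ID is my assumption rather than something taken from the spec.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class NiddSession:
    session_id: str        # Session-Id, constant for the life of the NIDD session
    apn: str               # Service-Selection
    user_identifier: str   # MSISDN and/or IMSI from the User-Identifier AVP
    uplink: List[bytes] = field(default_factory=list)

class ToyScef:
    def __init__(self) -> None:
        self.sessions: Dict[str, NiddSession] = {}

    def on_connection_management_request(
        self, session_id: str, service_selection: str, user_identifier: str
    ) -> str:
        """Handle a CMR from the MME and remember the session."""
        self.sessions[session_id] = NiddSession(session_id, service_selection, user_identifier)
        return "DIAMETER_SUCCESS"

    def on_mo_data_request(self, session_id: str, non_ip_data: bytes) -> str:
        """Handle a MODR: correlate on Session-Id and store the payload."""
        session = self.sessions.get(session_id)
        if session is None:
            return "DIAMETER_UNKNOWN_SESSION_ID"  # assumption, see lead-in
        session.uplink.append(non_ip_data)
        return "DIAMETER_SUCCESS"

scef = ToyScef()
scef.on_connection_management_request("mme01;1;1", "non-ip", "001010000000001")
print(scef.on_mo_data_request("mme01;1;1", bytes.fromhex("48656c6c6f")))
```

Whatever ends up in `uplink` is what the SCEF would then expose towards the application side.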
Hello Nick, thank you for the article. What is the use of the OPc key to be derived from OP key ? Why can’t it just be a random key like Ki ?
It’s a super good question, and something I see a lot of operators get “wrong” from a security best practices perspective.
Refresher on OP vs OPc Keys
The “OP Key” is the “operator” key, and was (historically) common for an operator.
This meant all SIMs in the network had a common OP Key, and each SIM had a unique Ki/K key.
The SIM knew both, and the HSS only needed to know what the Ki was for the SIM, as they shared a common OP Key (Generally you associate an index which translates to the OP Key for that batch of SIMs but you get the idea).
But having common key material is probably not the best idea – I’m sure there was probably some reason why using a common key across all the SIMs seemed like a good option, and the K / Ki key has always been unique, so there was one unique key per SIM, but previously, OP was common.
Over time, the issues with this became clear, so the OPc key was introduced. OPc is derived from mushing the K & OP key together. This means we don’t need to expose / store the original OP key in the SIM or the HSS just the derived OPc key output.
This adds additional security: if the Ki for a SIM were to be exposed along with the OP for that operator, that’s half the entropy lost. Whereas by storing the Ki and OPc you limit the blast radius if, say, a single SIM’s data was exposed, to only the data for that particular SIM.
This is how most operators achieve this today; there is still a common OP Key, locked away in a vault alongside the recipe for Coca-cola and the moon landing set.
But this OP Key is no longer written to the SIMs or stored in the HSS.
Instead, during the personalization process (The bit in manufacturing where SIMs get the unique data written to them (The IMSI & keys)) a derived OPc key is written to the card itself, and to the output files the operator then loads into their HSS/HLR/AuC.
This is not my preferred method for handling key material, however. Today we get our SIM manufacturers to randomize the OP key for every card and then derive an OPc from that.
This means we have two unique keys for each SIM, and even if the Ki and OP were to become exposed for a SIM, there is nothing common between that SIM, and the other SIMs in the network.
Do we want our Ki to leak? No. Do we want an OP Key to leak? No. But if we’ve got unique keys for everything we minimize the blast radius if something were to happen – Just minimizes the risk.
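For anyone wondering what the “mushing together” actually looks like: in Milenage (3GPP TS 35.206) the OPc is the OP encrypted with K using AES-128, XORed with the OP. Below is a rough sketch of that. The AES here is a slow, pure Python version written only so the example has no dependencies; the function names are mine, and for anything real you’d use a vetted crypto library rather than this.

```python
import secrets


def gf_mul(a, b):
    """Multiply two bytes in the AES finite field GF(2^8)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return result


def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox():
    """S-box: multiplicative inverse in GF(2^8), then the AES affine transform."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def expand_key(key):
    """Expand a 16 byte key into 11 round keys of 16 bytes each."""
    words = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord then SubWord
            temp[0] ^= RCON[i // 4 - 1]
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


def aes128_encrypt_block(key, block):
    """Encrypt one 16 byte block with AES-128 (state held column by column)."""
    if len(key) != 16 or len(block) != 16:
        raise ValueError("key and block must both be 16 bytes")
    round_keys = expand_key(key)
    state = [b ^ k for b, k in zip(block, round_keys[0])]
    for rnd in range(1, 11):
        state = [SBOX[b] for b in state]                            # SubBytes
        state = [state[(i + 4 * (i % 4)) % 16] for i in range(16)]  # ShiftRows
        if rnd != 10:                                               # MixColumns
            mixed = []
            for c in range(0, 16, 4):
                a = state[c:c + 4]
                mixed += [gf_mul(a[r], 2) ^ gf_mul(a[(r + 1) % 4], 3)
                          ^ a[(r + 2) % 4] ^ a[(r + 3) % 4] for r in range(4)]
            state = mixed
        state = [b ^ k for b, k in zip(state, round_keys[rnd])]     # AddRoundKey
    return bytes(state)


def derive_opc(k, op):
    """OPc = AES-128 encryption of OP under K, XORed with OP."""
    encrypted = aes128_encrypt_block(k, op)
    return bytes(e ^ o for e, o in zip(encrypted, op))


# A unique K and a unique OP per card, as described above (random test values).
k = secrets.token_bytes(16)
op = secrets.token_bytes(16)
print("K  :", k.hex())
print("OPc:", derive_opc(k, op).hex())
```

Because K goes into the derivation, two SIMs sharing the same OP still end up with different OPc values, and only the K and OPc need to be written to the card and loaded into the HSS.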
S8 Home Routing is a really simple concept, the traffic goes from the SGW in the visited PLMN to the PGW in the home PLMN, so the PCRF, OCS/OFCS, IMS, IP Addresses, etc, etc, are all in the home network, and this avoids huge amounts of complexity.
But in order for this to work, the visited network MME needs to find the PGW of the home network, and with over 700 roaming networks in commercial use, each one with potentially hundreds of unique APNs each routing to a different PGW, this is a tricky proposition.
If you’ve configured your PGW peers statically on your MME, that’s fine, but it doesn’t scale very well – And if you add an MVNO who wants their own PGW for serving their APN, well you’ll be adding some complexity there too, so what to do?
Well, the answer is DNS.
By taking the APN to be served, the home PLMN and the interface type desired, with some funky DNS queries, our MME can determine which PGW should be selected for a request.
Let’s take a look, for a UE from MNC XXX MCC YYY roaming into our network, trying to access the “IMS” APN.
Our MME knows the network code of the roaming subscriber from the IMSI is MNC XXX, MCC YYY, and that the UE is requesting the IMS APN.
So our MME crafts a DNS request for the NAPTR query for ims.apn.epc.mncXXX.mccYYY.3gppnetwork.org:
Because the domain is epc.mncXXX.mccYYY.3gppnetwork.org it’s routed to the authoritative DNS server in the home network, which sends back the response:
We’ve got a few peers to pick from, so we need to filter this list of Answers to only those that are relevant to us.
First we filter by the Service tag, which for each listed peer shows what services that peer supports.
But since we’re looking for S8, we need to find a peer whose “Service” tag string contains:
x-3gpp-pgw:x-s8-gtp
We’re looking for two bits of info here, the presence of x-3gpp-pgw in the Service to indicate that this peer is a PGW and x-s8-gtp to indicate that this peer supports the S8 interface.
A service string like this:
x-3gpp-pgw:x-s5-gtp
Would be excluded as it only supports S5 not S8 (Even though they are largely the same interface, S8 is used in roaming).
It’s also not uncommon to see both services indicated as supported, in which case that peer could be selected too:
x-3gpp-pgw:x-s5-gtp:x-s8-gtp
(The answers in the screenshot include :x-gp which means the PGWs advertised are also co-located with a GGSN)
So with our answers whittled down to only those that meet our needs, we next use the Order and the Preference to pick our best candidate, this is the same as regular DNS selection logic.
From our candidate, we’ve also got the Regex Replacement, which allows our original DNS request to be re-written, which allows us to point at a single peer.
In our answer, we see the original request ims.apn.epc.mncXXX.mccYYY.3gppnetwork.org is to be re-written to topon.lb1.pgw01.epc.mncXXX.mccYYY.3gppnetwork.org.
This is the FQDN of the PGW we should use.
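The selection steps above can be sketched in a few lines. This is only an illustration of the logic, not how any particular MME implements it: the `Naptr` class and function names are made up, the records are hand-written test data for MCC 001 / MNC 01 rather than a real DNS response, and I’m assuming the usual convention of padding the MNC to three digits in the FQDN.

```python
from dataclasses import dataclass


@dataclass
class Naptr:
    """The fields of a NAPTR answer that matter for PGW selection."""
    order: int
    preference: int
    service: str      # e.g. "x-3gpp-pgw:x-s8-gtp"
    replacement: str  # FQDN the original query is rewritten to


def apn_fqdn(apn, mcc, mnc):
    """Build the APN FQDN to query in the home network's domain."""
    return f"{apn.lower()}.apn.epc.mnc{mnc.zfill(3)}.mcc{mcc.zfill(3)}.3gppnetwork.org"


def supports(record, node="x-3gpp-pgw", interface="x-s8-gtp"):
    """True if the Service string names a PGW that offers the wanted interface."""
    parts = record.service.lower().split(":")
    return parts[0] == node and interface in parts[1:]


def select_pgw(records):
    """Filter to S8 capable PGWs, then take the lowest Order, then Preference."""
    candidates = [r for r in records if supports(r)]
    if not candidates:
        return None
    best = min(candidates, key=lambda r: (r.order, r.preference))
    return best.replacement.rstrip(".")


answers = [
    Naptr(10, 10, "x-3gpp-pgw:x-s5-gtp", "topon.lb1.pgw02.epc.mnc001.mcc001.3gppnetwork.org."),
    Naptr(10, 20, "x-3gpp-pgw:x-s5-gtp:x-s8-gtp:x-gp", "topon.lb1.pgw03.epc.mnc001.mcc001.3gppnetwork.org."),
    Naptr(10, 10, "x-3gpp-pgw:x-s5-gtp:x-s8-gtp", "topon.lb1.pgw01.epc.mnc001.mcc001.3gppnetwork.org."),
]

print(apn_fqdn("IMS", "001", "01"))
print(select_pgw(answers))
```

The S5-only peer is dropped even though it has the best Order and Preference, and of the two remaining S8 peers the one with the lower Preference wins.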
Now we know the FQDN we should use, we just do an A-Record lookup (Or AAAA record lookup if it is IPv6) for that peer we are targeting, to turn that FQDN into an IP address we can use.
And then in comes the response:
So now our MME knows the IP of the PGW, it can craft a Create Session request where the F-TEID for the S8 interface has the PGW IP set on it that we selected.
For more info on this, 3GPP TS 29.303 (Domain Name System Procedures, published by ETSI as TS 129 303) is the definitive doc, but the GSMA’s IR.88 “LTE and EPC Roaming Guidelines” provides a handy reference.
SGs-AP which is used for CSFB & SMS doesn’t span network borders (you can’t roam with SGs-AP), and with SMSoIP out of the question, that gave us the option of MAP or Diameter, so we picked Diameter.
This introduces the S6c and SGd Diameter interfaces, in the diagrams below Orange is the Home Network (HPMN) and the Green is the Visited Network (VPMN).
The S6c interface is used between the SMSc and the HSS, in order to retrieve the routing information. This is like the SRI-for-SM in MAP.
The SGd interface is used between the MME serving the UE and the SMSc, and is used for actual delivery of the MO/MT messages.
I haven’t shown the Diameter Routing Agents in these diagrams, but in reality there would be a DRA in the VPMN and a DRA in the HPMN, and probably a DRA in the IPX between them too.
The Attach
The attach looks like a regular roaming attach, the MME in the Visited PMN sends an Update Location Request to the HSS, so the HSS knows the MME that is serving the subscriber.
The Mobile Terminated SMS Flow
Now we introduce the S6c interface and the SGd interfaces.
When the Home SMSc has a message to send to the subscriber (Mobile Terminated SMS) it runs the Send-Routing-Info-for-SM-Request (SRR) dialog to the HSS.
The Send-Routing-Info-for-SM-Answer (SRA) back from the HSS contains the info on the MME Diameter Host name and Diameter Realm serving the subscriber.
With this info, we can now craft a Diameter Request that will get sent to the MME serving the subscriber, containing the SMS PDU to send to the UE.
We make sure it’s sent to the correct MME by setting the Destination-Host and Destination-Realm in the Diameter request.
Here’s how the request looks from the SMSc towards our DRA:
As you can see the Destination-Realm and Destination-Host are set, as is the User-Name, which is set to the IMSI of the UE we want to send the message to.
And down the bottom you can see the SMS-TPDU, the same as it’s been all the way back since GSM days.
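As a rough sketch of the addressing logic on the SMSc side, the request is just built from whatever the HSS told us about the serving MME. The dictionary keys for the HSS answer (`mme_host`, `mme_realm`) and the function name are made up for this example, there’s no Diameter encoding here, and the hostnames and TPDU bytes are placeholders.

```python
def build_mt_forward_request(sra, imsi, sms_tpdu):
    """Address an MT SMS request to the MME named in the S6c answer."""
    mme_host = sra.get("mme_host")
    mme_realm = sra.get("mme_realm")
    if not mme_host or not mme_realm:
        # No serving MME returned, so there is nowhere to route the message.
        raise ValueError("HSS did not return a serving MME for this subscriber")
    return {
        "Destination-Host": mme_host,
        "Destination-Realm": mme_realm,
        "User-Name": imsi,
        "SMS-TPDU": sms_tpdu,
    }


# Placeholder values standing in for the Send-Routing-Info-for-SM-Answer.
sra = {
    "mme_host": "mme01.epc.mnc001.mcc001.3gppnetwork.org",
    "mme_realm": "epc.mnc001.mcc001.3gppnetwork.org",
}
request = build_mt_forward_request(sra, "999990000000001", bytes.fromhex("0001"))
print(request["Destination-Host"], request["Destination-Realm"])
```

With both Destination-Host and Destination-Realm set, the DRAs along the way only need to route on the realm until the request reaches the visited network.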
The Mobile Originated SMS Flow
The Mobile Originated flow is even simpler, because we don’t need to look up where to route it to.
The MME receives the MO SMS from the UE, and shoves it into a Diameter message with Application ID set to SGd and Destination-Realm set to the HPMN Realm.
When the message reaches the DRA in the HPMN it forwards the request to an SMSc and then the Home SMSc has the message ready to roll.
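For the Mobile Originated side, the only routing decision is the Destination-Realm, which can be worked out from the IMSI. A minimal sketch, assuming the home realm follows the usual epc.mncXXX.mccYYY.3gppnetwork.org pattern; the function names are mine, and the MNC length (2 or 3 digits) has to be supplied because it can’t be read from the IMSI alone.

```python
def home_realm_from_imsi(imsi, mnc_digits=2):
    """Derive the home network's EPC realm from the MCC and MNC in the IMSI."""
    mcc = imsi[:3]
    mnc = imsi[3:3 + mnc_digits].zfill(3)  # MNC is padded to 3 digits in the realm
    return f"epc.mnc{mnc}.mcc{mcc}.3gppnetwork.org"


def build_mo_forward_request(imsi, sms_tpdu, mnc_digits=2):
    """Build an MO SMS request routed on realm only (no Destination-Host)."""
    return {
        "Application": "SGd",
        "Destination-Realm": home_realm_from_imsi(imsi, mnc_digits),
        "User-Name": imsi,
        "SMS-TPDU": sms_tpdu,
    }


print(home_realm_from_imsi("001011234567890"))
print(home_realm_from_imsi("310410123456789", mnc_digits=3))
```

No Destination-Host is set, which is what lets the DRA in the HPMN pick whichever SMSc it likes.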
To make more Money (This post, congratulations, you’re reading it!)
Because they have to (Regulatory compliance, insurance, taxes, etc) – That’s the next post
So let’s look at SA in this context.
5G-SA can drive new revenue streams
We (as an industry) suck at this.
Last year on the Telecoms.com podcast, Scott Bicheno made the point that if operators took all the money they’d gambled (and lost) on trying to play in the sports rights, involvement in media companies, building their own streaming apps, attempts at bundling other utilities, digital identity, etc, and just left the cash in the bank and just operated the network, they’d be better off.
Uber, Spotify, “OTTs”, etc, utilize MNOs to enable their services, but operators don’t see this extra revenue. While some operators may talk of “fair share” the truth is, these companies add value to our product (connectivity) which as an industry, we’ve failed to add ourselves.
If the Metaverse does turn out to be a cash cow, it is unlikely the telecommunications industry will be the ones milking it.
Claim: Customers are willing to pay more for 5G-SA
This myth seems to be fairly persistent, but with minimal data to support this claim.
While BSS vendors talk about “5G Monetization”, the truth is, people use their MNO to provide them connectivity. If the coverage is adequate, and the speed enough to do what they need to do, few would be willing to pay any additional cash each month to see higher numbers on a speedtest result (enabled by 5G-NSA) and even fewer would pay extra cash for, well, whatever those features only enabled by 5G-Standalone are?
With most consumers now also holding onto their mobile devices for longer periods of time, and with interest rates reining in consumer spending across the board, we are seeing the rise of a more cost conscious consumer than ever before. If we want to see higher ARPUs, we need to give the consumer a compelling reason to care and spend their cash, beyond a speed test result.
We talk a little about APIs lower down in the post.
Claim: Users want Ultra-Low Latency / High Reliability Comms that only 5G-SA delivers
Wanting to offer a product to the market, is not the same as the market wanting a product to consume.
Telecom operators want customers to want these services, but customer take up rates tell a different story. For a product like this to be viable, it must have a wide enough addressable market to justify the investment.
Reliability
The URLLC standards focus on preventing packet loss, but the world has moved on from needing zero packet loss.
The telecom industry has a habit of deciding what customers want without actually listening. When a customer talks about wanting “reliable” comms, they aren’t saying they want zero packet loss, but rather fewer dropouts or service flaps. For us to give the customer what they are actually asking for involves us expanding RAN footprint and adding transmission diversity, not 5G-SA.
The “protocols of the internet” (TCP/IP) have been around for more than 50 years now.
These protocols have always flowed over transport links with varied reliability and levels of packet loss.
Thanks to the error correction and retransmission techniques built into these protocols, a lost packet will not interrupt the stream. If your nuclear command and control network were carried over TCP/IP over the public internet (please don’t do this), a missing packet won’t lead to worldwide annihilation, but rather the sender will see the receiver never acknowledged the receipt of the packet at the other end, and resend it, end of.
If you walk into a hospital today, you’ll find patient monitoring devices, tracking the vital signs for patients and alerting hospital staff if a patient’s vital signs change. It is hard to think of more important services for reliability than this.
And yet they use WiFi, and have done for a long time, if a packet is lost on WiFi (as happens regularly) it’s just retransmitted and the end user never knows.
Autonomous cars are unlikely to ever rely on a 5G connection to operate, for the simple reason that coverage will never be 100%. If your car stops because you’re in a not-spot, you won’t be a happy customer. Plenty of cars have cellular modems in them, but these are used to upload telemetry data back to the manufacturer, not to drive the car.
One example of wireless controlled vehicles in the wild is autonomous haul trucks in mines. Historically, these have used WiFi for their comms. Mine sites are often a good fit for Private LTE, but there’s nothing inherent in the 5G Standalone standard that means it’s the only tool for the job here.
Slicing
Slicing is available in LTE (4G), with an architecture designed to allow access to others. It failed to gain traction, but is in networks today.
The RAN is a piece of the latency puzzle here, but it is just one piece of the puzzle.
If we look at the flow a packet takes from the user’s device to the server they want to talk to we’ve got:
Time it takes the UE to craft the packet
Time it takes for the packet to be transmitted over the air to the base station
Time it takes for the packet to get through the RAN transmission network to the core
Time it takes the packet to traverse the packet core
Time it takes for the packet to get out to transit/peering
Time it takes to get the packet from the edge of the operators network to the edge of the network hosting the server
Time it takes the packet through the network the server is on
Time it takes the server to process the request
The “low latency” bit of the 5G puzzle only involves the two elements in bold.
If you’ve got to get from point A to point B along a series of roads, and the speed limit on two of the roads you traverse (short sections already) is increased, the overall travel time is not drastically reduced.
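The road analogy is easy to put into arithmetic. In the sketch below every stage is given the same made-up placeholder value of 10 ms; these are not measurements, and the two stages I’ve marked as improved are an arbitrary pick for illustration. The point is only how little the end-to-end figure moves when two of eight stages get faster.

```python
def overall_reduction(stages_ms, improved, factor):
    """Fraction of end-to-end latency saved when only some stages speed up."""
    before = sum(stages_ms.values())
    after = sum(ms * factor if name in improved else ms
                for name, ms in stages_ms.items())
    return (before - after) / before


# Placeholder values only: every stage is assumed to take 10 ms.
stage_names = [
    "ue_processing", "air_interface", "ran_transmission", "packet_core",
    "transit_peering", "inter_network", "server_network", "server_processing",
]
stages_ms = {name: 10.0 for name in stage_names}

# Halve the latency of two stages and leave the other six untouched.
saving = overall_reduction(stages_ms, {"air_interface", "packet_core"}, 0.5)
print(f"End-to-end latency reduced by {saving:.1%}")
```

With these placeholders, halving two stages takes 12.5% off the total. If the two improved stages were already the short ones, the saving would be smaller still.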
I’m lucky, I have access to a well kitted out lab which allows me to put all of these latency figures to the test and provide side by side metrics. If this is of interest to anyone, let me know. Otherwise in the meantime you’ll just have to accept some conjecture and opinion.
You could rebut this talking about Edge Compute, and having the datacenter at the base of the tower, but for a number of fairly well documented reasons, I think this is unlikely to attract widespread deployment in established carrier networks, and Intel’s recent yearly earnings specifically called this out.
Claim: Customers want APIs and these need 5G SA
Companies like Twilio have made it easy to interact with the carrier network via their APIs, but yet again, it’s these companies producing the additional value on a service operated by the MNOs.
My coffee machine does not have an API, and I’m OK with this because I don’t have a want or need to interact with it programmatically.
By far, the most common APIs used by businesses involving telco markets are APIs to enable sending an SMS to a user.
These have been around for a long time, and the A2P market is pretty well established, and the good news is, operators already get a chunk of this pie, by charging for the SMS.
Imagine a company that makes medical booking software. They’re a tech company, so they want their stack to work anywhere in the world, and they want to be able to send reminder SMS to end users.
They could get an account manager with each of the telcos in each of the markets they work in, onboard and integrate the arcane complexities of each operator’s wholesale SMS system, or they could use Twilio or a similar service, which gives them global reach.
Often services like Twilio are cheaper than working directly with the carriers in each market, and even if they are marginally more expensive, the cost savings from not having to deal with dozens of carriers or integrate into dozens of systems far outweigh this.
While OpenGateway is a great idea, in the context of 5G Standalone and APIs it’s worth noting that none of its use cases require 5G Standalone (Except possibly Edge discovery, but it is debatable).
Critically, from a developer experience perspective:
I can sign up to services like Twilio without a credit card, and start using the service right away, with examples in my programming language of choice, the developer user experience is fantastic.
Jump on the OpenGateway website today and see if you can even find a way to sign up to use the service?
Claim: Fixed Wireless works best with 5G-SA
Of all the touted use cases and applications for 5G, Fixed Wireless (FWA) has been the most successful.
The great thing about FWA on Cellular networks is you can use the same infrastructure you use for your mobile customers, and then sell excess capacity in the network to deliver Fixed Wireless Access services, better utilizing an asset (great!).
But again, this does not require Standalone 5G. If you deploy your FWA network using 5G SA, then you won’t be able to sweat that same asset for both mobile subscribers and FWA subscribers.
Today at least, very few handsets short of this generation of flagship phones support 5G SA. Even the phones sold as supporting 5G over the past few years almost all support only 5G-NSA, so if you rolled out your FWA network as Standalone, you can’t better utilize the asset by sharing with your existing LTE/5G-NSA customers.
Claim: The Killer App is coming for 5G and it needs 5G SA
This space is reserved for the killer app that requires 5G Standalone.
Whenever that comes?
Anyone?
I’m not paying to build a marina berth for my mega yacht, mostly because I don’t have one. Ditto this.
Could you explain to everyone on an investor call that you’re investing in something where the vessel of the payoff isn’t even known to exist? Telecom is “blue chip”, hardly speculative.
The Future for Revenue Growth?
Maybe there isn’t one.
I know it’s an unthinkable thought for a lot of operators, but let’s look at it rationally; in the developed world, everyone who wants a mobile service already has one.
This leaves operators with two options; gaining market share from their competitors and selling more/higher priced services to existing customers.
You don’t steal away customers from other operators by offering a higher priced product, and with reduced consumer spending people aren’t queuing up to spend more each month.
But there is a silver lining, if you can’t grow revenues, you can still shrink expenditure, which in the end still gets the same result at the end of the quarter – More cash.
Simplify your operations, focus on what you do really well (mobile services), the whole 80/20 rule, get better at self service, all that guff.
There’s no shortage of pain points for consumers telecom operators could address, to make the customer experience better, but few that include the word Slicing.
No one spends marketing dollars talking about the problems with a tech and vendors aren’t out there promoting sweating existing assets. But understanding your options as an operator is more important now than ever before.
Sidebar; This post got really long, so I’m splitting it into 3…
We’re often asked to help define a 5G strategy for operators; while every case is different, there’s a lot of vendors pushing MNOs to move towards 5G standalone or 5G-SA.
I’m always a fan of playing “devil’s advocate”, and with so many articles and press releases singing the praises of standalone 5G/5G-SA, in this post I’ll be making the case against the narrative presented to operators by vendors: that the “right” way to do 5G is to introduce 5G Standalone, and that they should all be “upgrading” to Standalone 5G.
With Mobile World Congress around the corner, now seems like a good time to put forward the argument against introducing 5G Standalone, rebutting some common claims operators will be told about 5G Standalone. We’ll counterpoint these arguments and I’ll put forward the case for not jumping onto the 5G-SA bandwagon – just yet.
On a personal note, I do like 5G SA, it has some real advantages and some cool features, which are well documented, including on this blog. I’m not looking to beat up on any vendors, marketing hype or events, but just to provide the “other side” of the equation that operators should consider when making decisions and may not be aware of otherwise. It’s also all opinion of course (cited where possible), but if you’re going to build your network based on a blog post (even one as good as this) you should probably reconsider your life choices.
Some Arcane Detail: 5G Non-Standalone (NSA) vs Standalone (SA)
5G NSA (Non Standalone) uses LTE (4G) with an additional layer “bolted on” that uses 5G on the radio interface to provide “5G” speeds to users, while reusing the existing LTE (Evolved Packet Core) core and VoLTE for voice / SMS.
From an operator perspective there is almost no change required in the network to support NSA 5G, other than in the RAN, and almost all the 5G networks in commercial use today use 5G NSA.
5G NSA is great, it gives 5G speeds to users with phones that support it, with no change to the rest of the network needed.
Standalone 5G on the other hand requires a completely new core network with all the trimmings.
While it is possible to handover / interwork with LTE/4G (Inter-RAT Handovers), this is like 3G/4G interworking, where each has a different core network. Introducing 5G standalone touches every element of the network, you need new nodes supporting the new standards for charging, policy, user plane, IMS, etc.
Scope
There’s an old adage that businesses spend money for one of three reasons:
To Save Money (Which we’ll cover in this post)
To make more Money (Covered next – Will link when published)
Because they have to (Regulatory compliance, insurance, taxes, etc)
Let’s look at 5G Standalone in each of these contexts:
5G Cost Savings – Counterpoint: The cost-benefit doesn’t stack up
As an operator with an existing deployed 4G LTE network, deploying a new 5G standalone network will not save you money.
From a capital perspective this is pretty obvious, you’re going to need to invest in a new RAN and a new core to support this, but what about from an opex perspective?
Claim: 5G RAN is more efficient than 4G (LTE) RAN
Spectrum is both finite and expensive, so MNOs must find the most efficient way to use that spectrum, to squeeze the most possible value out of it.
In rough numbers, we can say we get 5x the spectral efficiency by moving from 3G to 4G. This means we can carry 5x more with the same spectrum on 4G than we can on 3G – A very compelling reason to upgrade.
The like-for-like spectral efficiency of 5G is not significantly greater than that of LTE.
In numbers, the same 5 MHz of spectrum we refarmed from UMTS (3G) to 4G (LTE) provided a 5x gain in efficiency to deliver 75 Mbps on LTE. The same configuration refarmed to 5G-NR would provide 80 Mbps.
Refarming spectrum from 4G (LTE) to 5G (NR) only provides an increase in spectral efficiency of around 7% (75 Mbps to 80 Mbps).
While 7% is not nothing, if refarmed to a 5G standalone network, the spectrum can no longer be used by LTE only devices (Unless Dynamic Spectrum Sharing is used, which itself leads to efficiency losses). That reduces the overall efficiency and adds load to other layers.
The crazy speeds demonstrated by 5G are not due to meaningful increases in efficiency, but rather the ability to use more spectrum, spectrum that operators need to purchase at auction, purchase equipment to utilize and pay to run.
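For anyone who wants to check the like-for-like comparison, the arithmetic is simple enough to sketch using the 75 Mbps and 80 Mbps figures quoted above for a 5 MHz carrier. The function names are just for this example.

```python
def spectral_efficiency(throughput_mbps, bandwidth_mhz):
    """Throughput per unit of spectrum, in bit/s/Hz."""
    return throughput_mbps / bandwidth_mhz


def gain_percent(old_mbps, new_mbps):
    """Relative throughput gain for the same amount of spectrum."""
    return (new_mbps - old_mbps) / old_mbps * 100


lte = spectral_efficiency(75, 5)  # LTE in 5 MHz
nr = spectral_efficiency(80, 5)   # 5G-NR in the same 5 MHz
print(f"LTE: {lte:.0f} bit/s/Hz, NR: {nr:.0f} bit/s/Hz")
print(f"Gain from refarming LTE to NR: {gain_percent(75, 80):.1f}%")
```

That works out at a little under 7% for the same spectrum, compared to the roughly 5x jump from 3G to 4G.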
Claim: 5G Standalone Core is Cheaper to operate as it is “Cloud Native”
It has been widely claimed that the shift for the 5G Core Architecture to being “Cloud Native” can provide cost savings.
Operators should regard this in a skeptical manner; after all, we’ve been here before.
Did moving from big-iron to VNFs provide the promised cost savings to operators?
For many operators the shift from hardware to software added additional complexity to the network and increased the headcount to support this.
What were once big-iron appliances dedicated to one job, that sat in the corner and chugged away, are now virtual machines (VNFs). Many operators have naturally found themselves needing a larger team to manage the virtual environment, compared to the size of the team they needed to just to plug power and data into a big box in an exchange before everything was virtualized.
Introducing a “Cloud Native” Kubernetes layer on top of the VNF / virtualization layer, on top of the compute layer, leaves us with a whole lot of layers. All of which require resources to be maintained, troubleshot and kept running; each layer having associated costs for staffing, licensing and support.
Many mid size enterprises rushed into “the cloud” for the promised cost savings only to sheepishly admit it cost more than expected.
Almost none of the operators are talking about running these workloads in the public cloud, but rather “Private Clouds” built on-premises, using “Cloud Native” best practices.
One of the central arguments about cloud revolves around “elastic scaling” where the network can automatically scale to match demand; think extra instances spun up at times of peak demand and shut down when the demand drops.
I explain elastic scaling to clients as having to move people from one place to another. Most of the time, I’m just moving myself, a push bike is fine, or I’ve got a 4 seater car, but occasionally I’ll need to move 25 people and for that I’d need a bus.
If I provide the transportation myself, I need to own a bike, a car and a bus.
But if I use the cloud I can start with the push bike, and as I need to move more people, the “cloud” will provide me the vehicle I need to move the people I need to move at that moment, and I’ll just pay for the time I need the bus, and when I’m done needing the bus, I drop back to the (cheaper) push bike when I’m not moving lots of people.
While telecom operators are going to provide the servers to run this in “On-prem-cloud”, they need to dimension for the maximum possible load. This means they need to own a bike/car/bus, even if they’re not using it most of the time, and there’s really no cost savings to having a bus but not using it when you’re not paying by the hour to hire it.
Infrastructure aside, introducing a Standalone 5G Core adds another core network to maintain. Alongside the Circuit Switched Core (MSC/GGSN/SGSN) serving 2G/3G subscribers and the Evolved Packet Core serving 4G (LTE) and 5G-NSA subscribers, adding a 5G Standalone Core for the 5G-SA subscribers served by the 5G SA cells is going to be more work (and therefore cost).
While the majority of operators have yet to turn off their 2G/3G core networks, introducing another core network to run in parallel is unlikely to lead to any cost savings.
Claim: Upgrading now can save money in the Future / Future Proofing
Life cycles of telecommunications are two fold, one is the equipment/platform life cycle (like the RAN components or Core network software being used to deliver the service) the other is the technology life cycle (the generation of technology being used).
The technology lifecycles in telecommunications are vastly longer than that for regular tech.
GSM (2G) was introduced into the UK in 1991, and will be phased out starting in 2033, a 42 year long technology life cycle.
No vendor today could reasonably expect the 5G hardware you deploy in 2024 to still be in production in 2066 – The platform/equipment life cycle is a lot shorter than the technology life cycle.
Operators will continue relying on LTE (4G) well into the late 2030s.
I’d wager that there is not a single piece of equipment in the Vodafone UK GSM network today, that was there in 1991. I’d go even further to say that any piece of equipment in the network today, didn’t even replace the 1991 equipment, but was probably 3 or 4 generations removed from the network built in 1991.
For most operators, RAN replacements happen every 4 to 7 years, often with targeted augmentation / expansion as needed in the form of adding extra layers / sectors between these times.
The question operators should be asking is therefore not what will I need to get me through to 2066, but rather what will I need to get to 2030?
The majority of operators outside the US today still operate a 2G or 3G network, generally with minimal bandwidth to support legacy handsets and devices, while the 4G (LTE) network does most of the heavy lifting for carrying user traffic. This is often with the aid of an additional 5G-NSA (Non-Standalone) layer to provide additional capacity.
Is there a cost saving angle to adding support for 5G-Standalone in addition to 2G/3G/4G (LTE) and 5G (Non-Standalone) into your RAN?
A logical stance would be that removing layers / technologies (such as 2G/3G sunsetting) would lead to cost savings, and adding a 5G Standalone layer would increase cost.
All of the RAN solutions on the market today from the major vendors include support for both Standalone 5G and Non Standalone, but the feature licensing for a non-standalone 5G is generally cheaper than that for Standalone 5G.
The question operators should be asking is on what timescale do I need Standalone 5G?
If you’ve rolled out 5G-NSA today, then when are you looking to sunset your LTE network? If the answer is “I hope to have long since retired by that time”, then you’ve just answered that question and you don’t need to licence / deploy 5G-SA in this hardware refresh cycle.
Other Cost Factors
Roaming: The majority of roaming traffic today relies on 2G/3G for voice. VoLTE roaming is (finally) starting to establish a foothold, but we are a long way from ubiquitous global roaming for LTE and VoLTE, and even further away for 5G-SA roaming. Focusing on 5G roaming will enable your network for roaming use by a minuscule number of operators, compared to LTE/VoLTE roaming which covers the majority of the operators in the developed world who can utilize your service.
I decided to split this into 3 posts, next I’ll post the “5G can make us more money” post and finally a “5G because we have to” post. I’ll post that on LinkedIn / Twitter / Mailing list, so stick around, and feel free to trash me in the comments.
Android, being open source, allows us to see how this logic works, and it’s important for operators to understand this logic, as it’s what dictates the behavior in many scenarios.
It’s important to note that I’m not covering Apple here, this information is not publicly available to share for iOS devices, so I won’t be sharing anything on this – Apple has their own ecosystem to handle emergency calling, if you’re from an operator and reading this, I’d suggest getting in touch with your Apple account manager to discuss it, they’re always great to work with.
The Android Open Source Project has an “emergency number database”. This database has each of the emergency phone numbers and the corresponding service, for each country.
This file can be read at packages/services/Telephony/ecc/input/eccdata.txt on a phone with engineering mode.
Let’s take a look at what’s in mainline Android for Australia:
For example, an American visiting the UK, would have 911 on the Emergency Calling Codes list on their SIM card, but in the UK they dial 999 to reach emergency services.
There’s two angles to this, the first is if a roamer dials the emergency calling code of their home country, the other is if they dial the emergency calling code of the country they are in.
Let’s look at the first scenario, where the roamer dials the emergency calling code of their home country.
If our American in the UK dials 911, that number is on the ECC list on the SIM, so it’s still flagged as an emergency call, and just goes out with the standard urn:service:sos URN – The network never sees 911 or 999, just that it’s an SOS call that goes to the PSAP.
In this scenario, the fact the dialled number is not passed to the network is actually a positive, we get the intent that the user wants to reach emergency services, and route based on this.
But what if our American friend in need dials 999? That’s the correct number for the end user to dial in the UK after all, but if that’s not in their ECC list on the SIM / device, it’d go through as a regular call right?
If the call does not get flagged as an emergency call on the UE this has its own set of complications and considerations:
S8-Home Routing for VoLTE means that as the UE doesn’t know this is an emergency call, the call will get routed back to the home network. This means the call doesn’t go to the E-CSCF in the visited network, and would probably just get a message saying the number they’ve dialed is unavailable, this would be exactly as if they dialed 999 at home in the US.
But we have a fix for this! On each MME we can set a list of emergency numbers, which would allow our visitor’s phone to know what the emergency calling codes are on this network, and route the 999 call to the local PSAP, rather than home routing it.
This information is jammed into the Emergency Number List IE in the NAS Attach Accept body.
This means our American visitor in the UK, would know about 999 from the ECC list configured in the roaming operator’s MME.
The purpose of this information element is to encode emergency number(s) for use within the country where the IE is received.
3GPP TS 24.008: 10.5.3.13 – Emergency Number List
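To make that IE a bit less abstract, here’s a minimal sketch of decoding its contents in Python. It assumes my reading of the layout in TS 24.008 10.5.3.13: each entry is a length octet, then an octet whose low five bits carry the service category, then the digits packed as BCD (low nibble first, 0xF as filler). The function names and the sample bytes are made up for illustration, so check it against the spec before relying on it.

```python
# Sketch only: decodes the *contents* of an Emergency Number List IE
# (i.e. after the IEI and overall length octets have been stripped).
# Layout assumed from TS 24.008 10.5.3.13; names here are hypothetical.

CATEGORY_BITS = {
    0x01: "police",
    0x02: "ambulance",
    0x04: "fire brigade",
    0x08: "marine guard",
    0x10: "mountain rescue",
}

# BCD digit table as used for called party numbers (assumption for >9 values)
BCD_DIGITS = "0123456789*#abc"


def decode_bcd_digits(octets: bytes) -> str:
    """Unpack BCD digits, low nibble first, skipping the 0xF filler."""
    digits = []
    for octet in octets:
        for nibble in (octet & 0x0F, octet >> 4):
            if nibble != 0x0F:
                digits.append(BCD_DIGITS[nibble])
    return "".join(digits)


def parse_emergency_number_list(contents: bytes) -> list:
    """Return a list of (number, [service names]) tuples."""
    entries = []
    offset = 0
    while offset < len(contents):
        entry_len = contents[offset]
        entry = contents[offset + 1 : offset + 1 + entry_len]
        if entry_len < 2 or len(entry) != entry_len:
            raise ValueError("malformed emergency number entry")
        category = entry[0] & 0x1F  # top three bits are spare
        services = [name for bit, name in CATEGORY_BITS.items() if category & bit]
        entries.append((decode_bcd_digits(entry[1:]), services))
        offset += 1 + entry_len
    return entries


if __name__ == "__main__":
    # Hypothetical list: 999 flagged police/ambulance/fire, 112 with no category
    sample = bytes.fromhex("030799f9030011f2")
    for number, services in parse_emergency_number_list(sample):
        print(number, services)
```

With a list like this received from the visited MME, the handset has what it needs to treat 999 as an emergency call even though it’s not on the SIM.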
Where this becomes more problematic is unauthenticated emergency calling.
For example, our American visiting the UK, who is not roaming, dials 999.
We’ll assume the UK and US operator don’t have a VoLTE roaming agreement because they’ve been kicking the can down the road when it comes to VoLTE roaming… This is a super common scenario – the last numbers I saw on this were last year, with ~50 bilateral VoLTE agreements in place worldwide.
Because the phone is not attached to a local MME, the handset does not know that 999 is an emergency calling code (because it’s not on the SIM). The only way it can get the Emergency Number List is from an MME, and not having attached to one, the phone does not have the ECC list for the country, so the handset does not begin the emergency attach procedure to make the call.
Common sense prevails here, on the majority of phones and the majority of SIM profiles, codes like 112 or 911 are treated as emergency calls, but more obscure numbers, such as dialing 999 in the UK or 10111 for South African Police on a handset with US firmware, are not guaranteed to work. Generally dialing the Emergency Calling code in the home network would get you through to some emergency services (although as we talked about in the last post, this might get you routed to the wrong agency in countries where each agency has their own number).
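To tie the scenarios together, here’s a rough sketch of the decision the handset is making. This is not the actual Android logic, just a simplified model of the three sources discussed: codes the device treats as emergency numbers regardless (I’ve assumed 112 and 911), the ECC list on the SIM, and the Emergency Number List from the MME if the phone has attached to one. All the names are hypothetical.

```python
# Simplified model of "is this dialled number an emergency call?"
# Not real handset code; DEFAULT_CODES is an assumption for illustration.

DEFAULT_CODES = frozenset({"112", "911"})


def is_emergency_number(dialled, sim_ecc=frozenset(), network_list=None):
    """network_list is None when the phone has not attached to a local MME."""
    known = set(DEFAULT_CODES) | set(sim_ecc)
    if network_list is not None:
        known |= set(network_list)
    return dialled in known


if __name__ == "__main__":
    us_sim = {"911"}
    uk_mme_list = {"999", "112"}
    # Attached to a UK MME: 999 is learned from the network
    print(is_emergency_number("999", us_sim, uk_mme_list))
    # Not attached anywhere: 999 is just a regular number
    print(is_emergency_number("999", us_sim, None))
```

The second case is the unauthenticated one: without the list from an MME, only the SIM and whatever the device has built in can flag the call.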
A better way forward?
These days I don’t dial much (apart from if I’m making adjustments on the Step-by-Step exchange), when I call people I do it from contacts, hyperlinks, etc.
There are mountains of research to suggest that asking people to remember codes and phone numbers is a struggle. A tourist who finds themselves in Tunisia in need of assistance is unlikely to remember that it’s 190 for an Ambulance, and 198 for Fire.
Perhaps the ECC list on a phone should populate a page of icons from the emergency page on the phone, with the universal icon for each agency, that sends to the URN for that service type?
Countries with a single PSAP could have the URNs for each service type routed to the same place, while countries with separate PSAPs for each service type can route accordingly.
Likewise if a country does have a centralised PSAP for all call types, knowing the type that is selected would be useful, for example if the user has pressed fire and is not responsive when the call is answered, the best unit to dispatch would probably be a fire engine.
A lot of countries have a single point of contact for emergency services; in Europe you’d call 112 in an emergency, 000 in Australia or 911 in the US. Calling this number in the country will get you the emergency services.
This means a caller can order an ambulance for smoke inhalation, and the fire brigade, in one call.
But that’s not the case in every country; many countries don’t have one number for the emergency services, they’ve got multiple; a phone number for police, a different number for fire brigade and a different number for an ambulance.
Brazil, for example, uses 193 as the emergency number for the fire department, the police can be reached at 190 or 191 depending on if it’s road policing or general, and medical emergencies are covered by 192. Other countries have similar setups.
This is all well and good if you’re in Brazil, and you call 192 for an ambulance, the phone sends a SIP INVITE with a Request URI of sip:[email protected], because we can put a rule into our E-CSCF to say if the number is 192 to route it to the answer point for ambulances – But that’s not often the case on emergency calls.
In IMS, handsets generally detect the number dialed is on the Emergency Calling Code (ECC) list from the USIM Card.
The use of the ECC list means the phone knows this is an emergency call, and this is really important. For countries that use AML this can trigger the sending of the AML SMS, and emergency calls should always be allowed to be made, even without credit, a valid SIM card, or even a SIM in the phone at all.
But this comes with a cost; when a user dials 911, the phone doesn’t (generally) send a call to sip:[email protected] like it would with any other dialled number, but rather the SIP INVITE is sent to urn:service:sos which will be routed to the PSAP by the E-CSCF. When a call comes through to these URNs they’re given top priority in the network.
This is all well and good in a country where it doesn’t matter which emergency service you called, because all emergency calls route to a single PSAP, but in a country with multiple numbers, it’s really important that when you call an ambulance, your call doesn’t get routed to animal control.
That means the phone has to look at what emergency number you’ve dialed, and pick the URN that matches the service you’ve actually requested.
Recently we’ve been helping an operator in a country with a numbering plan like this, and we’ve been finding the limits of the standards here. So let’s start by looking at what the standards state:
IMS Emergency Calling is governed by TS 103.479 which in turn delegates to IETF RFC 5031, but for the calling number to URN translation, it’s pretty quiet.
Let’s look at what RFC 5031 allows for URNs:
urn:service:sos.ambulance
urn:service:sos.animal-control
urn:service:sos.fire
urn:service:sos.gas
urn:service:sos.marine
urn:service:sos.mountain
urn:service:sos.physician
urn:service:sos.poison
urn:service:sos.police
The USIM’s Emergency Calling Codes EF would be the perfect source of this data; for each emergency calling code defined, you’ve got a flag to indicate what it’s for, here’s what we’ve got available on the SIM Card:
Bit 1 Police
Bit 2 Ambulance
Bit 3 Fire Brigade
Bit 4 Marine Guard
Bit 5 Mountain Rescue
Bit 6 manually initiated eCall
Bit 7 automatically initiated eCall
Bit 8 is spare and set to “0”
So these could be mapped pretty easily you’d think, so if the call is made to an Emergency Calling Code flagged with Bit 5 (Mountain Rescue), the URN would go to urn:service:sos.mountain.
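As a sketch of how simple that mapping could be, here’s one way to go from the service category flags on the SIM to one of the RFC 5031 URNs listed earlier. The table pairs each of the first five bits with the obvious URN; falling back to the generic urn:service:sos when zero or several service bits are set is my own assumption, not something the standards mandate, and the eCall bits are ignored here.

```python
# Sketch: map the ECC service category byte to an RFC 5031 URN.
# The fallback behaviour is an assumption made for illustration.

ECC_BIT_TO_URN = [
    (0x01, "urn:service:sos.police"),     # Bit 1 Police
    (0x02, "urn:service:sos.ambulance"),  # Bit 2 Ambulance
    (0x04, "urn:service:sos.fire"),       # Bit 3 Fire Brigade
    (0x08, "urn:service:sos.marine"),     # Bit 4 Marine Guard
    (0x10, "urn:service:sos.mountain"),   # Bit 5 Mountain Rescue
]
GENERIC_URN = "urn:service:sos"


def urn_for_category(category: int) -> str:
    """Pick a specific URN only when exactly one service bit is flagged."""
    matches = [urn for mask, urn in ECC_BIT_TO_URN if category & mask]
    return matches[0] if len(matches) == 1 else GENERIC_URN


if __name__ == "__main__":
    print(urn_for_category(0x02))  # ambulance only
    print(urn_for_category(0x07))  # police + ambulance + fire flagged together
```

A number flagged for several services at once (like a single national number) stays on the generic URN, while a dedicated ambulance number gets the ambulance URN.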
Alas from our research, we’ve found most OEMs send calls to the generic urn:service:sos, regardless of the dialled number and the ECC flags that are set on the SIM for that number.
One of the big chip vendors sends calls to an ECC flagged as Ambulance to urn:service:sos.fire, which is totally infuriating, and we’ve had to put a rule in our E-CSCF to handle this if the User Agent is set to one of their phones.
Is there room for improvement here? For sure! Emergency calling is super important, and time is of the essence, while animal control can probably transfer you to an ambulance, an emergency is by very nature time sensitive, and any time wasted can lead to worse outcomes.
While carrier bundles from the OEMs can handle this, the global ability to take any phone, from any country and call an emergency number is so important, that relying on a country-by-country approach here won’t suffice.
What could we do as an industry to address this?
Acknowledging that not all countries have a single point of contact for emergency service, introducing a simple mechanism in the UE SIP message to indicate what number (Emergency Calling Code) the user actually dialled would be invaluable here.
URNs are important, but knowing the dialed number when it comes to PSAP routing, is so important – This wouldn’t even need to be its own SIP header, it could just be thrown into the Contact header as another parameter.
Highly developed markets are often the first to embrace new tech (for us this means VoLTE and VoNR), but this means that these issues seen by less developed markets won’t appear until long after the standard has been set in stone, and often countries like this aren’t at the table of the standards bodies to discuss such requirements.
This easy, reasonable update to the standard, has the potential to save lives, and next time this comes up in a working group I’ll be advocating for a change.
At long last, more and more Australians are going to have fibre based access to the NBN, and this seemed like as good an excuse as any to take a deep dive into how NBN’s GPON based fibre services are delivered to homes.
Let’s start in your local exchange where you’ll likely find a Nokia (Well, probably Alcatel-Lucent branded) 7210 SAS-R access aggregation switch, which is where NBN’s transmission network ends, and the access network begins.
It in turn spits out a 10 gig interface to feed the Optical Line Terminal (OLT), which provides the GPON services, each port on the OLT is split out and can feed 32 subscribers.
In NBN’s case, the OLT is a Nokia (Alcatel-Lucent) 7302, and rather than calling it an OLT, they call it a “FAN” or “Fibre Access Node” – Seemingly because they like the word node.
Each of the Nokia 7302s has at least one NGLT-A line card, which has 8 GPON ports. Each of the 8 ports on these cards can service 32 customers, and is fed by 2x 10Gbps uplinks to two 7210 SAS-R aggregation switches.
The chassis supports up to 16 cards, 8 ports each, 32 subs per port, giving us 4096 subscribers per FAN.
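The sums are easy enough to check. Here’s a quick sketch using the figures above; the constant and function names are just mine.

```python
# Subscriber capacity of a fully loaded FAN, using the figures quoted above.

CARDS_PER_CHASSIS = 16
GPON_PORTS_PER_CARD = 8
SUBSCRIBERS_PER_PORT = 32


def subscribers_per_card() -> int:
    return GPON_PORTS_PER_CARD * SUBSCRIBERS_PER_PORT


def subscribers_per_fan(cards: int = CARDS_PER_CHASSIS) -> int:
    return cards * subscribers_per_card()


if __name__ == "__main__":
    print(subscribers_per_card())  # one NGLT-A line card
    print(subscribers_per_fan())   # fully populated chassis
```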
In some areas, FANs/OLTs aren’t located in an exchange but rather in a street cabinet, called a Temporary Fibre Access Node – Although it seems they’re very permanent.
In reality, each port on the OLT/FAN goes out over the Distribution Fibre Network (DFN), which links the ports on the OLTs to a distribution cabinet in the street, known as a Fibre Distribution Hub, or FDH.
If you look in FTTH areas, you’ll see the FDH cabinets. The FDH is essentially a roadside optical distribution frame, used to cross connect cables from the Distribution Fibre Network (DFN) to the Local Fibre Network (LFN), and in a way, you can think of it as the GPON equivalent of a pillar, except this is where we have our optical splitters.
Remember when we were talking about the FAN/OLT and how one port could serve 32 subscribers? We do that with a splitter, which takes one fibre from the DFN that runs to the FAN, and gives us 32 fibres, each of which we can connect an ONT to in order to get service.
The FDH cabinets are made by Corning (OptiTect 576 fibre pad mounted cabinets) and you can see in the top right the Aqua cables go to the Distribution Fibre Network, and hanging below it on the right are the optical splitters themselves, which split the one fibre to the FAN into 32 fibres each on SC connectors.
These are then patched to the Local Fibre Network on the left hand side of the cabinet, where there’s up to 576 ports running across the suburb, and a “Parking” panel at the bottom where the unused ports from the splitter can be left until you patch them to the DFN ports above.
The FDH cabinets also offer “passthrough”, allowing a fibre from the FAN to be patched through to the DFN without passing through the GPON splitter, although I’m not clear if NBN uses this capability to deliver the NBN Business services.
But having each port in the FDH going to one home would be too simple; you’d have to bring 576 individually sheathed cables to the FDH and you’d lose too much flexibility in how the cable plant can be structured, so instead we’ve got a few more joints to go before we make it to your house.
From the FDH cabinet we go out into the Local Fibre Network, but NBN has two variants of LFN – LFN and Skinny LFN. The traditional LFN uses high-density ribbon fibres, which offer a higher fibre count but are a bit trickier to splice/work with. The Skinny LFN uses lower fibre count cables with stranded fibres, and is the current preferred option.
The original LFN cables are ribbon fibres and range from 72 to 288 fibre counts, but I believe 144 is the most common.
These LFN cables run down streets and close to homes, but not directly to lead in cables and customer houses.
These run to “Transition Closures” (Older NBN) or “Flexibility Joint Locations” (FJLs – Newer NBN)
While researching this I saw references to “Breakout Joint Locations” (BJLs) which are used in FTTC deployments, and are a Tenio B6 enclosure for 2x 12 Fibers and 4x 1 Fibers with a 1×4 splitter.
The FJLs are TE Systems’ (Now Commscope) Tenio range of fibre splice closures, and they’re used to splice the high fibre count cable from the FDH cabinets into smaller 12 fibre count cables that run to multiple “Splitter Multi Ports” or “SMPs” in pits outside houses, and can contain factory installed splitters.
[Images: Sealed NBN FJL; Open NBN FJL; Reassembled NBN FJL; FJL in situ in pit; Neat open FJL]
The splitters, referred to as “Multiports” or “SMPs” are Corning’s OptiSheath MultiPort Terminals, and they’re designed and laid out in such a way that the tech can activate a service, without needing to use a fusion splicer.
Due to the difficulty/cost in splicing fibre in pits for a service activation, NBNco opted to go from the FJL to the SMPs, where a field tech can just screw in a weatherproof fibre connector lead in to the customer’s premises.
During installation / activation callouts, the tech is assigned an SMP in the pit near the customer’s house, and a port on it. This in turn goes to the FJL and onto the FDH cabinet as we just covered, but the patching/splicing for that is already done, so the tech doesn’t need to worry about it.
[Images: SMP in pit; Corning OptiSheath Multiport Splitter; SMP in a pit]
The tech just plugs in a pre-terminated lead in cable with a weatherproof fibre end, and screws it into the allocated port on the SMP, then hauls the other end of the lead in cable to the Premises Connection Device (Made by Madison or Tyco), located on the wall of the customer’s house.
The customer end of the lead in cable may be a pre terminated SC connector, or may get mechanically spliced onto a premade SC pigtail. In either case, they both terminate onto an SC male connector, which goes into an SC-SC female coupler inside the PCD.
Next is the customer’s internal wiring; again, preterm cable is used to run between the PCD and the First Fibre Wall Outlet inside the house. This preterm cable joins the lead in cable inside the PCD at the SC-SC female coupler.
Inside the house we have the “Network Termination Device” (NTD), which is a GPON ONT; this is where the fibre from the street terminates and is turned into an Ethernet handoff to the customer. NBN has been through a few models of NTD, but the majority support 2x ATA ports for analog phones, and the option for an external battery backup unit to keep the device powered if mains power is lost.
Phew! That’s what I’ve been able to piece together from publicly available documentation, some of this may be out of date, and I can see there’s been several revisions to the LFN / DFN architectures over the years, if there’s anything I have incorrect here, please let me know!
In the cellular world, subscribers are charged for data from the IP, transport and applications layers; this means you pay for the IP header, you pay for the TCP/UDP header, and you pay for the contents (the cat videos it contains).
This also means if an operator moves mobile subscribers from IPv4 to IPv6, there’s an extra 20 bytes the customer is charged for on every packet sent / received – This is because the IPv6 header is longer than the IPv4 header.
In most cases, mobile subs don’t get a choice as to if their connection is IPv4 or IPv6, but on a like for like basis, we can say that if a customer is on IPv6, every packet sent/received will have an extra 20 bytes of data consumed compared to IPv4.
This means subscribers use more data on IPv6, and this means they get charged for more data on IPv6.
For IoT applications, light users and PAYG users, this extra 20 bytes per packet could add up to something significant – But how much?
We can quantify this, but we’d need to know the number of packets sent on average, and the quantity of the data transferred, because the number of packets is the multiplier here.
So for starters I’ve left a phone on the desk, it’s registered to the network but just sitting in Idle mode – This is an engineering phone from an OEM, it’s just used for testing so doesn’t have anything loaded onto it in terms of apps, and it’s not signed into any applications or checking in with anything in the background. That means it generates very little traffic, so I thought I’d try something more realistic.
So to get a clearer picture, I chucked a SIM in my regular everyday phone I use personally, registered it to the cellular lab I have here. For the next hour I sniffed the GTP traffic for the phone while it was sitting on my desk, not touching the phone, and here’s what I’ve got:
Overall the PCAP includes 6,417,732 bytes of data, but this includes the transport and GTP headers, meaning we can drop everything above it in our traffic calculations.
For this I’ve got 14 bytes of ethernet, 20 bytes IP, 8 bytes UDP and 5 bytes for TZSP (this is to copy the traffic from the eNB to my local machine), then we’ve got the transport from the eNB to the SGW, 14 bytes of ethernet again, 20 bytes of IP , 8 bytes of UDP and 8 bytes of GTP then the payload itself. Phew. All this means we can drop 97 bytes off every packet.
We have 16,889 packets and 6,417,732 bytes in total; 97 bytes from each packet gives us 1,638,233 bytes of headers to drop (~1.6MB), leaving 4,779,499 bytes (roughly 4.56 MiB) of traffic to/from the phone itself.
This means my Android phone consumes 4.5 MB of cellular data in an hour while sitting on the desk, with 16,889 packets in/out.
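If you want to check my working, here’s the same subtraction as a small Python sketch. The header sizes and totals are the ones quoted above; the variable names are just labels I’ve picked.

```python
# Strip the capture and S1-U transport headers out of the PCAP total
# to get the bytes that actually went to/from the phone.

HEADER_BYTES = {
    "ethernet (capture copy)": 14,
    "ip (capture copy)": 20,
    "udp (capture copy)": 8,
    "tzsp": 5,
    "ethernet (eNB to SGW)": 14,
    "ip (eNB to SGW)": 20,
    "udp (eNB to SGW)": 8,
    "gtp": 8,
}

PCAP_TOTAL_BYTES = 6_417_732
PACKET_COUNT = 16_889

per_packet_overhead = sum(HEADER_BYTES.values())
header_total = per_packet_overhead * PACKET_COUNT
user_bytes = PCAP_TOTAL_BYTES - header_total

if __name__ == "__main__":
    print(f"headers per packet: {per_packet_overhead} bytes")
    print(f"headers to drop:    {header_total:,} bytes")
    print(f"user traffic:       {user_bytes:,} bytes ({user_bytes / 2**20:.2f} MiB)")
```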
Okay, now we’re getting somewhere!
So now we can answer the question: if each of these 16,889 packets was IPv6, rather than IPv4, we’d be adding another 20 bytes to each of them. 20 bytes x 16,889 packets gives 337,780 bytes (~0.3MB) to add to the total, or about 7% overhead compared to IPv4.
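Here’s that overhead worked through as a sketch, so you can plug in your own packet counts. It assumes a 20 byte IPv4 header (no options) against the fixed 40 byte IPv6 header, and uses the 4,779,499 bytes of user traffic left after stripping the capture headers from my hour of traffic.

```python
# Extra bytes charged if the same packets were carried over IPv6.
# Assumes IPv4 headers without options (20 bytes) vs IPv6's fixed 40 bytes.

IPV4_HEADER_BYTES = 20
IPV6_HEADER_BYTES = 40


def ipv6_overhead(packet_count: int, ipv4_user_bytes: int):
    """Return (extra bytes, extra bytes as a fraction of the IPv4 total)."""
    extra = packet_count * (IPV6_HEADER_BYTES - IPV4_HEADER_BYTES)
    return extra, extra / ipv4_user_bytes


if __name__ == "__main__":
    extra, fraction = ipv6_overhead(16_889, 4_779_499)
    print(f"{extra:,} extra bytes, {fraction:.1%} overhead")
```

The packet count is the multiplier, so lots of small packets (keepalives, IoT telemetry) will see a bigger percentage than a handful of large ones.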
But before you go on about what an outrage this IPv6 transport is, being charged for those extra bytes, that’s only one part of the picture.
There’s a reason operators are finally embracing IPv6, and it’s not to put an extra 7% of traffic on the network (I think if you asked most capacity planners, they’d say they want data savings, not growth).
IPv6 is, for lack of a better term, less rubbish than IPv4.
There’s a lot of drivers for IPv6, and some of these will reduce data consumption. IPv6 is actually your stuff talking directly to the remote stuff, this means that we don’t need to rely on NAT, so no need to do NAT keepalives, and opening new sessions, which is going to save you data. If you’re running apps that need to keep a connection to somewhere alive, these data savings could negate your IPv6 overhead costs.
Will these potential data savings when using IPv6 outweigh the costs?
That’s going to depend on your use case.
If you’re extremely bandwidth / data constrained, for example, you have an IoT device on an NTN / satellite connection that was having to push data every X hours via IPv4 because you couldn’t pull data from it as it had no public IP, then moving it to IPv6 so you can pull the data on the public IP, on demand, will save you data. That’s a win with IPv6.
If you’re a mobile user, watching YouTube, getting push notifications and using your phone like a normal human, probably not, but if you’re using data like a normal user, you’ve probably got a sizable data allowance that you don’t end up fully consuming, and the extra 20 bytes per packet will be nothing in comparison to the data used to watch a 2k video on your small phone screen.
Ask someone with headphones and a lanyard in the halls of a datacenter what transport does DNS use, there’s a good chance the answer you’d get back is UDP Port 53.
But not always!
In scenarios where the DNS response is large (beyond 512 bytes) a DNS query will shift over to TCP for delivery.
How does the client know when to shift the request to TCP – After all, the DNS server knows how big the response is, but the client doesn’t.
The answer is the Truncated flag, in the response.
The DNS server sends back a response, but with the Truncated bit set, as per RFC 1035:
TC TrunCation – specifies that this message was truncated due to length greater than that permitted on the transmission channel.
RFC 1035
Here’s an example of the truncated bit being set in the DNS response.
The DNS client, upon receiving a response with the truncated bit set, should run the query again, this time using TCP for the transport.
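To show that retry logic in the flesh, here’s a minimal sketch of a client doing it by hand with Python’s standard library. It builds a bare query, sends it over UDP, checks the TC bit (0x0200 in the flags word), and if it’s set, repeats the same query over TCP with the two byte length prefix. The function names are mine, there’s no EDNS0 or retry handling, and it doesn’t parse the answer, so treat it as an illustration rather than a resolver.

```python
import socket
import struct

TC_FLAG = 0x0200  # Truncated bit in the DNS header flags word


def build_query(name: str, qtype: int = 1, query_id: int = 0x1234) -> bytes:
    """Build a single-question DNS query (class IN, recursion desired)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def is_truncated(response: bytes) -> bool:
    """True if the response header has the TC bit set."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & TC_FLAG)


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly count bytes from a TCP socket."""
    data = b""
    while len(data) < count:
        chunk = sock.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed mid-response")
        data += chunk
    return data


def resolve(name: str, server: str, qtype: int = 1, timeout: float = 3.0) -> bytes:
    """Query over UDP first, then retry over TCP if the answer was truncated."""
    query = build_query(name, qtype)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp:
        udp.settimeout(timeout)
        udp.sendto(query, (server, 53))
        response, _ = udp.recvfrom(4096)
    if not is_truncated(response):
        return response
    # TC set: run the same query again, this time over TCP
    with socket.create_connection((server, 53), timeout=timeout) as tcp:
        tcp.sendall(struct.pack("!H", len(query)) + query)
        (length,) = struct.unpack("!H", recv_exact(tcp, 2))
        return recv_exact(tcp, length)
```

Calling something like `resolve("example.com", "192.0.2.53", qtype=35)` (a placeholder server address, with 35 being the NAPTR record type) would go through both legs if the response didn’t fit in the UDP reply.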
One prime example of this is DNS NAPTR records used for DNS in roaming scenarios, where the response can often be quite large.
If it didn’t move these responses to TCP, you’d run the risk of MTU mismatches dropping DNS. Given that half of my life has been spent debugging DNS issues, and the other half of my life debugging MTU issues, if I had MTU and DNS issues together, I’d be looking for a career change…