Technology is constantly evolving, and new research papers are published every day.
But recently I was shocked to discover I’d missed a critical development in communications that upended Shannon’s “A mathematical theory of communication”.
I’m talking, of course, about the GENERATION X PLUS SP-11 PRO CELL ANTENNA.
I’ve been doing telecom work for a long time. While I mostly write here about Core & IMS, I am a licenced rigger; I’ve bolted a few things to towers and built my fair share of mobile coverage over the years, which is why I found this development so astounding.
With this, existing antennas can be extended: mobile phone antennas, walkie talkies and cordless phones can all benefit from the improvement of this small adhesive sticker, which is “Like having a four foot antenna on your phone”.
So for the bargain price of $32.95 (or $2 on AliExpress) I secured myself this amazing technology and couldn’t wait to quantify its performance.
Think of the applications – We could put these stickers on 6 ft panel antennas and they’d become 10ft panels. This would have a huge effect on new site builds, minimize wind loading, less need for tower strengthening, more room for collocation on the towers due to smaller equipment footprint.
Luckily I have access to some fancy test equipment to really understand exactly how revolutionary this is.
The packaging says it’s like having a 4 foot antenna on your phone, let’s do some very simple calculations, let’s assume the antenna in the phone is currently 10cm, and that with this it will improve to be 121cm (four feet).
According to some basic projections we should see ~21dB gain by adding the sticker, that’s a 146x increase in performance!
Man am I excited to see this in action.
Fortunately I have access to some fun cellular test equipment, including the Viavi CellAdvisor, and an environmentally controlled lab (my kitchen bench).
I put up an 1800MHz (band 3) LTE carrier in my office in the other room as a reference and placed the test equipment into the test jig (between the sink and the kettle).
We then took baseline readings from the omni shown in the pictures, to get a reading on the power levels before adding the sticker.
We are reading exactly -80dBm without the sticker in place, so we expertly put some masking tape on the omni (so we could peel it off) and applied the sticker antenna to the tape on the omni antenna.
At -80dBm before, adding the 21dB of gain should put us at around -59dBm. These Viavi units are solid, but I was fearful of potentially overloading the receive end from the gain. After a long discussion we agreed that at these levels it was unlikely to blow the unit, so no in-line attenuation was used.
Okay, </sarcasm> I was genuinely a little surprised by what we found; there was some gain, as shown in the screenshot below.
Marker 1 was our reference without the sticker, while marker 2 was our reading with the sticker; that’s a 1.12dB gain with the sticker in place. In linear terms that’s a ~30% increase in received signal power.
Screenshot
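For anyone wanting to check the dB arithmetic, a gain in decibels converts to a linear power ratio as 10^(dB/10). Here is a minimal Python sketch; the function name is my own and is not tied to any test equipment API.

```python
def db_to_power_ratio(db: float) -> float:
    """Convert a gain in dB to a linear power ratio."""
    return 10 ** (db / 10)


baseline_dbm = -80.0      # reading without the sticker
measured_gain_db = 1.12   # gain measured with the sticker in place

ratio = db_to_power_ratio(measured_gain_db)
print(f"{measured_gain_db} dB is a {ratio:.2f}x power ratio "
      f"({(ratio - 1) * 100:.0f}% increase)")
print(f"Level with sticker: {baseline_dbm + measured_gain_db:.2f} dBm")
```

Gains in dB simply add to a level in dBm, which is why the second line is a plain sum.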
So does this magic sticker work? Well, kinda, in as much that holding onto the Omni changes the characteristics, as would wrapping a few turns of wire around it, putting it in the kettle or wrapping it in aluminum foil. Anything you do to an antenna to change it is going to cause minor changes in characteristic behavior, and generally if you’re getting better at one frequency, you get worse at another, so the small gain on band 3 may also lead to a small loss on band 1, or something similar.
So what to make of all this? Maybe this difference is an artifact from moving the unit to make a cup of tea, the tape we applied or just a jump in the LTE carrier, or maybe the performance of this sticker is amazing after all…
I’ve written about Milenage and SIM based security in the past on this blog, and the component that prevents replay attacks in cellular network authentication is the Sequence Number (Aka SQN) stored on the SIM.
Think of the SQN as an incrementing odometer of authentication vectors. Odometers can go forward, but never backwards. So if a challenge comes in with an SQN behind the odometer (a lower number), it’s no good.
Why the SQN is important for Milenage Security
Every time the SIM authenticates it ticks up the SQN value, and when authenticating it checks the challenge from the network doesn’t have an SQN that’s behind (lower than) the SQN on the SIM.
Let’s take a practical example of this:
The HSS in the network has SQN for the SIM as 8232, and generates an authentication challenge vector for the SIM which includes the SQN of 8232. The SIM receives this challenge, and makes sure that the SQN in the SIM is equal to or less than 8232. If the authentication passes, the new SQN stored in the SIM is equal to 8232 + 1, as that’s the next valid SQN we’d be expecting, and the HSS increments the counters it has in the same way.
Constantly increasing the SQN and never allowing it to go backwards means that even if we pre-generated a valid authentication vector for the SIM, it’d only be valid for as long as that SQN hasn’t already been used on the SIM by another authentication request.
Imagine for example that I get sneaky access to an operator’s HSS/AuC, I could get it to generate a stack of authentication challenges that I could use for my nefarious moustache-twirling purposes whenever I wanted.
This attack would work, but this all comes crumbling down if the SIM was to attach to the real network after I’ve generated my stack of authentication challenges.
If the SQN on the SIM passes where it was when the vectors were generated, those vectors would become unusable.
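The odometer behaviour can be sketched in a few lines of Python. This is a toy model of the simplified check described in this post only; the class and method names are made up, and it ignores the rest of Milenage (keys, MAC verification, and the finer detail of how real SIMs manage SQN).

```python
class SimSqnModel:
    """Toy model of the SIM-side SQN check described in this post."""

    def __init__(self, sqn: int) -> None:
        self.sqn = sqn  # lowest SQN the SIM will still accept

    def authenticate(self, challenge_sqn: int) -> bool:
        # Reject anything behind the odometer (replayed or stale vectors)
        if challenge_sqn < self.sqn:
            return False
        # Accept, and move the odometer past the SQN just used
        self.sqn = challenge_sqn + 1
        return True


sim = SimSqnModel(sqn=8232)

# Attacker pre-generates a stack of vectors while the SQN is at 8232
stolen_sqns = list(range(8232, 8237))

# The SIM then attaches to the real network a few times
for real_sqn in range(8232, 8240):
    sim.authenticate(real_sqn)

# Every pre-generated vector is now behind the odometer
results = [sim.authenticate(s) for s in stolen_sqns]
print(results)
```

Every entry in `results` comes back as a rejection, because the SIM’s stored SQN has already moved past the whole stolen range.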
It’s worth pointing out, that it’s not just evil purposes that lead your SQN to get out of Sync; this happens when you’ve got subscriber data split across multiple HSSes for example, and there’s a mechanism to securely catch the HSS’s SQN counter up with the SQN counter in the SIM, without exposing any secrets, but it just ticks the HSS’s SQN up – It never rolls back the SQN in the SIM.
The Flaw – Draining the Pool
The Authentication Information Request is used by a cellular network to authenticate a subscriber, and the Authentication Information Answer is sent back by the HSS containing the challenges (vectors).
When we send this request, we can specify how many authentication challenges (vectors) we want the HSS to generate for us, so how many vectors can you generate?
TS 129 272 says the Number-of-Requested-Vectors AVP is an Unsigned32, which gives us a possible pool of 4,294,967,295 combinations. This means it would be legal / valid to send an Authentication Information Request asking for 4.2 billion vectors.
It’s worth noting that that won’t give us the whole pool.
Sequence numbers (SQN) shall have a length of 48 bits.
TS 133 102
As the SQN in the SIM is 48 bits, that gives us a maximum of 281,474,976,710,656 values before we “tick over” the odometer.
If we were to send 65,536 Authentication-Information-Requests asking for 4,294,967,295 vectors apiece, we’d have enough vectors to serve the sub for life.
And because the standard places no practical limit on the number of vectors that can be requested, this would allow us to “drain the pool” from an HSS, capturing every combination of SQN, to provide a high degree of certainty that the SQN provided to a SIM is far enough ahead of the current SQN that the SIM does not reject the challenges.
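The numbers are easy to check with Python’s arbitrary-precision integers. This sketch assumes, as the discussion above does, that each requested vector consumes exactly one SQN value; the constant names are mine.

```python
SQN_BITS = 48                          # SQN length per TS 33.102
MAX_VECTORS_PER_REQUEST = 2**32 - 1    # largest value an Unsigned32 AVP can carry

sqn_space = 2**SQN_BITS
covered_by_65536 = 65_536 * MAX_VECTORS_PER_REQUEST
shortfall = sqn_space - covered_by_65536

# Integer ceiling division, avoiding any floating point rounding
requests_for_full_space = -(-sqn_space // MAX_VECTORS_PER_REQUEST)

print(f"SQN space:             {sqn_space:,}")
print(f"65,536 requests cover: {covered_by_65536:,}")
print(f"Shortfall:             {shortfall:,}")
print(f"Requests for all SQNs: {requests_for_full_space:,}")
```

Because the AVP tops out at 2^32 - 1 rather than 2^32, 65,536 maximum-size requests fall 65,536 values short of the full 48-bit space, so covering literally every SQN would take one more request. For practical purposes the difference is noise.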
Can we do this?
Our lab has access to HSSes from several major vendors of HSS.
Out of the gate, the Oracle HSS does not allow more than 32 vectors to be requested at the same time, so props to them, but the same is not true of the others, all from major HSS vendors (I won’t name them publicly here).
For the other 3 HSSes we tried from big vendors, the request for 4.2 billion vectors didn’t get rejected; all of them eventually timed out (don’t know why that would be *shrug*).
This is a lab so monitoring isn’t great but I did see a CPU spike on at least one of the HSSes which suggests maybe it was actually trying to generate this.
Of course, we’ve got PyHSS, the greatest open source HSS out there, and how did this handle the request?
Well, being standards compliant, it did what it was asked – I tested with 1024 vectors I’ll admit, on my little laptop it did take a while. But lo, it worked, spewing forth 1024 vectors to use.
So with that working, I tried with 4,294,967,295…
And I waited. And waited.
And after pegging my CPU for a good while, I had to get back to real life work, and killed the request on the HSS.
In part there’s the fact that PyHSS writes back to a database for each time the SQN is incremented, which is costly in terms of resources, but also that generating Milenage vectors in LTE is doing some pretty heavy cryptographic lifting.
The Risk
Dumping a complete set of vectors with every possible SQN would allow an attacker to spoof base stations, and the subscriber would attach without issue.
Historically this has been very difficult to do for LTE, due to the mutual network authentication, however this would be bypassed in this scenario.
The UE would try for a resync if the SQN is too far forward, which mitigates this somewhat.
Cryptographically, I don’t know enough about the Milenage auth to know if a complete set of possible vectors would widen the attack surface to try and learn something about the keys.
Mitigations / Protections
So how can we as operators protect ourselves against this kind of attack?
Different commercial HSS vendors handle this differently, Oracle limits this to 32 vectors, and that’s what I’ve updated PyHSS to do, but another big HSS vendor (who I won’t publicly shame) accepts the full 4294967295 vectors, and it crashes that thread, or at least times it out after a period.
If you’ve got a decent Diameter Routing Agent in place you can set your DRA to check to see if someone is using this exploit against your network, and to rewrite the number of requested vectors to a lower number, alert you, or drop the request entirely.
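As a rough illustration of the clamp-and-alert idea, here is a Python sketch of the decision a DRA rule or an HSS could make when it sees the Number-of-Requested-Vectors value. Everything here is hypothetical: the names are invented, the limit of 32 is simply borrowed from the Oracle behaviour mentioned above, and a real DRA would express this in its own rule language rather than Python.

```python
from dataclasses import dataclass

MAX_REQUESTED_VECTORS = 32  # assumed policy limit, not a 3GPP-mandated value


@dataclass
class Verdict:
    forwarded_count: int  # value to place in Number-of-Requested-Vectors
    alert: bool           # whether to raise an alarm for this request


def police_requested_vectors(requested: int,
                             limit: int = MAX_REQUESTED_VECTORS) -> Verdict:
    """Clamp an oversized vector request and flag it for investigation."""
    if requested > limit:
        return Verdict(forwarded_count=limit, alert=True)
    return Verdict(forwarded_count=requested, alert=False)


print(police_requested_vectors(1))
print(police_requested_vectors(4_294_967_295))
```

Rewriting to the limit rather than dropping keeps a misbehaving but legitimate peer working, while the alert flag still gives you something to investigate.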
Having common OP keys is dumb, and I advocate to all our operator customers to use OP keys that are unique to each SIM, and use the OPc key derived anyway. This means if one SIM spilled its keys, the blast doesn’t extend beyond that card.
In the long term, it’d be good to see 3GPP limit the practical size of the Number-of-Requested-Vectors AVP.
2G/3G Impact
Full disclosure – I don’t really work with 2G/3G stacks much these days, and have not tested this.
MAP is generally pretty bandwidth constrained, and to transfer 281 trillion vectors might raise some eyebrows, burn out some STPs and take a long time…
But our “Send Authentication Info” message functions much the same as the Authentication Information Request in Diameter, 3GPP TS 29.002 shows we can set the number of vectors we want:
5GC Vulnerability
This only impacts LTE and 5G NSA subscribers.
TS 29.509 outlines the schema for the Nausf reference point, used for requesting vectors, and there is no option to request multiple vectors.
Summary
If you’ve got baddies with access to your HSS / HLR, you’ve got some problems.
But, with enough time, your pool could get drained for one subscriber at a time.
This isn’t going to get the master OP Key or plaintext Ki values, but this could potentially weaken the Milenage security of your system.
One of the new features of 5GC is the introduction of Service Based Interfaces (SBI) which is part of 5GC’s Service Based Architecture (SBA).
Let’s start with the description from the specs:
3GPP TS 23.501 [3] defines the 5G System Architecture as a Service Based Architecture, i.e. a system architecture in which the system functionality is achieved by a set of NFs providing services to other authorized NFs to access their services.
3GPP TS 29.500 – 4.1 NF Services
For that we have two key concepts: service discovery and service consumption.
Services Consumer / Producers
That’s some nice words, but let’s break down what this actually means, for starters, let’s talk about services.
In previous generations of core network we had interfaces instead of services. Interfaces were the reference point between two network elements, describing how the two would talk. The interfaces were the protocols the two elements used to communicate.
For example, in EPC / LTE S6a is the interface between the MME and the HSS, and S5 is the interface between the S-GW and P-GW. You could look up the 3GPP spec for each interface to understand exactly how it works, or decode it in Wireshark to see it in action.
5GC moves from interfaces to services. Interfaces are strictly between two network elements, the S6a interface is only used between the MME and the HSS, while a service is designed to be reusable.
This means the Service Based Interface N5g-eir can be used by the AMF, but it could equally be used by anyone else who wants access to that information.
3GPP defines the service in the form of service producer (The EIR produces the N5g-eir service) and the service consumer (The client connecting to the N5g-eir service), but doesn’t restrict which network elements can
This gets away from the soup of interfaces available, and instead just defines the services being offered, rather than locking the
“Service consumers” (which can be thought of as clients in a client/server model) can discover “service producers” (like servers in a client/server model).
Our AMF acts as a “service consumer”, consuming services from the UDM/UDR and SMF.
Service Discovery – Automated Discovery of NF Services
Service-Based Architecture enables 5G Core Network Function service discovery.
In simple terms, this means that rather than your MME being told about your SGW, the nodes all talk to a “Network Repository Function” that returns a list of available nodes.
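To make the register-then-discover pattern concrete, here is a toy in-memory registry in Python. It is only a sketch of the idea: the class, method names, addresses and the unregistered service name are all invented for illustration, and the real NRF exposes this over HTTP-based service interfaces with far richer NF profiles.

```python
from collections import defaultdict


class ToyNrf:
    """In-memory stand-in for a Network Repository Function."""

    def __init__(self):
        # Maps a service name to the addresses of producers offering it
        self._producers = defaultdict(list)

    def register(self, service, address):
        # A service producer announces that it offers `service`
        self._producers[service].append(address)

    def discover(self, service):
        # A service consumer asks who currently offers `service`
        return list(self._producers.get(service, []))


nrf = ToyNrf()
nrf.register("n5g-eir", "10.0.0.10")  # hypothetical EIR instance
nrf.register("n5g-eir", "10.0.0.11")  # second hypothetical EIR instance

print(nrf.discover("n5g-eir"))
print(nrf.discover("some-unregistered-service"))
```

The consumer never needs static configuration pointing at a particular producer; it asks the registry and gets back whatever is currently on offer, which may be nothing.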
The mobility management and connection management process in 5GC focuses on Connection Management (CM) and Registration Management (RM).
Registration Management (RM)
The Registration Management state (RM) of a UE can either be RM-Registered or RM-Deregistered. This is akin to the EMM state used in LTE.
RM-Deregistered Mode
From the Core Network’s perspective (Our AMF) a UE that is in RM-Deregistered state has no valid location information in the AMF for that UE. The AMF can’t page it, it doesn’t know where the UE is or if it’s even turned on.
From the UE’s perspective, being in RM-Deregistered state could mean one of a few things:
UE is in an area without coverage
UE is turned off
SIM Card in the UE is not permitted to access the network
In short, RM-Deregistered means the UE cannot be reached, and cannot get any services.
RM-Registered Mode
From the Core Network’s perspective (the AMF) a UE in RM-Registered state has successfully registered onto the network.
The UE can perform tracking area updates, periodic registration updates and registration updates.
There is a location stored in the AMF for the UE (The AMF knows at least down to a Tracking Area Code/List level where the UE is).
The UE can request services.
Connection Management (CM)
Connection Management (CM) focuses on the NAS signaling connection between the UE and the AMF.
To have a Connection Management state, the Registration Management procedure must have successfully completed (the UE being in RM-Registered state).
A UE in CM-Connected state has an active signaling connection on the N1 interface between the UE and the AMF.
CM-Idle Mode
In CM-Idle mode the UE has no active NAS connection to the AMF.
UEs typically enter this state when they have no data to send / receive for a period of time; this conserves battery on the UE and saves network resources.
If the UE wants to send some data, it performs a Service Request procedure to bring itself back into CM-Connected mode.
If the Network wants to send some data to the UE, the AMF sends a paging request for the UE, and upon hearing its identifier (5G-S-TMSI) on the paging channel, the UE performs the Service Request procedure to bring itself back into CM-Connected mode.
CM-Connected Mode
In CM-Connected mode the UE has an active NAS connection with the AMF over the N1 interface from the UE to the AMF.
When the access network (The gNodeB) determines this state should change (typically based on the UE being idle for longer than a set period of time) the gNodeB releases the connection and the UE transitions to CM-Idle Mode.
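The RM and CM states described above can be sketched as a small state machine. This is a simplified Python model based only on the description in this post (for instance, it treats the CM state as undefined until registration completes); the class and method names are my own and do not come from any AMF implementation.

```python
from enum import Enum


class RM(Enum):
    DEREGISTERED = "RM-Deregistered"
    REGISTERED = "RM-Registered"


class CM(Enum):
    IDLE = "CM-Idle"
    CONNECTED = "CM-Connected"


class UeContext:
    """Toy view of the states an AMF tracks for one UE."""

    def __init__(self):
        self.rm = RM.DEREGISTERED
        self.cm = None  # no CM state until registration succeeds

    def register(self):
        # Registration completes over an active NAS connection
        self.rm = RM.REGISTERED
        self.cm = CM.CONNECTED

    def release(self):
        # gNodeB releases the connection after a period of inactivity
        if self.cm is CM.CONNECTED:
            self.cm = CM.IDLE

    def service_request(self):
        # UE has data to send, or is answering a page from the AMF
        if self.rm is not RM.REGISTERED:
            raise RuntimeError("UE must be RM-Registered to request service")
        self.cm = CM.CONNECTED

    def deregister(self):
        self.rm = RM.DEREGISTERED
        self.cm = None


ue = UeContext()
ue.register()
ue.release()
print(ue.rm.value, ue.cm.value)
ue.service_request()
print(ue.rm.value, ue.cm.value)
```

The guard in `service_request` captures the dependency between the two: a UE that is not RM-Registered has no way to move into CM-Connected via a Service Request.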
So let’s roll up our sleeves and get a lab scenario happening.
To keep things (relatively) simple, I’ve put the eNodeB on the same subnet as the MME and Serving/Packet-Gateway.
So the traffic will flow from the eNodeB to the S/P-GW, via a simple Network Switch (I’m using a Mikrotik).
While life is complicated, I’ll try and keep this lab easy.
Experiment 1: MTU of 1500 everywhere
Network Element | MTU
Advertised MTU in PCO | 1500
eNodeB | 1500
Switch | 1500
Core Network (S/P-GW) | 1500
So everything attaches and traffic flows fine. There is no problem right?
Well, not a problem that is immediately visible.
While the PCO advertises the MTU value at 1500 if we look at the maximum payload we can actually get through the network, we find that’s not the case.
This means if our end user on a mobile device tried to send a 1500 byte payload, it’d never get through.
While DNS would work, most TCP traffic would flow fine, certain UDP applications would start to fail if they were sending payloads nearing 1500 bytes.
So why is this?
Well GTP adds overhead.
8 bytes for the GTP header
8 bytes for the transport UDP header
20 bytes for the transport IPv4 header
14 bytes if our transport is using Ethernet
For a total of 50 bytes of overhead, assuming we’re not using MPLS, QinQ or anything else funky on our transport network and just using Ethernet.
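The overhead sum, and what it does to the usable packet size, can be expressed as a short Python sketch. It follows the accounting used above, which counts the Ethernet header alongside the IP, UDP and GTP headers; the function names are my own.

```python
GTP_OVERHEAD = {
    "GTP header": 8,
    "UDP header": 8,
    "IPv4 header": 20,
    "Ethernet header": 14,
}


def max_user_packet(transport_mtu, overhead=GTP_OVERHEAD):
    """Largest user packet that fits without fragmentation."""
    return transport_mtu - sum(overhead.values())


def required_transport_mtu(user_mtu, overhead=GTP_OVERHEAD):
    """Transport MTU needed to carry a user packet of `user_mtu` bytes."""
    return user_mtu + sum(overhead.values())


print(sum(GTP_OVERHEAD.values()))     # 50
print(max_user_packet(1500))          # 1450
print(required_transport_mtu(1500))   # 1550
```

If the MTU figure you are working with already excludes the Ethernet header (as interface MTU settings often do), drop that entry from the table before running the numbers. Extra encapsulation such as MPLS or QinQ would be added as further entries.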
So we have two options here – We can either lower the MTU advertised in our Protocol Configuration Options, or we can increase the MTU across our transport network. Let’s look at each.
Experiment 2: Lower Advertised MTU in PCO to 1300
Well this works, and looks the same as our previous example, except now we know we can’t handle payloads larger than 1300 without fragmentation.
Experiment 3: Increase MTU across transmission Network
While we need to account for the 50 bytes of overhead added by GTP, I’ve gone the safer option and upped the MTU across the transport to 1600 bytes.
With this, we can transport a full 1500 byte MTU on the UE layer, and we’ve got the extra space by enabling jumbo frames.
Obviously this requires a change on all of the transmission layer – And if you have any hops without support for this, you’ll lose packets.
Conclusions?
Well, fragmentation is bad, and we want to avoid it.
For this we up the MTU across the transmission network to support jumbo frames (greater than 1500 bytes) so we can handle the 1500 byte payloads that users want.
To make more Money (This post, congratulations, you’re reading it!)
Because they have to (Regulatory compliance, insurance, taxes, etc) – That’s the next post
So let’s look at SA in this context.
5G-SA can drive new revenue streams
We (as an industry) suck at this.
Last year on the Telecoms.com podcast, Scott Bicheno made the point that if operators took all the money they’d gambled (and lost) on trying to play in the sports rights, involvement in media companies, building their own streaming apps, attempts at bundling other utilities, digital identity, etc, and just left the cash in the bank and just operated the network, they’d be better off.
Uber, Spotify, “OTTs”, etc, utilize MNOs to enable their services, but operators don’t see this extra revenue. While some operators may talk of “fair share” the truth is, these companies add value to our product (connectivity) which as an industry, we’ve failed to add ourselves.
If the Metaverse does turn out to be a cash cow, it is unlikely the telecommunications industry will be the ones milking it.
Claim: Customers are willing to pay more for 5G-SA
This myth seems to be fairly persistent, but with minimal data to support this claim.
While BSS vendors talk about “5G Monetization”, the truth is, people use their MNO to provide them connectivity. If the coverage is adequate, and the speed enough to do what they need to do, few would be willing to pay any additional cash each month to see higher numbers on a speedtest result (enabled by 5G-NSA) and even fewer would pay extra cash for, well, whatever those features only enabled by 5G-Standalone are?
With most consumers now also holding onto their mobile devices for longer periods of time, and with interest rates reining in consumer spending across the board, we are seeing the rise of a more cost conscious consumer than ever before. If we want to see higher ARPUs, we need to give the consumer a compelling reason to care and spend their cash, beyond a speed test result.
We talk a little about APIs lower down in the post.
Claim: Users want Ultra-Low Latency / High Reliability Comms that only 5G-SA delivers
Wanting to offer a product to the market, is not the same as the market wanting a product to consume.
Telecom operators want customers to want these services, but customer take up rates tell a different story. For a product like this to be viable, it must have a wide enough addressable market to justify the investment.
Reliability
The URLLC standards focus on preventing packet loss, but the world has moved on from needing zero packet loss.
The telecom industry has a habit of deciding what customers want without actually listening. When a customer talks about wanting “reliable” comms, they aren’t saying they want zero packet loss, but rather fewer dropouts or service flaps. For us to give the customer what they are actually asking for involves us expanding RAN footprint and adding transmission diversity, not 5G-SA.
The “protocols of the internet” (TCP/IP) have been around for more than 50 years now.
These protocols have always flowed over transport links with varied reliability and levels of packet loss.
Thanks to the error correction and retransmission techniques built into these protocols, a lost packet will not interrupt the stream. If your nuclear command and control network were carried over TCP/IP over the public internet (please don’t do this), a missing packet won’t lead to worldwide annihilation; rather, the sender will see the receiver never acknowledged receipt of the packet, and resend it, end of.
If you walk into a hospital today, you’ll find patient monitoring devices, tracking the vital signs for patients and alerting hospital staff if a patient’s vital signs change. It is hard to think of more important services for reliability than this.
And yet they use WiFi, and have done for a long time, if a packet is lost on WiFi (as happens regularly) it’s just retransmitted and the end user never knows.
Autonomous cars are unlikely to ever rely on a 5G connection to operate, for the simple reason that coverage will never be 100%. If your car stops because you’re in a not-spot, you won’t be a happy customer. Plenty of cars have cellular modems in them, but these are used to upload telemetry data back to the manufacturer, not to drive the car.
One example of wireless controlled vehicles in the wild is autonomous haul trucks in mines. Historically, these have used WiFi for their comms. Mine sites are often a good fit for Private LTE, but there’s nothing inherent in the 5G Standalone standard that means it’s the only tool for the job here.
Slicing
Slicing is available in LTE (4G), with an architecture designed to allow access to others. It failed to gain traction, but is in networks today.
The RAN is a piece of the latency puzzle here, but it is just one piece.
If we look at the flow a packet takes from the user’s device to the server they want to talk to we’ve got:
Time it takes the UE to craft the packet
Time it takes for the packet to be transmitted over the air to the base station
Time it takes for the packet to get through the RAN transmission network to the core
Time it takes the packet to traverse the packet core
Time it takes for the packet to get out to transit/peering
Time it takes to get the packet from the edge of the operators network to the edge of the network hosting the server
Time it takes the packet to traverse the network the server is on
Time it takes the server to process the request
The “low latency” bit of the 5G puzzle only involves the two elements in bold.
If you’ve got to get from point A to point B along a series of roads, and the speed limit on two of the roads you traverse (short sections already) is increased, the overall travel time is not drastically reduced.
I’m lucky, I have access to a well kitted out lab which allows me to put all of these latency figures to the test and provide side by side metrics. If this is of interest to anyone, let me know. Otherwise in the meantime you’ll just have to accept some conjecture and opinion.
You could rebut this talking about Edge Compute, and having the datacenter at the base of the tower, but for a number of fairly well documented reasons, I think this is unlikely to attract widespread deployment in established carrier networks, and Intel’s recent yearly earnings specifically called this out.
Claim: Customers want APIs and these need 5G SA
Companies like Twilio have made it easy to interact with the carrier network via their APIs, but yet again, it’s these companies producing the additional value on a service operated by the MNOs.
My coffee machine does not have an API, and I’m OK with this because I don’t have a want or need to interact with it programmatically.
By far, the most common APIs used by businesses involving telco markets are APIs to enable sending an SMS to a user.
These have been around for a long time, and the A2P market is pretty well established, and the good news is, operators already get a chunk of this pie, by charging for the SMS.
Imagine a company that makes medical booking software. They’re a tech company, so they want their stack to work anywhere in the world, and they want to be able to send reminder SMS to end users.
They could get an account manager with each of the telcos in each of the markets they work in, onboard and integrate the arcane complexities of each operator’s wholesale SMS system, or they could use Twilio or a similar service, which gives them global reach.
Often services like Twilio are cheaper than working directly with the carriers in each market, and even if they are marginally more expensive, the cost savings from not having to deal with dozens of carriers or integrate into dozens of systems far outweigh this.
While OpenGateway is a great idea, in the context of 5G Standalone and APIs, it’s worth noting that none of the use cases in OpenGateway require 5G Standalone (Except possibly Edge discovery, but it is debatable).
Critically, from a developer experience perspective:
I can sign up to services like Twilio without a credit card, and start using the service right away, with examples in my programming language of choice, the developer user experience is fantastic.
Jump on the OpenGateway website today and see if you can even find a way to sign up to use the service?
Claim: Fixed Wireless works best with 5G-SA
Of all the touted use cases and applications for 5G, Fixed Wireless (FWA) has been the most successful.
The great thing about FWA on Cellular networks is you can use the same infrastructure you use for your mobile customers, and then sell excess capacity in the network to deliver Fixed Wireless Access services, better utilizing an asset (great!).
But again, this does not require Standalone 5G. If you deploy your FWA network using 5G SA, then you won’t be able to sweat that same asset for both mobile subscribers and FWA subscribers.
Today at least, very few handsets, short of this generation of flagship phones, support 5G SA. Even the phones sold as supporting 5G over the past few years almost all only support 5G-NSA, so if you rolled out your FWA network as Standalone, you can’t better utilize the asset by sharing with your existing LTE/5G-NSA customers.
Claim: The Killer App is coming for 5G and it needs 5G SA
This space is reserved for the killer app that requires 5G Standalone.
Whenever that comes?
Anyone?
I’m not paying to build a marina berth for my mega yacht, mostly because I don’t have one. Ditto this.
Could you explain to everyone on an investor call that you’re investing in something where the vessel of the payoff isn’t even known to exist? Telecom is “blue chip”, hardly speculative.
The Future for Revenue Growth?
Maybe there isn’t one.
I know it’s an unthinkable thought for a lot of operators, but let’s look at it rationally; in the developed world, everyone who wants a mobile service already has one.
This leaves operators with two options; gaining market share from their competitors and selling more/higher priced services to existing customers.
You don’t steal away customers from other operators by offering a higher priced product, and with reduced consumer spending people aren’t queuing up to spend more each month.
But there is a silver lining, if you can’t grow revenues, you can still shrink expenditure, which in the end still gets the same result at the end of the quarter – More cash.
Simplify your operations, focus on what you do really well (mobile services), the whole 80/20 rule, get better at self service, all that guff.
There’s no shortage of pain points for consumers telecom operators could address, to make the customer experience better, but few that include the word Slicing.
No one spends marketing dollars talking about the problems with a tech and vendors aren’t out there promoting sweating existing assets. But understanding your options as an operator is more important now than ever before.
Sidebar; This post got really long, so I’m splitting it into 3…
We’re often asked to help define a 5G strategy for operators; while every case is different, there’s a lot of vendors pushing MNOs to move towards 5G standalone or 5G-SA.
I’m always a fan of playing “devil’s advocate”, and with so many articles and press releases singing the praises of standalone 5G/5G-SA, in this post I’ll be making the case against the narrative vendors present to operators: that the “right” way to do 5G is to introduce 5G Standalone, and that they should all be “upgrading” to Standalone 5G.
With Mobile World Congress around the corner, now seems like a good time to put forward the argument against introducing 5G Standalone, rebutting some of the common claims operators will be told about it. We’ll counterpoint these arguments and I’ll put forward the case for not jumping onto the 5G-SA bandwagon – just yet.
On a personal note, I do like 5G SA, it has some real advantages and some cool features, which are well documented, including on this blog. I’m not looking to beat up on any vendors, marketing hype or events, but just to provide the “other side” of the equation that operators should consider when making decisions and may not be aware of otherwise. It’s also all opinion of course (cited where possible), but if you’re going to build your network based on a blog post (even one as good as this) you should probably reconsider your life choices.
Some Arcane Detail: 5G Non-Standalone (NSA) vs Standalone (SA)
5G NSA (Non Standalone) uses LTE (4G) with an additional layer “bolted on” that uses 5G on the radio interface to provide “5G” speeds to users, while reusing the existing LTE (Evolved Packet Core) core and VoLTE for voice / SMS.
From an operator perspective there is almost no change required in the network to support NSA 5G, other than in the RAN, and almost all the 5G networks in commercial use today use 5G NSA.
5G NSA is great: it gives 5G speeds to users with phones that support it, with no change to the rest of the network needed.
Standalone 5G on the other hand requires a completely new core network with all the trimmings.
While it is possible to handover / interwork with LTE/4G (Inter-RAT Handovers), this is like 3G/4G interworking, where each has a different core network. Introducing 5G standalone touches every element of the network, you need new nodes supporting the new standards for charging, policy, user plane, IMS, etc.
Scope
There’s an old adage that businesses spend money for one of three reasons:
To Save Money (Which we’ll cover in this post)
To make more Money (Covered next – Will link when published)
Because they have to (Regulatory compliance, insurance, taxes, etc)
Let’s look at 5G Standalone in each of these contexts:
5G Cost Savings – Counterpoint: The cost-benefit doesn’t stack up
As an operator with an existing deployed 4G LTE network, deploying a new 5G standalone network will not save you money.
From a capital perspective this is pretty obvious: you’re going to need to invest in a new RAN and a new core to support this. But what about from an opex perspective?
Claim: 5G RAN is more efficient than 4G (LTE) RAN
Spectrum is both finite and expensive, so MNOs must find the most efficient way to use that spectrum, to squeeze the most possible value out of it.
In rough numbers, we can say we get 5x the spectral efficiency by moving from 3G to 4G. This means we can carry 5x more traffic with the same spectrum on 4G than we can on 3G – a very compelling reason to upgrade.
The like-for-like spectral efficiency of 5G is not significantly greater than that of LTE.
In numbers, the same 5 MHz of spectrum we refarmed from UMTS (3G) to 4G (LTE) provided a 5x gain in efficiency to deliver 75 Mbps on LTE. The same configuration refarmed to 5G-NR would provide 80 Mbps.
Refarming spectrum from 4G (LTE) to 5G (NR) only provides a 6% increase in spectral efficiency.
While 6% is not nothing, spectrum refarmed to a 5G standalone network can no longer be used by LTE-only devices (unless Dynamic Spectrum Sharing is used, which itself leads to efficiency losses). That pushes additional load onto the other layers.
The crazy speeds demonstrated by 5G are not due to meaningful increases in efficiency, but rather the ability to use more spectrum, spectrum that operators need to purchase at auction, purchase equipment to utilize and pay to run.
Claim: 5G Standalone Core is Cheaper to operate as it is “Cloud Native”
It has been widely claimed that the shift for the 5G Core Architecture to being “Cloud Native” can provide cost savings.
Operators should regard this in a skeptical manner; after all, we’ve been here before.
Did moving from big-iron to VNFs provide the promised cost savings to operators?
For many operators the shift from hardware to software added additional complexity to the network and increased the headcount to support this.
What were once big-iron appliances dedicated to one job, that sat in the corner and chugged away, are now virtual machines (VNFs). Many operators have naturally found themselves needing a larger team to manage the virtual environment, compared to the size of the team they needed just to plug power and data into a big box in an exchange before everything was virtualized.
Introducing a “Cloud Native” Kubernetes layer on top of the VNF / virtualization layer, on top of the compute layer, leaves us with a whole lot of layers. All of these need resources to maintain, troubleshoot and keep running, with each layer having associated costs for staffing, licensing and support.
Many mid-size enterprises rushed into “the cloud” for the promised cost savings, only to sheepishly admit it cost more than expected.
Almost none of the operators are talking about running these workloads in the public cloud, but rather “Private Clouds” built on-premises, using “Cloud Native” best practices.
One of the central arguments about cloud revolves around “elastic scaling”, where the network can automatically scale to match demand; think extra instances spun up at times of peak demand and shut down when the demand drops.
I explain elastic scaling to clients as having to move people from one place to another. Most of the time, I’m just moving myself, a push bike is fine, or I’ve got a 4 seater car, but occasionally I’ll need to move 25 people and for that I’d need a bus.
If I provide the transportation myself, I need to own a bike, a car and a bus.
But if I use the cloud I can start with the push bike, and as I need to move more people, the “cloud” will provide the vehicle I need at that moment. I’ll just pay for the time I need the bus, and when I’m done with it, I drop back to the (cheaper) push bike.
While telecom operators are going to provide the servers to run this in “On-prem-cloud”, they need to dimension for the maximum possible load. This means they need to own a bike/car/bus, even if they’re not using it most of the time, and there’s really no cost savings to having a bus but not using it when you’re not paying by the hour to hire it.
Infrastructure aside, introducing a Standalone 5G Core adds another core network to maintain. Alongside the Circuit Switched Core (MSC/GGSN/SGSN) serving 2G/3G subscribers and the Evolved Packet Core serving 4G (LTE) and 5G-NSA subscribers, adding a 5G Standalone Core for the 5G-SA subscribers served by the 5G SA cells is going to be more work (and therefore cost).
While the majority of operators have yet to turn off their 2G/3G core networks, introducing another core network to run in parallel is unlikely to lead to any cost savings.
Claim: Upgrading now can save money in the Future / Future Proofing
Life cycles of telecommunications are two fold, one is the equipment/platform life cycle (like the RAN components or Core network software being used to deliver the service) the other is the technology life cycle (the generation of technology being used).
The technology lifecycles in telecommunications are vastly longer than that for regular tech.
GSM (2G) was introduced into the UK in 1991, and will be phased out starting in 2033, a 42 year long technology life cycle.
No vendor today could reasonably expect the 5G hardware you deploy in 2024 to still be in production in 2066 – The platform/equipment life cycle is a lot shorter than the technology life cycle.
Operators will continue relying on LTE (4G) well into the late 2030s.
I’d wager that there is not a single piece of equipment in the Vodafone UK GSM network today, that was there in 1991. I’d go even further to say that any piece of equipment in the network today, didn’t even replace the 1991 equipment, but was probably 3 or 4 generations removed from the network built in 1991.
For most operators, RAN replacements happen every 4 to 7 years, often with targeted augmentation / expansion as needed in the form of adding extra layers / sectors between these times.
The question operators should be asking is therefore not what will I need to get me through to 2066, but rather what will I need to get to 2030?
The majority of operators outside the US today still operate a 2G or 3G network, generally with minimal bandwidth to support legacy handsets and devices, while the 4G (LTE) network does most of the heavy lifting for carrying user traffic. This is often with the aid of an additional 5G-NSA (Non-Standalone) layer to provide additional capacity.
Is there a cost saving angle to adding support for 5G-Standalone in addition to 2G/3G/4G (LTE) and 5G (Non-Standalone) into your RAN?
A logical stance would be that removing layers / technologies (such as 2G/3G sunsetting) would lead to cost savings, and adding a 5G Standalone layer would increase cost.
All of the RAN solutions on the market today from the major vendors include support for both Standalone 5G and Non Standalone, but the feature licensing for a non-standalone 5G is generally cheaper than that for Standalone 5G.
The question operators should be asking is on what timescale do I need Standalone 5G?
If you’ve rolled out 5G-NSA today, then when are you looking to sunset your LTE network? If the answer is “I hope to have long since retired by that time”, then you’ve just answered that question and you don’t need to licence / deploy 5G-SA in this hardware refresh cycle.
Other Cost Factors
Roaming: The majority of roaming traffic today relies on 2G/3G for voice. VoLTE roaming is (finally) starting to establish a foothold, but we are a long way from ubiquitous global roaming for LTE and VoLTE, and even further away for 5G-SA roaming. Focusing on 5G roaming will enable your network for roaming use by a minuscule number of operators, compared to LTE/VoLTE roaming, which covers the majority of the operators in the developed world who can utilize your service.
I decided to split this into 3 posts, next I’ll post the “5G can make us more money” post and finally a “5G because we have to” post. I’ll post that on LinkedIn / Twitter / Mailing list, so stick around, and feel free to trash me in the comments.
Slicing has long been held up as one of the monetization opportunities for residential customers, but few seem to be familiar with it beyond a concept, so I thought I’d take a look at how it actually works in Android, and how an end user would interact with it.
For starters, there’s a little-used hook in Android’s TelephonyManager called purchasePremiumCapability, which can be called by a carrier’s self-care app.
Operators would need the Telephony Permission for their app, and a function in the app to activate this, but it doesn’t rely on Android Carrier Privileges and a matching signature on the SIM card, although there are a lot of good reasons to include this in your Android Manifest for a Carrier Self-Care app.
We’ve made a little test app we use for things like enabling VoLTE, setting the APNs, setting carrier config, etc, etc. I added the Purchase Slice capability to it and gave it a shot.
And the hook works, I was able to “purchase” a Slice.
I did some sleuthing to find out if any carrier self-care apps have implemented this functionality for standards-based slicing, and I couldn’t find any. I’m curious to see if it takes off; as I’ve written about previously, slicing capabilities are not new in cellular, but the attempt to monetise them is.
Even before 5G was released, the arms race to claim the “fastest” speeds on LTE, NSA and SA networks has continued, with pretty much every operator claiming a “first” or “fastest”.
I myself have the fastest 5G network available* but I thought I’d look at how big the values are we can put in for speed, these are the Maximum Bitrate Values (like AMBR) we can set on an APN/DNN, or on a Charging Rule.
*Measurement is of the fastest 5G network in an eastward facing office, operated by a person named Nick, in a town in Australia. Other networks operated by people other than those named Nick in eastward facing office outside of Australia were not compared.
The answer for Release 8 LTE is 4294967295 bits per second, aka 4295 Mbps or 4.295 Gbps.
Not bad, but why this number?
The Max-Requested-Bandwidth-DL AVP tells the PGW the max throughput allowed in bits per second. It’s an Unsigned32, so the max value is 4294967295 (2^32 - 1), hence the value.
But come Release 15 some bright spark thought we may in the not too distant future break this barrier, so how do we go above this?
The answer was to bolt on another AVP: the “Extended-Max-Requested-BW-DL” AVP (554) was introduced. You might think that means the max speed now becomes 2x 4.295 Gbps, but that’s not quite right: the units were shifted.
This AVP isn’t measuring bits per second, it’s measuring kilobits per second.
So the standard Max-Requested-Bandwidth-DL AVP gives us 4.3 Gbps, while the Extended-Max-Requested-BW-DL AVP gives us 4,295 Gbps.
We add the Extended-Max-Requested-BW-DL AVP (4,295 Gbps) onto the Max-Requested-Bandwidth-DL AVP (4.3 Gbps), giving us a total of 4,299.3 Gbps.
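The arithmetic is easy to check in a few lines of Python. This is only a sketch: it assumes the full Unsigned32 range (2^32 - 1) is usable in both AVPs and follows the reasoning above that the two values are summed. The function name is made up for illustration.

```python
UNSIGNED32_MAX = 2**32 - 1  # largest value an Unsigned32 AVP can carry

def max_bitrate_bps(extended: bool) -> int:
    """Largest bitrate (in bit/s) we could signal, with or without the extended AVP."""
    base_bps = UNSIGNED32_MAX  # Max-Requested-Bandwidth-DL counts bits per second
    if not extended:
        return base_bps
    extended_bps = UNSIGNED32_MAX * 1000  # Extended-Max-Requested-BW-DL counts kilobits per second
    return base_bps + extended_bps  # assumption: the two values are added together

release8_gbps = max_bitrate_bps(extended=False) / 1e9
release15_gbps = max_bitrate_bps(extended=True) / 1e9
print(f"Release 8:  {release8_gbps:.3f} Gbps")
print(f"Release 15: {release15_gbps:.1f} Gbps")
```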
The Bootstrapping Server Function (BSF) is used in 4G and 5G networks to allow applications to authenticate against the network; it’s what we use to authenticate for XCAP and for an Entitlement Server.
Rather irritatingly, there are two BSF addresses in use:
If the ISIM is used for bootstrapping the FQDN to use is:
bsf.ims.mncXXX.mccYYY.pub.3gppnetwork.org
But if the USIM is used for bootstrapping the FQDN is
bsf.mncXXX.mccYYY.pub.3gppnetwork.org
You can override this by setting the 6FDA EF_GBANL (GBA NAF List) on the USIM or equivalent on the ISIM, however not all devices honour this from my testing.
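As a rough sketch, the default name selection could look like the Python below. The function name is hypothetical, and padding the MNC to three digits is my assumption about how the mncXXX label is usually written. It ignores the EF_GBANL override entirely.

```python
def bsf_fqdn(mcc: str, mnc: str, use_isim: bool) -> str:
    """Build the default BSF FQDN for a home network (hypothetical helper)."""
    mnc3 = mnc.zfill(3)  # assumption: a 2 digit MNC is left-padded with a zero
    prefix = "bsf.ims" if use_isim else "bsf"  # ISIM bootstrapping adds the "ims" label
    return f"{prefix}.mnc{mnc3}.mcc{mcc}.pub.3gppnetwork.org"

# Example with made-up PLMN values
print(bsf_fqdn("505", "01", use_isim=True))
print(bsf_fqdn("505", "01", use_isim=False))
```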
For the past few months I’ve had a Band 78 NR active antenna unit sitting next to my desk.
It’s a very cool bit of kit that doesn’t get enough love, but I thought I’d pop open the radome and take a peek inside.
Individual antenna elements
What I found very interesting is that it’s not all antennas in there!
… 29, 30, 31, 32. Yup. Checks out.
There are the expected number of antennas (I mean, if I opened it up and found 31 antennas I’d have been surprised), but they don’t take up the whole volume of the unit, only about half.
AAU with Radome reinstalled
Well, after that strip show, back to sitting in my office until I need to test something 5G SA again…
So I’ve been waxing lyrical about how cool the NRF is, but what about how it’s secured?
A matchmaking service for service-consuming NFs to find service-producing NFs makes integration between them a doddle, but also opens up all sorts of attack vectors.
Theoretical Nasty Attacks (PoC or GTFO)
Sniffing Signaling Traffic: A malicious actor could register a fake UDR service with a higher priority with the NRF. This would mean UDR service consumers (Like the AUSF or UDM) would send everything to our fake UDR, which could then proxy all the requests to the real UDR which has a lower priority, all while sniffing all the traffic.
Stealing SIM Credentials: Brute forcing the SUPI/IMSI range on a UDR would allow the SIM Card Crypto values (K/OP/Private Keys) to be extracted.
Sniffing User Traffic: A dodgy SMF could select an attacker-controlled / run UPF to sniff all the user traffic that flows through it.
Obviously there’s a lot more scope for attack by putting nefarious data into the NRF, or querying it for data gathering, and I’ll see if I can put together some examples in the future, but you get the idea of the mischief that could be managed through the NRF.
This means it’s pretty important to secure it.
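To make the first attack concrete, here is a toy model in Python of why a rogue registration wins. It is purely illustrative: the class, the selection logic and the idea that a lower priority number means "preferred" are my assumptions, not how any particular NRF or NF implements selection.

```python
from dataclasses import dataclass

@dataclass
class NfProfile:
    nf_instance_id: str
    nf_type: str
    priority: int  # assumption: lower number means more preferred

def select_producer(profiles, nf_type):
    """Pick the most preferred registered producer of a given type (toy logic)."""
    candidates = [p for p in profiles if p.nf_type == nf_type]
    return min(candidates, key=lambda p: p.priority)

registry = [NfProfile("real-udr", "UDR", priority=10)]
# The attacker simply registers a second UDR that looks more attractive
registry.append(NfProfile("fake-udr", "UDR", priority=1))

chosen = select_producer(registry, "UDR")
print(chosen.nf_instance_id)
```

Nothing in this toy registry checks who is registering, which is exactly the gap the options below try to close.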
OAuth2
3GPP elected to use common industry standards for HTTP auth, including OAuth2 (clearly lessons were learned from COMP128 all those years ago); however, OAuth2 is optional, and not integrated as you might expect. There’s a little bit to it, but you can expect to see a post on the topic in the next few weeks.
3GPP Security Recommendations
So how do we secure the NRF from bad actors?
Well, there are 3 options according to 3GPP:
Option 1 – Mutual TLS
Where the client (NF) and the server (NRF) each present a TLS certificate, so both ends authenticate each other before communicating.
This is a pretty standard mechanism for securing communications, but issuing and distributing certificates is often done poorly, and there is no way to ensure the person holding the certificate is the person the certificate was issued to.
3GPP have not specified a mechanism for issuing and securely distributing certificates to NFs.
Option 2 – Network Domain Security (NDS)
Split the network traffic on a logical level (VLANs / VRFs, etc) so only NFs can access the NRF.
Essentially it’s logical network segregation.
Option 3 – Physical Security
Split the network like in NDS, but at the physical layer, so the physical cables essentially run point-to-point from NF to NRF.
NRF and NF shall authenticate each other during discovery, registration, and access token request. If the PLMN uses protection at the transport layer as described in clause 13.1, authentication provided by the transport layer protection solution shall be used for mutual authentication of the NRF and NF. If the PLMN does not use protection at the transport layer, mutual authentication of NRF and NF may be implicit by NDS/IP or physical security (see clause 13.1). When NRF receives message from unauthenticated NF, NRF shall support error handling, and may send back an error message. The same procedure shall be applied vice versa. After successful authentication between NRF and NF, the NRF shall decide whether the NF is authorized to perform discovery and registration. In the non-roaming scenario, the NRF authorizes the Nnrf_NFDiscovery_Request based on the profile of the expected NF/NF service and the type of the NF service consumer, as described in clause 4.17.4 of TS23.502 [8].In the roaming scenario, the NRF of the NF Service Provider shall authorize the Nnrf_NFDiscovery_Request based on the profile of the expected NF/NF Service, the type of the NF service consumer and the serving network ID. If the NRF finds NF service consumer is not allowed to discover the expected NF instances(s) as described in clause 4.17.4 of TS 23.502[8], NRF shall support error handling, and may send back an error message. NOTE 1: When a NF accesses any services (i.e. register, discover or request access token) provided by the NRF , the OAuth 2.0 access token for authorization between the NF and the NRF is not needed.
TS 133 501 – 13.3.1 Authentication and authorization between network functions and the NRF
The Network Repository Function plays matchmaker to all the elements in our 5G Core.
For our 5G Service-Based-Architecture (SBA) we use Service Based Interfaces (SBIs) to communicate between Network Functions. Sometimes a Network Function acts as a server for these interfaces (aka “Service Producer”) and sometimes it acts as a client on these interfaces (aka “Service Consumer”).
For service consumers to be able to find service producers (clients to be able to find servers), we need a directory mechanism. This is the role of the NRF.
With every Service Producer registering to the NRF, the NRF has knowledge of all the available Service Producers in the network, so when a Service Consumer NF comes along (Like an AMF looking for UDM), it just queries the NRF to get the details of who can serve it.
Basic Process – NRF Registration
In order to be found, a service producer NF has to register with the NRF, so the NRF has enough info on the service-producer to be able to recommend it to service-consumers.
This is all the basic info, the Service Based Interfaces (SBIs) that this NF serves, the PLMN, and the type of NF.
The NRF then stores this information in a database, ready to be found by SBI Service Consumers.
This is achieved by the Service Producing NF sending a HTTP2 PUT to the NRF, with the message body containing all the particulars about the services it offers.
Simplified example of an SMSc registering with the NRF in a 5G Core
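In code, building such a registration might look roughly like the Python sketch below. The resource path and the NFProfile field names are how I understand the Nnrf_NFManagement API in TS 29.510, so treat them as assumptions and check the spec. The hostname, instance ID and service name are made up, and the request is only built, not sent.

```python
import json

def build_nf_register_request(nrf_root, nf_instance_id, nf_type, mcc, mnc, services):
    """Return (method, url, body) for an NF registering itself with the NRF."""
    profile = {
        "nfInstanceId": nf_instance_id,
        "nfType": nf_type,
        "nfStatus": "REGISTERED",
        "plmnList": [{"mcc": mcc, "mnc": mnc}],
        # One entry per Service Based Interface this NF serves
        "nfServices": [
            {"serviceInstanceId": f"{name}-1", "serviceName": name, "scheme": "http"}
            for name in services
        ],
    }
    # Assumed resource path for NF registration
    url = f"{nrf_root}/nnrf-nfm/v1/nf-instances/{nf_instance_id}"
    return "PUT", url, json.dumps(profile)

method, url, body = build_nf_register_request(
    "http://nrf.example",                     # hypothetical NRF address
    "4947a69a-f61b-4bc1-b9da-47c9c5d14b64",   # made-up instance UUID
    "SMSF", "001", "01", ["nsmsf-sms"],
)
print(method, url)
print(body)
```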
Basic Process – NRF Discovery
With an NRF that has a few SBI Service Producers registered in it, we can now start querying it from SBI Service Consumers, to find SBI Service Producers.
The SBI Service Consumer looking for a SBI Service Producer, queries the NRF with a little information about itself, and the SBI Service Producer it’s looking for.
For example a SMF looking for a UDM, sends a request like:
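A discovery query of this kind can be sketched in Python as below. The path and the query parameter names are how I understand the Nnrf_NFDiscovery API in TS 29.510, so treat them as assumptions. The NRF hostname is made up, and the request is only built, not sent.

```python
from urllib.parse import urlencode

def build_discovery_url(nrf_root: str, requester_nf_type: str, target_nf_type: str) -> str:
    """Build the GET URL a service consumer would use to look up producers."""
    query = urlencode({
        "target-nf-type": target_nf_type,        # what we are looking for
        "requester-nf-type": requester_nf_type,  # who is asking
    })
    return f"{nrf_root}/nnrf-disc/v1/nf-instances?{query}"

# An SMF looking for a UDM
discovery_url = build_discovery_url("http://nrf.example", "SMF", "UDM")
print("GET", discovery_url)
```

The NRF would answer with the profiles of any matching registered producers, which the consumer then picks from.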
But networks evolve, and 5G Networks required some extensions to GTP to support these on the N9 and N3 reference points. (UPF to UPF and UPF to gNodeB / Access Network).
3GPP TS 38.415 outlines the PDU session user plane protocol used in 5GC.
The Need for GTP Header Extensions
As increasingly complex QoS capabilities are introduced into 5GC, there is a need to signal certain information on a per-packet basis.
The expansion of QoS in 5GC means the UPF or gNodeB may need to set the QoS Flow Identifier per packet, include delay measurements, or signal that Reflective QoS is being used per packet. For this, you need to extend GTP.
Fortunately GTP has support for Extension Headers and this has been leveraged to add the PDU Session Container in the Extension Header of a GTP packet.
In here you can set on a per packet basis:
QoS Flow Identifier (QFI) – Used to identify the QoS flow to be used (Pretty self explanatory)
Reflective QoS Indicator (RQI) – To indicate reflective QoS is supported for the encapsulated packet
Paging Policy Presence (PPP) – To indicate support for Paging Policy Indicator (PPI)
Paging Policy Indicator (PPI) – Sets parameters of paging policy differentiation to be applied
QoS Monitoring Packet – Indicates packet is used for QoS Monitoring and DL & UL Timestamps to come
UL/DL Sending Time Stamps – 64 bit timestamp generated at the time the UPF or UE encodes the packet
UL/DL Received Time Stamps – 64 bit timestamp generated at the time the UPF or UE received the packet
UL/DL Delay Indicators – Indicates Delay Results to come
UL/DL Delay Results – Delay measurement results
Sequence Number Presence – Indicates if QFI sequence number to come
UL/DL QFI Sequence Number – Sequence number as assigned by the UPF or gNodeB
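To give a feel for how compact this is on the wire, here is a Python sketch that packs just the QFI and RQI into the first two octets of a downlink PDU Session Information frame. The bit layout (PDU type in the top four bits of the first octet, then PPP, RQI and a 6 bit QFI in the second) is my reading of the Release 15 format in TS 38.415, so treat it as an assumption. It leaves out the optional fields and the padding of the extension header.

```python
DL_PDU_SESSION_INFORMATION = 0  # assumed PDU type value for the downlink frame

def encode_dl_pdu_session_info(qfi: int, rqi: bool = False) -> bytes:
    """Pack QFI and RQI into two octets (PPP and all optional fields left unset)."""
    if not 0 <= qfi <= 63:
        raise ValueError("QFI is a 6 bit value (0 to 63)")
    octet1 = DL_PDU_SESSION_INFORMATION << 4  # PDU type in the upper nibble, rest spare
    octet2 = (int(rqi) << 6) | qfi            # bit 7 PPP (0), bit 6 RQI, bits 5..0 QFI
    return bytes([octet1, octet2])

def decode_dl_pdu_session_info(data: bytes) -> dict:
    """Reverse of the encoder above, for the same two octets."""
    return {
        "pdu_type": data[0] >> 4,
        "rqi": bool(data[1] & 0x40),
        "qfi": data[1] & 0x3F,
    }

frame = encode_dl_pdu_session_info(qfi=9, rqi=True)
print(frame.hex(), decode_dl_pdu_session_info(frame))
```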
Previous generations of core mobile network, would only allocate a single IP address per UE (Well, two if dual-stack IPv4/IPv6 if you want to be technical). But one of the cool features in 5GC is the support for Framed Routing natively.
You could do this on several EPC platforms on LTE, but its support was always a bit shoe-horned in, and the UE was not informed of the framed addresses.
If you’ve worked in a wireline ISP you’re probably familiar with the concept of framed routing already, in short it’s one or more static routes, typically returned from a AAA server (Normally RADIUS) that are then routed to the subscriber.
Each subscriber gets allocated an IP by the network, but other IPs can also be routed to the subscriber, based on the network and CIDR mask.
So let’s say we allocate a public IP of 1.2.3.4/32 to our subscriber, but our subscriber is a fixed-wireless user running a business and they want extra public IP addresses.
How do we do this? With Framed Routing.
Now in our UDM we can add a “Framed IP”, and when the SMF sets up a session for our subscriber, the extra networks specified in the framed routes will get routed to that UE.
If we add 203.176.196.0/30 in our UDM for a subscriber, when the subscriber attaches the UPF will be setup to forward traffic to 1.2.3.4/32 and also traffic to 203.176.196.0/30 to the UE.
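The forwarding decision boils down to a prefix match, which can be sketched with Python’s ipaddress module. The function name is made up, and a real UPF does this with PDRs / FARs rather than a Python list, so this only illustrates the idea.

```python
import ipaddress

def routes_to_ue(dst: str, ue_ip: str, framed_routes: list) -> bool:
    """True if downlink traffic to dst should be forwarded to this UE."""
    addr = ipaddress.ip_address(dst)
    # The UE's own allocated address plus every framed route from the UDM
    networks = [ipaddress.ip_network(ue_ip)]
    networks += [ipaddress.ip_network(route) for route in framed_routes]
    return any(addr in net for net in networks)

framed = ["203.176.196.0/30"]
print(routes_to_ue("1.2.3.4", "1.2.3.4/32", framed))        # the UE's own address
print(routes_to_ue("203.176.196.2", "1.2.3.4/32", framed))  # inside the framed route
print(routes_to_ue("203.176.196.4", "1.2.3.4/32", framed))  # just outside the /30
```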
Update: I previously claimed: “Best of all this is signaled to the UE during the attach, so if the UE is, say, a router, it becomes aware of the Framed IPs allocated to it.” This is incorrect! Thanks to Anonymous Telco Engineer from an Anonymous Nordic Country for pointing this out; it is not signaled to the UE.
More info in 3GPP TS 23.501 section 5.6.14 Support of Framed Routing.
Reflective QoS is a clever new concept introduced in 5G SA networks.
The concept is rather simple, apply QoS in the downlink, and let the UE reply using the QoS in the uplink.
So what is Reflective QoS? If I send an ICMP ping request to a UE with a particular QoS Flow setup on the downlink, if Reflective QoS is enabled, the ICMP reply will have the same QoS applied on the uplink. Simple as that.
The UE looks at the QoS applied on the downlink traffic, and applies the same to the uplink traffic.
Let’s take another example: if a user starts playing an online game, and the traffic to the user (downlink) has certain QoS parameters set, then with Reflective QoS enabled the UE builds rules from the source IP / port / protocol of the traffic received and the QoS used on the downlink, and applies the same QoS on the uplink.
But actually getting Reflective QoS enabled requires a few more steps…
Reflective QoS is enabled on a per-packet basis, and is indicated by the UPF setting the Reflective QoS Indication (RQI) bit in the encapsulation header next to the QFI (This is set in the GTP header, as an extension header, used on the N3 and N9 reference points).
But before this is honored, a few other parameters have to be setup.
A Reflective QoS Timer (RQ Timer) has to be set, this can be done during the PDU Session Establishment, PDU Session Modification procedure, or set to a default value.
SMF has to set Reflective QoS Attribute (RQA) on the QoS profile for this traffic on the N2 reference point towards gNodeB
SMF must instruct UPF to use uplink reflective QoS by generating a new UL PDR for this SDF via the N4 reference point
When these requirements have been met, the traffic from the UPF to the gNodeB (N3 reference point) has the Reflective QoS Indication (RQI) bit in the encapsulation header, which is encapsulated and signaled down to the UE, which builds a rule based on the received IP source / port / protocol, and sends responses using the same QoS attributes.
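The rule the UE builds is essentially the downlink packet’s addressing mirrored. A minimal Python sketch of that idea is below. The class and function names are made up, and it ignores the RQ Timer and rule expiry altogether.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class PacketFilter:
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int

def derive_uplink_rule(dl: PacketFilter, qfi: int, rqi: bool) -> Optional[Tuple[PacketFilter, int]]:
    """Return (uplink filter, QFI) if the downlink packet had RQI set, else None."""
    if not rqi:
        return None  # no reflective QoS indicated for this packet
    # Swap source and destination so the filter matches the UE's replies
    uplink = PacketFilter(
        src_ip=dl.dst_ip, dst_ip=dl.src_ip, protocol=dl.protocol,
        src_port=dl.dst_port, dst_port=dl.src_port,
    )
    return uplink, qfi

# Made-up downlink packet from a game server to the UE
downlink = PacketFilter("198.51.100.7", "10.45.0.2", "udp", 27015, 50000)
rule = derive_uplink_rule(downlink, qfi=9, rqi=True)
print(rule)
```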
Like in EPS / LTE, there are two ways to send SMS in Standalone 5G Core networks.
SMS over IMS or SMS over NAS – Both can be used on the same network, or just one, depending on operator preferences.
SMS over IMS in 5G
SMS over IMS uses the IMS network to send SMS. SIP MESSAGE methods are used to deliver SMS between users. While most operators have deployed IMS for 4G/LTE subscribers to use VoLTE some time ago, there are some changes required to the IMS architecture to support VoNR (Voice over New Radio) on the carrier side, and support for VoNR in commercial devices is currently in its early stages. Because of this many 5G devices and networks do not yet support SMS over IMS.
I’ve read in some places that RCS (the GSMA’s Rich Communication Services) will replace SMS in 5GC. If this is the case, it is not reflected in any of the 3GPP standards.
SMS over NAS
To make a voice call on a device or network that does not support VoNR, EPS (VoLTE) fallback is used. This means when making or receiving a call, the UE drops from the 5G RAN to a 4G (LTE) based RAN, and then uses VoLTE to make the call the same as it would when connected to 4G (LTE) networks, because it is connected to a 4G network. This works technically and serves as a stop-gap measure, but it is not the preferred option as it adds extra signaling and complexity to the network, and delays in the call setup. It’s expected operators will eventually move to VoNR.
But mobile networks see a lot of SMS traffic. If every time an SMS was sent the UE had to rely on EPS fallback to access IMS, this would see users ping-ponging between 4G and 5G every time they sent or received an SMS.
5GC reintroduces the SMS-over-NAS feature, allowing the SMS messages to be carried over NAS messaging on the N1 interface. Voice calls may still require fallback to EPS (4G) to make calls over VoLTE, but SMS can be carried over NAS messaging, minimizing the amount of Inter-RAT handovers required.
The Nsmsf_SMService
For this a new Service Based Interface is introduced between the AMF and the SMSF (SMS Function, typically built into an SMSc), via the N20 / Nsmsf SBI to offer the Nsmsf_SMService service.
There are 3 operations supported for the Nsmsf_SMService:
Activate – Initiated by the AMF – Used to activate the SMS over NAS service for a given subscriber.
Deactivate – Initiated by the AMF – Used to deactivate the SMS over NAS service for a given subscriber.
UplinkSMS – Initiated by the AMF to transfer the SMS payload towards the SMSF.
The UplinkSMS is an HTTP POST from the AMF with the SUPI in the Request URI and the request body containing a JSON encoded SmsRecordData.
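As a rough Python sketch, the request could be put together like this. The resource path and the two SmsRecordData fields shown are my understanding of TS 29.540, so treat them as assumptions. The hostname, SUPI and IDs are made up, and the binary SMS payload that travels alongside the JSON is left out.

```python
import json

def build_uplink_sms_request(smsf_root: str, supi: str, record_id: str, content_id: str):
    """Return (method, url, body) for an AMF passing a MO SMS to the SMSF."""
    # SUPI goes in the request URI (assumed path layout)
    url = f"{smsf_root}/nsmsf-sms/v2/ue-contexts/{supi}/sendsms"
    sms_record_data = {
        "smsRecordId": record_id,                # assumed field: ID for this SMS record
        "smsPayload": {"contentId": content_id}, # assumed field: pointer to the binary part
    }
    return "POST", url, json.dumps(sms_record_data)

sms_method, sms_url, sms_body = build_uplink_sms_request(
    "http://smsf.example", "imsi-001010000000001", "1", "sms-payload-1"
)
print(sms_method, sms_url)
print(sms_body)
```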
Astute readers may notice that’s all well and good, but that only covers Mobile Originated (MO) SMS, what about Mobile Terminated (MT) SMS?
Well, that’s actually handled by a totally different SBI: the Namf_Communication action “N1N2MessageTransfer” is reused for sending MT SMS, as that interface already exists for use by the SMF, LMF and PCF, and 5GC attempts to reuse interfaces as much as possible.
There’s no such thing as a free lunch, and 5G is the same – services running through a 5G Standalone core need to be billed.
In 5G Core Networks, the SMF (Session Management Function) reaches out to the CHF (Charging Function) to perform online charging, via the Nchf_ConvergedCharging Service Based Interface (aka reference point).
Like in other generations of core mobile networks, Credit Control in 5G networks is based on 3 functions:
Requesting a quota for a subscriber from an online charging service, which if granted permits the subscriber to use a certain number of units (in this case data transferred in/out).
Just before those units are exhausted, sending an update to request more units from the online charging service to allow the service to continue.
When the session has ended or the subscriber has disconnected, sending a termination to inform the online charging service to stop billing and refund any unused credit / units (data).
Initial Service Creation (ConvergedCharging_Create)
When the SMF needs to set up a session (for example when the AMF sends the SMF a Nsmf_PDU_SessionCreate request), the CTF (Charging Trigger Function) built into the SMF sends a Nchf_ConvergedCharging_Create (Initial, Quota Requested) to the Charging Function (CHF).
Because the Nchf_ConvergedCharging interface is a Service Based Interface, this is carried over HTTP. In practice, this means the SMF sends an HTTP POST to http://yourchargingfunction/Nchf_ConvergedCharging/v1/chargingdata/
Obviously there’s some additional information to be shared rather than just an HTTP POST, so the HTTP POST includes the ChargingDataRequest as the Request Body. If you’ve dealt with Diameter Credit Control you may be expecting the ChargingDataRequest information to be a huge jumble of nested AVPs, but it’s actually a fairly short list:
The subscriberIdentifier (SUPI) is included to identify the subscriber so the CHF knows which subscriber to charge
The nfConsumerIdentification identifies the SMF generating the request (The SBI Consumer)
The invocationTimeStamp and invocationSequenceNumber are both pretty self explanatory; the time the request is sent and the sequence number from the SBI consumer
The notifyUri identifies which URI should receive subsequent notifications from the CHF (for example, if the CHF wants to terminate the session, this is where it tells the SMF)
The multipleUnitUsage defines the service-specific parameters for the quota being requested.
The triggers identifies the events that trigger the request
Each of those fields should be pretty self explanatory as to its purpose. The multipleUnitUsage data is used like the Service Information AVP in Diameter based Credit Control, in that it defines the specifics of the service we’re requesting a quota for. It contains a mandatory ratingGroup specifying which rating group the CHF should use, and optionally a requestedUnit, which can either define the amount of service units being requested (for us this is data in/out) or simply tell the CHF that units are needed. Typically it is used to define the amount of units requested.
On the amount of units requested we have a bit of a chicken-and-egg scenario; we don’t know how many units (In our case the units is transferred data in/out) to request, if we request too much we’ll take up all the customer’s credit, potentially prohibiting them from accessing other services, and not enough requested and we’ll constantly slam the CHF with requests for more credit. In practice this value is somewhere between the two, and will vary quite a bit.
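Putting the fields above together, a request body could be sketched in Python like this. The top-level keys are the ones listed above. The nested keys (nodeFunctionality, totalVolume, triggerType, triggerCategory) and their values are my recollection of TS 32.291, so treat them as assumptions. The SUPI, URI and quota size are made up.

```python
import json
from datetime import datetime, timezone

def build_charging_data_request(supi, sequence_number, notify_uri, rating_group, requested_bytes):
    """Assemble a ChargingDataRequest body for an initial quota request (sketch)."""
    return {
        "subscriberIdentifier": supi,
        "nfConsumerIdentification": {"nodeFunctionality": "SMF"},  # assumed nested key
        "invocationTimeStamp": datetime.now(timezone.utc).isoformat(),
        "invocationSequenceNumber": sequence_number,
        "notifyUri": notify_uri,
        "multipleUnitUsage": [
            {
                "ratingGroup": rating_group,
                # How much data we are asking for; too much ties up credit,
                # too little means hammering the CHF with updates
                "requestedUnit": {"totalVolume": requested_bytes},  # assumed nested key
            }
        ],
        # Assumed trigger shape and values
        "triggers": [{"triggerType": "QUOTA_THRESHOLD", "triggerCategory": "IMMEDIATE_REPORT"}],
    }

request_body = build_charging_data_request(
    "imsi-001010000000001", 1, "http://smf.example/notify", 1, 10 * 1024 * 1024
)
print(json.dumps(request_body, indent=2))
```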
Based on the service details the SMF has put in the Nchf_ConvergedCharging_Create request, the Charging Function (CHF) takes into account the subscriber’s current balance, credit control policies, etc, and uses this to determine if the subscriber has the required balance to be granted a service. If so, it sends a 201 CREATED response back to the Nchf_ConvergedCharging_Create request sent by the CTF inside the SMF.
This 201 CREATED response is again fairly clean and simple, the key information is in the multipleQuotaInformation which is nested within the ChargingDataResponse, which contains the finalUnitIndication defining the maximum units to be granted for the session, and the triggers to define when to check in with CHF again, for time, volume and quota thresholds.
And with that, the service is granted, the SMF can instruct the UPF to start allowing traffic through.
Update (ConvergedCharging_Update)
Once the granted units / quota has been exhausted, the Update (ConvergedCharging_Update) request is used for requesting subsequent usage / quota units. For example, our subscriber has used up all the data initially allocated but is still consuming data, so the SMF sends a Nchf_ConvergedCharging_Update request to the CHF via another HTTP POST, with the requested service unit in the request body in the form of a ChargingDataRequest, as we saw in the initial ConvergedCharging_Create.
If the subscriber still has credit and the CHF is OK to allow their service to continue, the CHF returns a 200 OK with the ChargingDataResponse, again, detailing the units to be granted.
This procedure repeats over and over as the subscriber uses their allocated units.
Release (ConvergedCharging_Release)
Eventually, when our subscriber disconnects, the SMF sends an Nchf_ConvergedCharging_Release request to the CHF, detailing the data the subscriber used in the ChargingDataRequest in the body, so the CHF can refund any unused credit.
The CHF sends back a 204 No Content response, and the procedure is completed.
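To tie the three operations together, here is a toy in-memory stand-in for a CHF. It is only a sketch under my own assumptions: ToyChf, its balance / chunk bookkeeping and the refund logic are invented for illustration, there is no HTTP involved, and rejecting a subscriber with no credit is left out. The only things taken from the text are the order of operations and the 201 / 200 / 204 status codes.

```python
class ToyChf:
    """In-memory stand-in for a CHF; not a real Nchf implementation."""

    def __init__(self, balance, chunk):
        self.balance = balance  # subscriber credit, in units
        self.chunk = chunk      # units granted per request
        self.reserved = 0       # units currently held for the session

    def _grant(self):
        # Reserve up to one chunk out of whatever balance is left.
        granted = min(self.chunk, self.balance)
        self.balance -= granted
        self.reserved = granted
        return granted

    def create(self):
        """Create: reserve the first chunk and answer 201."""
        return 201, self._grant()

    def update(self, used):
        """Update: settle the previous chunk, reserve another, answer 200."""
        self.balance += self.reserved - used  # hand back anything unused
        return 200, self._grant()

    def release(self, used):
        """Release: refund unused units from the last chunk, answer 204."""
        self.balance += self.reserved - used
        self.reserved = 0
        return 204, None


chf = ToyChf(balance=250, chunk=100)
print(chf.create())     # first grant
print(chf.update(100))  # first chunk fully used, ask for more
print(chf.release(40))  # disconnect after using 40 units of the second chunk
print(chf.balance)      # what is left after 140 units of usage
```

The subscriber started with 250 units and used 140 in total, so the remaining balance after the Release should come out at 110.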
More Info
If you’ve had experience in Diameter credit control, this simple procedure will be a breath of fresh air: it’s clean and easy to comprehend. If you’d like to learn more, the 3GPP specification docs on the topic are clear and comprehensible. I’d suggest:
TS 132 290 – Short overview of charging mechanisms
TS 132 291 – Specifics of the Nchf_ConvergedCharging interface
The common 3GPP charging architecture is specified in TS 32.240
TS 132 291 – Overview of components and SBIs inc Operations
Today, we’re going to look at one of the simplest Service Based Interfaces in the 5G Core, the Equipment Identity Register (EIR).
The purpose of the EIR is very simple – when a subscriber connects to the network, its Permanent Equipment Identifier (PEI) can be queried against an EIR to determine if that device should be allowed onto the network or not.
The PEI is the IMEI of a phone / device, with the idea being that stolen phones’ IMEIs are added to a forbidden list on the EIR and prohibited from connecting to the network, making them useless, which in turn makes stolen phones harder to resell and deters mobile phone theft.
In reality these forbidden-lists are typically either country specific or carrier specific, meaning if the phone is used in a different country, or in some cases a different carrier, the phone’s IMEI is not in the forbidden-list of the overseas operator and can be freely used.
The dialog goes something like this:
AMF: Hey EIR, can PEI 49-015420-323751-8 connect to the network?
EIR: (checks if 49-015420-323751-8 is in forbidden list - It's not) Yes.
or
AMF: Hey EIR, can PEI 58-241992-991142-3 connect to the network?
EIR: (checks if 58-241992-991142-3 is in forbidden list - It is) No.
(Optionally the SUPI can be included in the query as well, to lock an IMSI to an IMEI, which is a requirement in some jurisdictions)
As we saw in the above script, the AMF queries the EIR using the N5g-eir_EquipmentIdentityCheck service.
The N5g-eir_EquipmentIdentityCheck service only offers one operation – CheckEquipmentIdentity.
It’s called by sending an HTTP GET to:
http://{apiRoot}/n5g-eir-eic/v1/equipment-status
Obviously we’ll need to include the PEI (IMEI) in the HTTP GET, which, if you remember back to basic HTTP GET, means adding ?attribute=value&attribute=value… to the URL for each attribute / value you want to share.
For the CheckEquipmentIdentity operation, the PEI is a mandatory parameter, and optionally the SUPI can be included. This means to query our PEI (the IMEI of the phone) against our EIR we’d simply send an HTTP GET to:
AMF: HTTP GET http://{apiRoot}/n5g-eir-eic/v1/equipment-status?pei=490154203237518
EIR: 200 (Body EirResponseData: status "WHITELISTED")
And how it would look for a blacklisted IMEI:
AMF: HTTP GET http://{apiRoot}/n5g-eir-eic/v1/equipment-status?pei=582419929911423
EIR: 404 (Body EirResponseData: status "BLACKLISTED")
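If you’d like to play with this, here is a minimal sketch of both sides. It builds the query URL the AMF would call and fakes the EIR’s lookup with a Python set, using the second IMEI from the dialog earlier as the forbidden one. The supi parameter name, the example hostname, the helper names and the forbidden list are all assumptions for illustration, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode

# Hypothetical forbidden list, holding the second IMEI from the dialog above.
FORBIDDEN_PEIS = {"582419929911423"}


def build_check_url(api_root, pei, supi=None):
    """Build the equipment-status query URL; pei is mandatory, supi optional."""
    params = {"pei": pei}
    if supi is not None:
        params["supi"] = supi  # assumed query parameter name
    return f"{api_root}/n5g-eir-eic/v1/equipment-status?{urlencode(params)}"


def check_equipment_identity(pei):
    """Toy EIR lookup returning an EirResponseData-style dict."""
    status = "BLACKLISTED" if pei in FORBIDDEN_PEIS else "WHITELISTED"
    return {"status": status}


api_root = "http://eir.example.invalid"  # placeholder for {apiRoot}
for pei in ("490154203237518", "582419929911423"):
    print(build_check_url(api_root, pei), check_equipment_identity(pei))
```

The toy only looks at the status in the body; which HTTP status code goes with each answer is left to the real EIR.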
Because it’s so simple, the N5g-eir_EquipmentIdentityCheck service is a great starting point for learning about 5G’s Service Based Interfaces.
Imagine a not-too-distant future, one without flying cars – just one where 2G and 3G networks have been switched off.
And then imagine a teenage phone user who has almost run out of their prepaid mobile data allocation and so has switched mobile data off, or a roaming scenario where the user doesn’t want to get stung by an unexpectedly large bill.
In 2G/3G networks the Circuit Switched (Voice & SMS) traffic was separate to the Packet Switched (Mobile Data).
This allowed users to turn off mobile data (GPRS/HSDPA, etc), but still be able to receive phone calls and send SMS.
With LTE, everything is packet switched, so turning off Mobile Data would cut off VoLTE connectivity, meaning users wouldn’t be able to make/receive calls or SMS.
In 3GPP Release 14 (2017) 3GPP introduced the PS Data Off feature.
This feature is primarily implemented on the UE side, and simply blocks uplink user traffic from the UE, while leaving other background IP services, such as IMS/VoLTE and MMS, to continue working, even if mobile data is switched off.
The UE can signal to the core it is turning off PS Data, but it’s not required to, so from a core perspective you may not know if your subscriber has PS Data off or not. The default APN is still active and, in the implementations I’ve tried, it still responds to ICMP pings.
IMS registration stays in place, and SMS and MMS still work; the UE just drops the requests from the applications on the device (in this case I’m testing with an Android device).
What’s interesting about this is that a user may still find themselves consuming data, even if data services are turned off. A good example of this would be push notifications, which are sent to the phone (downlink data). The push notification will make it to the UE (or at least the TCP SYN will); after all, downlink services are not blocked. However, the response (for example the SYN-ACK for TCP) will not be sent. Most TCP stacks, when ignored, try again, so you’ll find that even if you have PS Data off, you may still use some of your downlink data allowance, although not much.
The SIM EF 3GPPPSDATAOFF defines the services allowed to continue flowing when PS Data is off, and the 3GPPPSDATAOFFservicelist EF lists which IMS services are allowed when PS Data is off.
Usually at this point, I’d include a packet capture and break down the flow of how this all looks in signaling, but when I run this in my lab, I can’t differentiate between a PS Data Off on the UE and just a regular bearer idle timeout… So have an irritating blinking screenshot instead…