Category Archives: EPC

Tales from the Trenches – Gx over Gy?

06/06/2026 | Nick | Categories: EPC, GSM, History, LTE, Mobile Networks, Notes, RFCs & Standards, SDM | Tags: Diameter, EPC, Gx, Gx over Gy, Gy

I was recently asked by a potential customer if we supported Gx over Gy.

I’d never heard of this before, so I gave my standard “If it’s in the spec we should support it, but I’ll check” answer, and got them to send me a PCAP.

This is weird.

So for starters, Protocoldex has nothing for this application ID (16777225), even though it has all the LTE Diameter specs.

My starting point was TS 29.230 (Diameter applications; 3GPP specific codes and identifiers). It acknowledges the existence of “Gx over Gy” with IANA application ID 16777225 and pointed me to TS 29.210, which is a 3G spec, not an LTE / 4G one.

The last version was from 2006, in 3GPP Release 6, two years before LTE was standardized in Release 8. The word LTE does not appear in the doc or in the metadata tags.

It speaks of the TPF (Traffic Plane Function) and the CRF (Charging Rules Function).

LTE is “Long Term Evolution”. In later releases this draft TPF would evolve into the PGW (before the PGW-C / PGW-U divorce), and the CRF would go on to become the PCRF (and save spring break).

Reading through these early specs is like looking at Homo Erectus (get your mind out of the gutter) and knowing it evolves into Homo Sapiens.

So what does Gx over Gy do? Well, the concept is pretty straightforward. Rather than needing an Sy interface between the PCRF and the OCS, you provision policy rules from the OCS instead of from the PCRF.

Of course, you could never run VoLTE on this. The P-CSCF needs the Rx interface to the PCRF to provision dedicated bearers, and the PCRF provisions those as TFTs over the Gx interface.

So what network functions should implement this standard? Well, the P-GW specs do not reference this as something that’s included in the P-GW, nor is it in the GGSN. This was a “gooch” spec, sitting between hypothetical standards land and real-world implementations.

So will we be implementing it? Probably not. But it was an interesting bit of archaeology, and a look through the genealogy of 3GPP.

PFCP and SIP Redirect

30/05/2026 | Nick | Categories: 5G SA, EPC, LTE, Mobile Networks, RF, RFCs & Standards, SDM | Tags: Charging, Diameter, IMS, OCS, Online Charging, PFCP, Redirect-Information, SIP

There’s a cool feature in PFCP that allows you to redirect traffic, which I’ve written about before.

But there’s a funky thing that’s left me scratching my head: in the Redirect Information IE, you can set a SIP URI.

That’d be great and all, but PFCP is all about packets not about calls.

So what’s the deal?

Had I uncovered some Machiavellian plot to move channel-associated-signaling onto PFCP instead of TDM links as God intended?

Well, no…

The Redirect Information in PFCP comes from the Redirect Information in Diameter. That’s how your OCS can tell your SMF or PGW-C (or your TAS): “hey, this session is all out of usage and should be redirected.”

Of course, PFCP is all about packets, but Diameter has a foot in both camps: Gy (data) and Ro (IMS) are both Diameter.

So when 3GPP specced this IE, they just copied the encoding from the Redirect-Address-Type AVP in Diameter Credit-Control (RFC 4006), which has support for SIP URIs.
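To make that shared lineage concrete, here’s a minimal Python sketch that encodes and decodes a PFCP Redirect Information IE. It assumes the TS 29.244 layout: IE type 38, then a spare nibble plus a 4-bit Redirect Address Type, then a 2-octet address length, then the address carried as a string. The address type values line up with Diameter’s Redirect-Address-Type (0 IPv4, 1 IPv6, 2 URL, 3 SIP URI). PFCP added extra values and an optional second address in later releases, and this sketch ignores both. The function names and the SIP URI are made up for illustration.

```python
import struct
from enum import IntEnum

REDIRECT_INFORMATION = 38  # PFCP IE type for Redirect Information (TS 29.244)


class RedirectAddressType(IntEnum):
    # Same code points as Diameter's Redirect-Address-Type AVP (RFC 4006)
    IPV4_ADDRESS = 0
    IPV6_ADDRESS = 1
    URL = 2
    SIP_URI = 3


def encode_redirect_information(addr_type: RedirectAddressType, address: str) -> bytes:
    """Full IE: 2-octet type, 2-octet length, then the body."""
    addr = address.encode("utf-8")  # carried as a string, like Redirect-Server-Address
    body = struct.pack("!BH", int(addr_type) & 0x0F, len(addr)) + addr
    return struct.pack("!HH", REDIRECT_INFORMATION, len(body)) + body


def decode_redirect_information(ie: bytes) -> tuple[RedirectAddressType, str]:
    ie_type, length = struct.unpack_from("!HH", ie, 0)
    if ie_type != REDIRECT_INFORMATION:
        raise ValueError(f"not a Redirect Information IE (type {ie_type})")
    body = ie[4:4 + length]
    addr_type = RedirectAddressType(body[0] & 0x0F)  # high nibble is spare
    (addr_len,) = struct.unpack_from("!H", body, 1)
    return addr_type, body[3:3 + addr_len].decode("utf-8")


# Hypothetical SIP URI, just to show the round trip
ie = encode_redirect_information(RedirectAddressType.SIP_URI, "sip:topup@example.com")
print(decode_redirect_information(ie))
```

The SIP URI option is simply inherited from the Diameter enum. Nothing in PFCP itself is call-aware.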

I can put down my pitchfork and go and hug my E1 links knowing they’re safe.

Walled Gardens & Redirection in 4G/5G (RedirectInformation)

23/05/2026 | Nick | Categories: 5G SA, EPC, LTE, Mobile Networks, Notes, RFCs & Standards | Tags: 5G, LTE, PFCP, Redirect-Information, SA

PFCP includes a “Redirect Information” IE which, if set, allows you to change the forwarding action in PFCP to redirect traffic.

We use this for walled garden redirects. When the OCS reports credit exhausted to the PGW-C, the PGW-C can tell the UPF (PGW-U) that all the traffic from a given subscriber should be redirected to a captive portal / walled garden. Think of the “Top up now” page you’d be used to seeing on airport WiFi.

“Sign in to network” prompt presented on Cellular

Here’s what the spec (RFC 4006, Diameter Credit-Control) says:

8.37. Redirect-Server AVP

The Redirect-Server AVP (AVP Code 434) is of type Grouped and contains the address information of the redirect server (e.g., HTTP redirect server, SIP Server) with which the end user is to be connected when the account cannot cover the service cost. It MUST be present when the Final-Unit-Action AVP is set to REDIRECT. It is defined as follows (per the grouped-avp-def of RFC 3588 [DIAMBASE]):

   Redirect-Server ::= < AVP Header: 434 >
                       { Redirect-Address-Type }
                       { Redirect-Server-Address }

So how does this work in practice?

Once upon a time, you’d just intercept all HTTP requests and serve your own content. But it’s not 2005 on Starbucks WiFi anymore, and SSL is everywhere.

Luckily this is a (mostly) solved problem. Apple has “Captive Network Assistant”, which probes http://captive.apple.com/hotspot-detect.html and checks for a specific response. Google’s Android does the same thing with http://connectivitycheck.gstatic.com/generate_204.
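The decision each OS makes is simple enough to sketch. Android’s generate_204 endpoint normally answers with an empty HTTP 204. Apple’s hotspot-detect page normally returns a tiny page containing “Success”. Anything else, like a redirect or a login page, suggests a walled garden is in the way. This Python sketch only classifies responses you’ve already fetched and doesn’t make the HTTP request itself. Matching on “Success” is a simplification of whatever the OS actually checks.

```python
from dataclasses import dataclass


@dataclass
class ProbeResponse:
    url: str
    status: int
    body: str = ""


def behind_captive_portal(resp: ProbeResponse) -> bool:
    """Return True if the probe response looks intercepted by a walled garden."""
    if "generate_204" in resp.url:
        # Android expects an empty 204; a 200 login page or a 30x redirect means interception
        return resp.status != 204
    if "hotspot-detect" in resp.url:
        # Apple expects a 200 with a tiny "Success" page
        return not (resp.status == 200 and "Success" in resp.body)
    raise ValueError(f"unknown probe URL: {resp.url}")


print(behind_captive_portal(
    ProbeResponse("http://connectivitycheck.gstatic.com/generate_204", 302)))
```

Once the probe fails, the OS pops the “Sign in to network” sheet and loads whatever the walled garden serves.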

There is a draft RFC to do this better, but it’s not widely adopted.

But before I can tell you what we do, I’ll show you what we’re not doing before we do the doing so you can see what the do does by looking at what happens when we don’t – Clear?

Before we send any Session Modification Request with redirect, I can do a DNS lookup. Here’s an example from our test jig that looks up Facebook:

A Record lookup for `facebook.com` resolving to `57.145.8.1`

This is just a regular A record DNS query wrapped up in GTP-U, as it’d look from an eNB/gNB/SGW. The answer comes back, also in GTP-U.

As we’ve already got a session up in our case, the SMF or PGW-C sends a PFCP Session Modification Request carrying the Redirect Information IE to the UPF.

The Redirect Server Address in the Redirect Information IE in PFCP

We do a few things on the UPF at this point. First, we block forwarding to all IPs except 10.179.2.135 (the redirect server in the screenshot). Second, we steal / intercept all DNS queries.

This means that if you query facebook.com after the Redirect Information is in place, you get back an A record answer for facebook.com, but it’s telling you Facebook lives on our redirect server.

We’ve got a whitelist on our UPF for certain domains. If we’re sending you to a self-signup page, you’re going to need to be able to hit our payment processors’ portals (Stripe, PayPal, etc.), so we need to allow their domains. We don’t know their IPs, though. So instead we do server-side DNS lookups for the whitelisted domains (via our DNS servers, before you sneaky kids get any other ideas). If a domain is on our DNS whitelist, we allow resolution of it, and we allow access to the IPs returned in the DNS response.
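Here’s a rough Python model of that UPF behaviour, under a few assumptions. The resolver is a plain function you pass in, standing in for our own DNS servers. Whitelisted domains match on suffix, and only A records matter. The class and method names are hypothetical, and this isn’t how our UPF is actually structured internally.

```python
from typing import Callable


class WalledGarden:
    """Toy model of redirect mode: fake DNS answers plus an IP allow-list."""

    def __init__(self, redirect_ip: str, whitelist: set[str],
                 resolve: Callable[[str], list[str]]):
        self.redirect_ip = redirect_ip
        self.whitelist = whitelist
        self.resolve = resolve               # server-side lookup via our own DNS
        self.allowed_ips = {redirect_ip}     # the redirect server is always reachable

    def _whitelisted(self, name: str) -> bool:
        name = name.rstrip(".").lower()
        return any(name == d or name.endswith("." + d) for d in self.whitelist)

    def answer_dns(self, qname: str) -> list[str]:
        """Intercepted A query: real answer for whitelisted names, else the redirect server."""
        if self._whitelisted(qname):
            ips = self.resolve(qname)
            self.allowed_ips.update(ips)     # open forwarding to whatever it resolved to
            return ips
        return [self.redirect_ip]

    def may_forward(self, dst_ip: str) -> bool:
        return dst_ip in self.allowed_ips


# Fake resolver with a documentation-range IP, standing in for a real lookup
fake_dns = {"checkout.stripe.com": ["192.0.2.10"]}
wg = WalledGarden("10.179.2.135", {"stripe.com"}, lambda n: fake_dns.get(n, []))
print(wg.answer_dns("facebook.com"), wg.answer_dns("checkout.stripe.com"))
```

Learning the allowed IPs from our own resolver’s answers means we never have to track the payment processors’ IP ranges by hand.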

In my lab I’m redirecting HTTP traffic to a management server

Turning it off just involves sending another PFCP Session Modification Request, but without the Redirect Information.

Once that’s applied, we’re back to resolving addresses normally.

Somebody’s watching me – Adventures in Cellular Location services

20/03/2026 | Nick | Categories: 5G SA, EPC, EUTRAN, IMS / VoLTE, LTE, Mobile Networks, Security | Tags: 5G, EPC, Location, Lpp, LTE

Preface: I build cellular networks for a job.

We support a network in Alaska, and one of the guys we work with there, John, has a story (which I’ll steal here). He gets a phone call late at night from someone saying they’re in the US Air Force, and uh, they’ve, uh, lost a plane. And since John works for the phone company, they ask him, he wouldn’t happen to have any idea where it is, would he?

As a matter of fact, John could see the last cell that the pilot’s SIM had attached to. They sent a helicopter out and found the pilot, who survived.

This was a long time ago, and he was able to pin the location down to a cell (sector). He then looked up where that site was and which direction the sector was pointing, which gave a pretty good idea of the general search area.

Now that everyone carries a GPS in their pocket, the accuracy available is a lot better than just which cell you’re served by (although that’s a lot of accuracy anyway, and not to be ignored).

There are significant privacy implications here, and a lot of misinformation about pinging cell towers and “zoom, enhance” stuff.

I figured I’d share how this actually works IRL. There’s nothing ‘secret’ here: all of this stuff is in the 3GPP standards, which outline how mobile networks should behave.

I wrote a precursor to this a few years ago: And the call was coming from… INSIDE THE HOUSE. A look at finding UE Locations in LTE.

Location Sources & Accuracy

There are roughly 4 levels of accuracy in cell phone networks. We’ll cover each one, and how the network treats it.

(I’m talking 4G/5G here as most of the world has moved on or is already moving on from 2G/3G)

Tracking Area Level Accuracy

Cell sites get grouped into tracking areas. They’re kinda like broadcast domains in TCP/IP networking: when you need to “page” (find) a phone that’s “idle” (sleeping), you page the whole tracking area.

Tracking area sizing has sweet spots. You want more than a few cells, and commonly a dozen or so cells in the same geographic area get lumped into the same tracking area. In regional areas a tracking area might cover a large geographic area (up to a few hundred km in regional Australia, for example), whereas in a city it might be a single city block.

If you move between cells inside the same tracking area, your phone doesn’t need to tell the network “hey, I’m moving cells”. It’s only when you cross into a new tracking area that the phone needs to wake up and tell the network it’s now in the new tracking area.

(If your tracking areas are too big, with too many cells, it becomes a nightmare to find who you’re looking for, as the paging channels are constantly blaring out IDs. If they’re too small, you’ve got phones constantly having to say “hey, I’m moving to this tracking area now”. If you want to learn more about tracking areas, I’ve written about them on the blog before.)

The core network (MME/AMF) always knows the location of a phone at minimum to the tracking area – It’s the base level of location the network has to work with.
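Here’s a toy Python model of that split between what the UE reports and what the core has to page. It assumes a single tracking area per cell and ignores TAI lists, which in real networks hand the UE a list of TAs it can roam within silently. Under those assumptions, the UE only reports when its TA changes, and the core pages every cell in the last known TA. The cell-to-TA map and the IMSI are made up.

```python
from collections import defaultdict


class CoreLocation:
    """What the MME/AMF knows about idle UEs: just the tracking area."""

    def __init__(self, cell_to_tac: dict[int, int]):
        self.cell_to_tac = cell_to_tac
        self.cells_in_tac = defaultdict(set)
        for cell, tac in cell_to_tac.items():
            self.cells_in_tac[tac].add(cell)
        self.last_tac: dict[str, int] = {}   # IMSI -> last reported TAC

    def idle_ue_moves(self, imsi: str, new_cell: int) -> bool:
        """Return True if this move triggers a Tracking Area Update."""
        tac = self.cell_to_tac[new_cell]
        if self.last_tac.get(imsi) == tac:
            return False                     # same TA: the network isn't told
        self.last_tac[imsi] = tac
        return True

    def page(self, imsi: str) -> set[int]:
        """Every cell that has to broadcast the page for this UE."""
        return self.cells_in_tac[self.last_tac[imsi]]


core = CoreLocation({111: 100, 112: 100, 113: 100, 211: 200})
core.idle_ue_moves("505570000000001", 111)
core.idle_ue_moves("505570000000001", 112)   # same TA, no update sent
print(core.page("505570000000001"))
```

The page goes out on cells 111, 112 and 113 even though the phone is only in one of them. That is the “too big” tracking area cost in miniature.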

Cell ID Level Accuracy (CGI / E-CGI)

Every cell site sector (cell) has a unique ID to denote which carrier you’re connected to. If you’ve got a 3 sector site, with a single layer per cell sector, then that’s 3 Cell Global Identifiers (CGIs) – one for each sector.

Here’s a tower we put up recently. The CGIs I’ve drawn on are just examples. If you’re connected to the sector facing north, you’d have a CGI of 111. If you’re connected to the sector facing south east, you’d have 112, and the one facing south west would be 113.

CGIs are just numbers, and they could be any number. All that matters is that the number is unique(ish) in the network. They don’t need to be sequential or have any common digits.

If we know the CGI of a given user, we can draw a wedge covering roughly a third of the circle around the tower, in the direction the antenna is pointing. Assuming that tower is still what’s providing the coverage, we know the customer is somewhere inside that wedge.

But those wedges can still be large, so the margin of error for locating someone is still pretty big. You can probably answer the question “Are they in the office or at home?” if those are in different suburbs.
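The wedge check itself is easy to sketch. This Python version makes three assumptions: a flat-earth approximation (fine over a few km), a 120° wedge centred on the antenna azimuth, and a made-up maximum coverage radius. Real sector footprints are a lot messier than a pie slice.

```python
import math


def in_sector(site_lat: float, site_lon: float, azimuth_deg: float,
              point_lat: float, point_lon: float,
              beamwidth_deg: float = 120.0, radius_m: float = 5000.0) -> bool:
    """Is the point inside the pie-slice wedge served by this sector?"""
    # Equirectangular approximation: metres north/east of the site
    north = (point_lat - site_lat) * 111_320.0
    east = (point_lon - site_lon) * 111_320.0 * math.cos(math.radians(site_lat))
    if math.hypot(north, east) > radius_m:
        return False                                          # beyond assumed coverage
    bearing = math.degrees(math.atan2(east, north)) % 360    # 0 = north, clockwise
    offset = (bearing - azimuth_deg + 180) % 360 - 180        # signed angle off boresight
    return abs(offset) <= beamwidth_deg / 2


site = (-27.47, 153.02)   # example site location
# Sector azimuths matching the example CGIs: 111 north, 112 south east, 113 south west
sectors = {111: 0.0, 112: 120.0, 113: 240.0}
ue = (-27.475, 153.029)   # roughly 1 km south east of the site
print([cgi for cgi, az in sectors.items() if in_sector(*site, az, *ue)])
```

Knowing the CGI is just running this test in reverse. You know the wedge, and the phone could be anywhere inside it.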

There was a recent case of a misconfigured Mavenir IMS in the O2 network in the UK that was leaking CGI information on calls between two parties, as the SIP messages contained this and were not getting stripped before being passed to the B-Party.

When the network wants to know a bit more about where the phone is located, it can ask the cell site which cell (Global Cell ID) the phone is in. This is pretty rare, but it can be done. When the phone is actively doing stuff, like making a call, using data or sending a text, the network knows the CGI of the event.

My lab is set up with CGI 4000 and TAC 100, and this information is littered across every signaling message.

Note: for CGI 4000 the encoding shows up as 0000 0000 0000 0000 1111 1010 0000 …. = cell-ID: 0x0000fa0. Just roll with it. The E-UTRAN Cell Identity is a 28-bit field (a 20-bit eNodeB ID plus an 8-bit cell ID), and the spec explains the rest.

A SIP REGISTER message from my lab, showing the CGI (00640000fa0, i.e. TAC 0064 followed by cell ID 0000fa0)
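Pulling those numbers apart: the 28-bit ECI splits into the eNodeB ID (top 20 bits) and the cell ID (bottom 8 bits). In SIP the location typically rides in the P-Access-Network-Info header’s utran-cell-id-3gpp parameter. My understanding of TS 24.229 is that for E-UTRAN this is the MCC, then the MNC, then the TAC as 4 hex digits, then the ECI as 7 hex digits. Here’s a small Python sketch that reproduces the lab values. The MCC/MNC (001/01) are placeholders, as the highlighted part of the capture is just the TAC and ECI.

```python
def split_eci(eci: int) -> tuple[int, int]:
    """Split a 28-bit E-UTRAN Cell Identity into (eNodeB ID, cell ID)."""
    if not 0 <= eci < 1 << 28:
        raise ValueError("ECI must fit in 28 bits")
    return eci >> 8, eci & 0xFF


def utran_cell_id_3gpp(mcc: str, mnc: str, tac: int, eci: int) -> str:
    """E-UTRAN form of the P-Access-Network-Info utran-cell-id-3gpp value."""
    return f"{mcc}{mnc}{tac:04x}{eci:07x}"


print(split_eci(4000))                          # lab CGI 4000
print(utran_cell_id_3gpp("001", "01", 100, 4000))   # lab TAC 100
```

With TAC 100 and cell 4000 the tail of that string is exactly the 00640000fa0 from the capture.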

GNSS LPP Positioning

When Cell ID level accuracy is not enough, the network can ask the phone to provide its location, using whatever it’s got available to it.

In reality, this is either done by an engineer from the phone company with the permissions to do so, or directly by law enforcement using the SLh/SLg Diameter interfaces.

When an engineer does it, there’s usually a portal they can go to, like this one in OmniMME, they search the IMSI or MSISDN, and then can get the location information via a variety of methods.

Your phone gets a message from the network, that says “Hey phone, tell me where you are”.

If you’ve got access to RRC messaging / NAS messaging on your phone through QXDM / Diag mode – You can see these requests.

If you’ve got enough access to the baseband you can even block these requests should you feel so inclined.

I’ve included some Wireshark captures of how this actually looks and how it looks from the Web UI of the MME, with the address removed.

OTDOA – “Pinging”

Sometimes you can’t get an indoor location from GPS, or the phone might be too old to support LPP positioning (no GPS built in, or something).

In those scenarios, we use “Time Difference of Arrival”. The phone measures the difference in arrival time of signals from 2 or more cell sites. Radio signals travel at a known speed (the speed of light), so those timing differences translate into differences in distance from each base station.

This is better than CGI alone, as it gives you an idea of how far from each cell site the phone is. But it doesn’t return a map with “you are here”. Rather, you get some rough distances, and the CGIs for each cell the phone can see.

The engineer then pulls up a map of all the cell sites, finds the CGIs the phone can see and the heading for each, and tries to do some early high school maths like someone’s life actually depends on it.
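That high school maths can be handed to a computer. Each time difference multiplied by the speed of light gives a difference in distance to two sites, and the phone’s position is where all those differences agree. This Python sketch brute-forces a grid. It assumes flat x/y coordinates in metres, perfectly synchronised cells and noise-free measurements, and you get none of those in the real world. A real location server uses proper solvers, not a grid search, and the site layout here is invented.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def tdoas(sites, ue):
    """Arrival-time difference of each site's signal vs. sites[0] (the reference cell)."""
    t = [math.dist(s, ue) / C for s in sites]
    return [ti - t[0] for ti in t[1:]]


def locate(sites, measured, span=3000.0, step=20.0):
    """Grid-search the point whose predicted TDOAs best match the measured ones."""
    best, best_err = None, math.inf
    n = int(span / step)
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            p = (i * step, j * step)
            err = sum((a - b) ** 2 for a, b in zip(tdoas(sites, p), measured))
            if err < best_err:
                best, best_err = p, err
    return best


sites = [(0.0, 0.0), (2000.0, 0.0), (0.0, 2000.0), (-1500.0, -1500.0)]  # x/y in metres
ue = (640.0, 860.0)
measured = tdoas(sites, ue)          # what the phone would report, noise-free
est = locate(sites, measured)
print(est)
print(f"1 us of time difference = {C * 1e-6:.1f} m of distance difference")
```

The last line is the scale that matters. Every microsecond of timing error smears the answer by about 300 m.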

Tales from the Trenches – PGW-C Deleting Sessions

06/03/2026 | Nick | Categories: EPC, GSM, LTE, Mobile Networks, Notes

One of our customers is an MVNE and they reached out the other day with an issue.

They were turning up a new PGW and they’d see the Create Session Request come in. Everything looked OK and it’d get a response, but then in the GUI of the PGW-C they’d see the session drop.

The logs showed the newly setup session dropping shortly after being setup.

Have a look at the screenshot and see if you can work out why:

So what’s going on, and why is the PGW-C deleting sessions?

The initial reaction from the customer was that there’s something up with the PGW, but the answer is a bit more nuanced.

Per the specs, you can’t have two PDN sessions for the same subscriber (IMSI) on the same APN (DNN).

So if 50557000000001 is connected to the PGW-C on the internet APN and I send another Create Session Request for it to the same PGW-C, the PGW-C deletes the old session before starting the new one.
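The PGW-C behaviour boils down to a lookup keyed on IMSI and APN. Here’s a minimal Python sketch of that rule. The session store and TEID allocator are hypothetical, and it ignores everything else a real PGW-C does, retransmission handling included.

```python
import itertools


class PgwSessions:
    """Toy PGW-C session table: at most one PDN connection per (IMSI, APN)."""

    def __init__(self):
        self.by_key: dict[tuple[str, str], int] = {}   # (IMSI, APN) -> TEID
        self._teids = itertools.count(1)
        self.deleted: list[int] = []

    def create_session(self, imsi: str, apn: str) -> int:
        key = (imsi, apn.lower())                      # APNs compare case-insensitively
        old = self.by_key.pop(key, None)
        if old is not None:
            self.deleted.append(old)                   # tear down the existing session first
        teid = next(self._teids)
        self.by_key[key] = teid
        return teid


pgw = PgwSessions()
first = pgw.create_session("50557000000001", "internet")
second = pgw.create_session("50557000000001", "internet")  # MME tries again
print(pgw.deleted, pgw.by_key)
```

From the PGW-C’s side, the second request looks exactly like a new attach for the same IMSI and APN, so deleting the first session is the correct response.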

In this case, the MVNE the traffic was going through was dropping the Create Session Response, so it never made it back to the MNO. The MME in the MNO then sent the Create Session Request again, so the PGW-C tore down the session it had just set up.

Joys of GTPv2-C being UDP based and connectionless!

Packet Buffering in the UPF

20/02/2026 | Nick | Categories: 5G SA, EPC, LTE, Mobile Networks | Tags: 5GC, EPC, GTP, LTE, PFCP

When a UE enters Idle mode, the network releases radio resources and the UE enters power saving mode.

When the UE wants to send data (Uplink) the UE just tells the network “hey I want to send something” and away it goes, nice and simple.

But when the network wants to send data to the UE (Downlink) then the UPF needs a method to tell the Control Plane (SGW-C or SMF) that there’s data waiting and to go and page the UE.

A prime example of this is when you’ve got a Mobile Terminated VoLTE call coming in, you need a way to tell the UE to wake up out of Idle mode because you’ve got something to send to it (a SIP INVITE).

But in order for this to work, we can’t just say “Hey I’ve got some packets for you” and let them get dropped, the UPF also needs to buffer (store temporarily) the downlink packets for the UE until the UE comes out of Idle mode, and then flush them out to deliver them to the UE.

So let’s look at the flow.

Enabling Buffering (Idle Mode)

When the sub enters idle mode, the Control Plane (SGW-C for an EPC, or SMF for a 5GC) sends a Session Modification Request. It sets the BUFF (Buffer) and NOCP (Notify Control Plane) flags in the FAR’s Apply Action and turns FORW (Forward) off.

What this means is now for packets to that bearer, the UPF must:

Not forward any traffic
Buffer the traffic
Notify the control plane when the first buffered packet comes in

Then the UPF just sits and waits for any incoming packets.
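Those flags live in the FAR’s Apply Action IE, one bit each. Here’s a small Python sketch that assumes the TS 29.244 bit positions for the first octet: DROP bit 1, FORW bit 2, BUFF bit 3, NOCP bit 4, DUPL bit 5. Later releases add more flags in further octets, and those are left out here. The IDLE and CONNECTED names are just labels for this example.

```python
from enum import IntFlag


class ApplyAction(IntFlag):
    DROP = 0x01   # drop packets
    FORW = 0x02   # forward packets
    BUFF = 0x04   # buffer packets
    NOCP = 0x08   # notify the CP function when buffered downlink data arrives
    DUPL = 0x10   # duplicate packets


IDLE = ApplyAction.BUFF | ApplyAction.NOCP   # UE goes idle: buffer and notify, no FORW
CONNECTED = ApplyAction.FORW                 # UE reachable again: just forward


def apply_action_octet(flags: ApplyAction) -> bytes:
    """First octet of the Apply Action IE value."""
    return bytes([int(flags)])


print(apply_action_octet(IDLE).hex(), apply_action_octet(CONNECTED).hex())
```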

The Notify

When the UPF gets an incoming packet that it’s supposed to buffer and notify on, well, it does just that.

The packets are copied into a buffer, in sequence, and for the first packet, the UPF must send a notification to the Control Plane.

That looks like this: it’s just a Session Report Request with the Downlink Data Report (DLDR) flag set in the Report Type.

Now the SMF / SGW-C sends back a Session Report Response and starts the process of paging the UE.

At the same time, the UPF keeps buffering. Its work is not done.

Flushing and Forwarding

Once the UE has become reachable, the Control Plane needs to modify the bearer to turn forwarding back on. It does this through another Session Modification Request, which is the inverse of the one it sent to turn on buffering. This time we’re turning off buffering and notifications, and turning on forwarding.

Now the UPF flushes its buffer. It sends all the packets that were queued up out over the wire towards the gNB / eNodeB, so the SIP INVITE for the MT call (or whatever it was) makes it through.

One thing to note is that the buffered packets are going to take some time to get delivered. The notify / page UE / reconnect UE / Session Modification Request (to enable forwarding again) sequence needs to happen before the buffers are flushed.

Notice the latency spike on the first packet? 610 ms? That’s because the UE had to be paged to wake up.

And that’s pretty much it: the UPF has now flushed its buffers and moves back to forwarding.
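Putting the whole dance together, here’s a toy Python model of one downlink FAR on the UPF, with a cut-down ApplyAction enum. It’s a sketch under simplifying assumptions. A callback stands in for the PFCP Session Report Request (DLDR), and a list stands in for the GTP-U path towards the eNodeB/gNB. There’s no buffer size limit or buffering timer, and the class name is hypothetical.

```python
from enum import IntFlag


class ApplyAction(IntFlag):
    FORW = 0x02
    BUFF = 0x04
    NOCP = 0x08


class DownlinkFar:
    def __init__(self, notify_cp):
        self.action = ApplyAction.FORW
        self.notify_cp = notify_cp       # stand-in for Session Report Request (DLDR)
        self.buffer: list[bytes] = []
        self.sent: list[bytes] = []      # stand-in for GTP-U towards the RAN
        self.notified = False

    def modify(self, action: ApplyAction):
        """Session Modification Request updating this FAR's Apply Action."""
        self.action = action
        if action & ApplyAction.BUFF:
            self.notified = False        # arm the notification for the next packet
        if action & ApplyAction.FORW:
            self.sent.extend(self.buffer)    # flush in arrival order
            self.buffer.clear()

    def downlink_packet(self, pkt: bytes):
        if self.action & ApplyAction.FORW:
            self.sent.append(pkt)
        elif self.action & ApplyAction.BUFF:
            self.buffer.append(pkt)
            if self.action & ApplyAction.NOCP and not self.notified:
                self.notified = True
                self.notify_cp()         # only the first buffered packet triggers it


reports = []
far = DownlinkFar(lambda: reports.append("DLDR"))
far.modify(ApplyAction.BUFF | ApplyAction.NOCP)   # UE goes idle
far.downlink_packet(b"INVITE")
far.downlink_packet(b"RTP-1")
far.modify(ApplyAction.FORW)                      # UE paged and back
print(reports, far.sent)
```

One notification, two buffered packets, and both come out in order once forwarding is back on.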

Stupid Mistakes – New UPF and IMS

23/01/2026 | Nick | Categories: 5G SA, EPC, IMS / VoLTE, LTE, Mobile Networks, Notes, VoIP | Tags: EPC, GTP, IMS, LTE, SIP, UPF, VoIP, VoLTE

Our team recently shipped a new UPF which is a huge improvement on our old UPF, and I drew the short straw of doing all the interop testing for the IMS.

Initially I thought there was an issue with IP routing, as I’d never see the SIP REGISTER from the UE, but I would see the IMS APN coming up.

I could access the internet from the UE IPs just fine, but that’s going to public IPs, whereas the P-CSCF is in private address space, and hosted on the same box as the UPF.

I spent hours on this as my lab servers do routing on a stick, and I thought some hardware offload somewhere was trying to fast path my packets and send them back to the server without going via the router.

Then I dug a little deeper and found I could see the 3 way handshake between the UE and the P-CSCF, but no SIP packets.

Successful 3 way handshake between the UE and the P-CSCF on TCP 5060

This was confusing. Clearly we had at least intermittent two-way comms, as the 3 way TCP handshake confirmed, but then why were packets not getting across?

We have an XCAP server hosted on our P-CSCF instances, so I tried hitting that from the phone in case there was something weird about routing to the network segment that hosts the P-CSCF, but I could hit the XCAP server just fine, so now I was certain the UE IP pool could route to the P-CSCF and 3 way handshake for TCP was working and payload could be pushed.

Clearly we can route to the P-CSCF as that’s where this XCAP server is hosted

Then I dug into what happened after the 3 way handshake, and I found a TCP payload containing the start of the SIP REGISTER.

Hmm, we have a SIP Fragment here at least…

I traced it all the way through and lo, it’s hitting the P-CSCF:

And the fragment is received on the P-CSCF

Okay, but then what happens, because it’s only a fragment, not the complete re-assembled packet, so what’s going on?

Well, the P-CSCF sends a TCP ACK back to the UE.

And the TCP fragment containing the first part of the REGISTER gets an ACK back from the P-CSCF

The ACK gets forwarded to the UPF:

And then… nothing? The UPF never encapsulates the TCP ACK back into GTP-U and never sends it on to the base station.

Eventually the UE re-sends the payload with the start of the REGISTER, but it still doesn’t get an ACK from the P-CSCF.

Retransmitted TCP segment containing the REGISTER from the UE

So naughty UPF right? Not forwarding that ACK for some reason?

I started digging, maybe the ACK was getting routed weirdly and landing on the UPF without going through the router?

Well not quite…

When I started digging into the QER rules being installed, I noticed the MBR we had on the IMS APN in the HSS was tiny.

The UPF can only gate traffic going to the UE, so it was gating the ACK traffic. The QER had already consumed all the bandwidth, so the ACK never made it back.
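To see how a tiny MBR can starve even a bare TCP ACK, here’s a simple token-bucket policer in Python. It’s only an illustration. How a given UPF actually enforces a QER’s MBR (bucket depth, burst allowance, drop vs. queue) is implementation-specific, and the rate and packet sizes below are made up.

```python
class TokenBucket:
    """Police packets against a rate (bits/s) with a burst allowance (bytes)."""

    def __init__(self, rate_bps: float, burst_bytes: int):
        self.rate = rate_bps / 8.0           # refill rate in bytes per second
        self.capacity = burst_bytes
        self.tokens = float(burst_bytes)
        self.last = 0.0

    def allow(self, size_bytes: int, now: float) -> bool:
        # Refill for the time elapsed, capped at the bucket size
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size_bytes:
            self.tokens -= size_bytes
            return True
        return False                         # out of profile: dropped


qer = TokenBucket(rate_bps=8, burst_bytes=100)    # absurdly small MBR
print(qer.allow(60, now=0.0))   # SYN-ACK fits in the initial burst
print(qer.allow(52, now=1.0))   # ACK for the REGISTER segment: bucket nearly empty
print(qer.allow(52, now=2.0))   # ACK for the retransmission: still dropped
```

The handshake gets through on the initial burst. After that, the bucket refills so slowly that every later ACK is out of profile, which matches what the captures showed.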

Time wasted – About 4 hours, but I will not make this mistake again!

Tales from the Trenches – Samsung mystery packet on IMS Call

05/09/2025 | Nick | Categories: 5G SA, EPC, IMS / VoLTE, LTE, Mobile Networks, VoIP | Tags: EPC, GTP, IMS, PFCP, SIP, VoIP, VoLTE

So we’ve found a scenario that occurs on some Samsung UEs in certain radio conditions. Midway through an otherwise normal voice call, the UE sends “mystery” data (not IP data). This causes the UPF to send an Error Indication and the bearer to be dropped, which in turn drops the call.

The call starts, like any normal call, SIP REGISTER, INVITE, etc.

The P-CSCF / PCRF / PGW set up the dedicated bearer for the voice traffic, and the RTP stream starts flowing over it.

Then the UE sends these weird packets instead of the RTP stream:

These are GTP-U encapsulated data, with the TEID that matches the TEID used for the RTP stream, but there’s no IP data in them – they’re only 14 bytes long and sent by the UE.

Here are some examples of what’s sent (each line is a packet):

d097ab7d605665f4f8f7cf7805c0
...
d198c445614c64f4f8f7cf7805c0
...
d298fd4d624263f4f8f7cf7805c0
...
d398be55633862f4f8f7cf7805c0
...
d498f45d642e61f4f8f7cf7805c0

Per 3GPP you can’t transmit anything other than IPv4 or IPv6 unless you’re using NIDD, but this is not using NIDD or Ethernet.

An IPv4 header is at least 20 bytes long and an IPv6 header is 40, so this is too short for either of those protocols. But what else could it be?

There’s some commonality, of course: the first octet starts at d0, then d1, d2, d3, etc. So that’s something?

I thought perhaps it was a boundary issue, that the standard RTP packet was being split across multiple GTP-U payloads, but that doesn’t appear to be the case.

An Ethernet header is 14 bytes, but if we decoded this as Ethernet there’d still be nothing it’s transporting. The destination MAC would also be changing sequentially, which would be even weirder.

I also thought about RTP that for some reason had lost its IP/UDP header, as the sequentially counting byte at the start could be the RTP sequence number. But that’d be 19 bytes minimum, and the sequence number is the 3rd and 4th byte, not the first.
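A quick way to poke at the samples is to line them up byte by byte. This Python sketch does a few simple checks on the five payloads above. It looks at the version nibble that IPv4 and IPv6 both start with, finds which byte offsets never change, and measures how the changing bytes move from packet to packet. It doesn’t identify the protocol. It just makes the structure easier to see.

```python
PAYLOADS = [
    "d097ab7d605665f4f8f7cf7805c0",
    "d198c445614c64f4f8f7cf7805c0",
    "d298fd4d624263f4f8f7cf7805c0",
    "d398be55633862f4f8f7cf7805c0",
    "d498f45d642e61f4f8f7cf7805c0",
]
pkts = [bytes.fromhex(h) for h in PAYLOADS]


def could_be_ip(pkt: bytes) -> bool:
    """IPv4/IPv6 start with a 4-bit version field and have minimum header sizes."""
    version = pkt[0] >> 4
    return (version == 4 and len(pkt) >= 20) or (version == 6 and len(pkt) >= 40)


def constant_offsets(pkts: list[bytes]) -> list[int]:
    """Byte offsets whose value is identical in every sample."""
    return [i for i in range(len(pkts[0])) if len({p[i] for p in pkts}) == 1]


def deltas(pkts: list[bytes], offset: int) -> list[int]:
    """Signed packet-to-packet change of one byte, wrapping at 256."""
    return [((b[offset] - a[offset] + 128) % 256) - 128 for a, b in zip(pkts, pkts[1:])]


print([len(p) for p in pkts], [could_be_ip(p) for p in pkts])
print("constant offsets:", constant_offsets(pkts))
for off in (0, 4, 5, 6):
    print(f"byte {off} deltas:", deltas(pkts, off))
```

The version nibble is 0xd in every sample, so these fail the IP test before length even comes into it. The last seven bytes are identical every time. The first and fifth bytes count up by one, the sixth drops by 10, and the seventh counts down by one. That’s more structure than the first-octet counter alone, but still no obvious match to a known protocol.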

Whatever they contain, we see this sent over and over for a few seconds, then bam, back to normal RTP stream flowing.

Or at least it should be, but the invalid packet causes the UPF to generate a GTP-U Error Indication.

These Error Indication payloads eventually lead to the next PFCP Session Report Request having the Error Indication Report (ERIR) flag set to True.

When the PGW-C gets this, it sends a Session Deletion Request, which dutifully drops the bearer.

This means the session drops on the EPC side and the RTP drops with it. Eventually the phone sends a BYE due to RTP timeout.

The above screenshot shows a different cause of GTP-U Error Indication. At this point the bearer has been dropped on the EPC side, and these are Error Indications reporting that the TEID / bearer isn’t known.

How to fix this?

Well, it’s unlikely we’ll get a fix on the Samsung side. So we’ll need to not drop the bearer on the PGW-C when we get a lot of Error Indications, and hope for the best.

GTPv2 Instance IDs

18/04/2025 | Nick | Categories: 5G SA, EPC, LTE, Mobile Networks, Notes, RFCs & Standards | Tags: EPC, GTP, GTP-C, GTPv2, GTPv2C, LTE

I was diffing two PCAPs the other day trying to work out what’s up, and noticed the Instance ID on a GTPv2 IE was different between the working and failing examples.

So what does it denote? Well, from TS 129.274:

If more than one grouped information elements of the same type, but for a different purpose are sent with a message, these IEs shall have different Instance values.

So if we’ve got two IEs of the same IE type, we differentiate between them by Instance ID. We often do: F-TEIDs (IE type 87) can appear several times in the same message, each with a different F-TEID interface type.

The only exception to this rule is where we’ve got the same data: one IE with the exact same values and purpose that exists twice inside the message.
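For reference, the instance lives in the low nibble of the fourth octet of every GTPv2 IE header (type, 2-octet length, then spare/instance). The Python sketch below builds two F-TEID IEs (type 87) that differ by instance, then walks them back out. That mirrors a Create Session Request carrying the Sender F-TEID (instance 0) alongside the PGW S5/S8 address (instance 1). The F-TEID body layout (V4 flag plus 6-bit interface type, TEID, IPv4 address) and interface types 10 (S11 MME GTP-C) and 7 (S5/S8 PGW GTP-C) are from my reading of TS 29.274. The TEIDs, addresses and helper names are made up.

```python
import ipaddress
import struct

F_TEID = 87


def encode_ie(ie_type: int, instance: int, value: bytes) -> bytes:
    # Octet 4 of the IE header: spare (high nibble) | instance (low nibble)
    return struct.pack("!BHB", ie_type, len(value), instance & 0x0F) + value


def encode_fteid_v4(interface_type: int, teid: int, ipv4: str) -> bytes:
    flags = 0x80 | (interface_type & 0x3F)          # V4 bit + interface type
    return struct.pack("!BI", flags, teid) + ipaddress.IPv4Address(ipv4).packed


def walk_ies(data: bytes):
    """Yield (type, instance, value) for each top-level IE in a GTPv2 message body."""
    offset = 0
    while offset < len(data):
        ie_type, length, spare_inst = struct.unpack_from("!BHB", data, offset)
        yield ie_type, spare_inst & 0x0F, data[offset + 4: offset + 4 + length]
        offset += 4 + length


body = (encode_ie(F_TEID, 0, encode_fteid_v4(10, 0x1001, "192.0.2.1"))      # Sender F-TEID
        + encode_ie(F_TEID, 1, encode_fteid_v4(7, 0x0, "198.51.100.7")))    # PGW S5/S8 address
for ie_type, instance, value in walk_ies(body):
    print(ie_type, instance, value[0] & 0x3F)
```

The type alone can’t tell those two apart. The instance is what tells the receiver which F-TEID is which.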

GTPv2 Source Ports

24/01/2025 | Nick | Categories: EPC, LTE, Mobile Networks, Notes, RFCs & Standards | Tags: GTP, GTP-C, GTPv2C

Ask anyone in the industry and they’ll tell you that GTPv2-C (aka GTP-C) uses port 2123, and they’re right, kinda.

Per TS 129.274, the Create Session Request should be sent to port 2123, but the source port can be any port:

The UDP Source Port for a GTPv2 Initial message is a locally allocated port number at the sending GTP entity.

So this means that while the Destination Port is 2123, the source port is not always 2123.

So what about the response to this? Where must our Create Session Response go?

Create Session request coming from 166.x.y.z from a random port 36225
Going to the PGW on 172.x.y.z port 2123

The response goes to the same port the request came on, so for the example above, as the source port was 36225, the Create Session Response must be sent to port 36225.

Because:

The UDP Destination Port value of a GTPv2 Triggered message and for a Triggered Reply message shall be the value of the UDP Source Port of the corresponding message to which this GTPv2 entity is replying, except in the case of the SGSN pool scenario.

But that’s where the association ends.

So if our PGW wants to send a Create Bearer Request to the SGW, that’s an initial message, so it must go to port 2123, even if the Create Session Request came from a random different port.
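The rule set is small enough to write down. Here’s a Python sketch of which UDP destination port a GTPv2-C entity would use, based on the two clauses quoted above. Message classification is simplified to “initial” vs “triggered”, and the SGSN pool exception is ignored.

```python
from typing import Optional

GTPV2C_PORT = 2123


def destination_port(kind: str, request_source_port: Optional[int] = None) -> int:
    """UDP destination port for a GTPv2-C message (simplified TS 29.274 rules)."""
    if kind == "initial":            # e.g. Create Session Request, Create Bearer Request
        return GTPV2C_PORT
    if kind == "triggered":          # e.g. Create Session Response
        if request_source_port is None:
            raise ValueError("need the source port of the message being answered")
        return request_source_port   # reply to wherever the request came from
    raise ValueError(f"unknown message kind: {kind}")


print(destination_port("triggered", 36225))   # Create Session Response
print(destination_port("initial"))            # later Create Bearer Request
```

The association only lasts for one request/response pair. The next initial message starts fresh at 2123.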

TFTs & Create Bearer Requests

27/12/2024 | Nick | Categories: 5G SA, EPC, IMS / VoLTE, LTE, Mobile Networks, Notes, RFCs & Standards | Tags: Charging Rule, Diameter, EPC, Gx, TFT, Traffic Flow Template

What is included in the Charging Rule on Gx ultimately turns into a Create Bearer Request on GTPv2-C.

But the mapping isn’t always obvious. Today I got stuck on the difference between a single remote port type and a single local port type. I thought the Packet Filter Direction in the TFT controlled this. It doesn’t: it’s controlled by which side of your Traffic Flow Template rule (from or to) the port sits on.

Input TFT:

"permit out 17 from any 50000 to any"

Leads to Packet filter component type identifier: Single remote port type (80)

Whereas a TFT of:

permit out 17 from any to any 50000

Leads to Packet filter component type identifier: Single local port type (64)
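The mapping follows from which side of the rule the port is on. In a downlink (“out”) Flow-Description on Gx, the “from” side is the remote end and the “to” side is the UE. So a port after “from” becomes a remote port, and a port after “to” becomes a local port. Here’s a small Python sketch that only handles the simple “permit out <proto> from <addr> [port] to <addr> [port]” shape with single ports. The component type values (0x30 protocol identifier, 0x40 single local port, 0x50 single remote port) are from TS 24.008, and the function name is made up.

```python
PROTOCOL_ID = 0x30          # Protocol identifier / Next header type (48)
SINGLE_LOCAL_PORT = 0x40    # 64
SINGLE_REMOTE_PORT = 0x50   # 80


def tft_components(rule: str) -> list[tuple[int, int]]:
    """Map a simple Gx 'permit out' rule to (component type, value) pairs."""
    tokens = rule.strip('"').split()
    if tokens[:2] != ["permit", "out"]:
        raise ValueError("sketch only handles 'permit out' rules")
    proto = int(tokens[2])
    from_i, to_i = tokens.index("from"), tokens.index("to")
    remote_side = tokens[from_i + 1:to_i]    # 'from' = remote (network) end
    local_side = tokens[to_i + 1:]           # 'to' = the UE
    comps = [(PROTOCOL_ID, proto)]
    if len(remote_side) > 1:
        comps.append((SINGLE_REMOTE_PORT, int(remote_side[1])))
    if len(local_side) > 1:
        comps.append((SINGLE_LOCAL_PORT, int(local_side[1])))
    return comps


print(tft_components('"permit out 17 from any 50000 to any"'))
print(tft_components("permit out 17 from any to any 50000"))
```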

5G / LTE Milenage Security Exploit – Dumping the Vectors

23/08/2024 | Nick | Categories: 5G SA, EPC, LTE, Mobile Networks, SDM, Security, SIM Cards | Tags: 5G, Diameter, HSS, LTE, Security

I’ve written about Milenage and SIM based security in the past on this blog. The component that prevents replay attacks in cellular network authentication is the Sequence Number (aka SQN) stored on the SIM.

Think of the SQN as an incrementing odometer of authentication vectors. Odometers can go forward, but never backwards. So if a challenge comes in with an SQN behind the odometer (a lower number), it’s no good.

Why the SQN is important for Milenage Security

Every time the SIM authenticates it ticks up the SQN value, and when authenticating it checks the challenge from the network doesn’t have an SQN that’s behind (lower than) the SQN on the SIM.

Let’s take a practical example of this:

The HSS in the network has SQN for the SIM as 8232, and generates an authentication challenge vector for the SIM which includes the SQN of 8232.
The SIM receives this challenge, and makes sure that the SQN in the SIM is equal to or less than 8232.
If the authentication passes, the new SQN stored in the SIM is 8232 + 1, as that’s the next valid SQN we’d be expecting, and the HSS increments its counter in the same way.
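Here’s the odometer model above as a Python sketch. It is deliberately the simplified version from the example, where the SIM stores the next SQN it will accept. Real USIM SQN checking per TS 33.102 is more involved, so treat this as illustration only. The class name is made up.

```python
class SimOdometer:
    """Simplified SQN check: challenges must be at or ahead of the odometer."""

    def __init__(self, next_sqn: int):
        self.next_sqn = next_sqn          # lowest SQN the SIM will still accept

    def check(self, challenge_sqn: int) -> bool:
        if challenge_sqn < self.next_sqn:
            return False                  # behind the odometer: stale or replayed vector
        self.next_sqn = challenge_sqn + 1 # tick past the accepted SQN
        return True


sim = SimOdometer(8232)
print(sim.check(8232), sim.next_sqn)   # accepted, odometer now at 8233
print(sim.check(8232))                 # same vector replayed: rejected
```

A stack of pre-generated vectors only stays useful while the SIM’s odometer is below them. One genuine attach with a higher SQN kills them all.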

Constantly increasing the SQN, and not allowing it to go backwards, means that even if we pre-generated a valid authentication vector for the SIM, it’d only be valid until the SIM accepts a newer SQN from another authentication request.

Imagine for example that I get sneaky access to an operator’s HSS/AuC, I could get it to generate a stack of authentication challenges that I could use for my nefarious moustache-twirling purposes whenever I wanted.

This attack would work, but this all comes crumbling down if the SIM was to attach to the real network after I’ve generated my stack of authentication challenges.

If the SQN on the SIM passes where it was when the vectors were generated, those vectors would become unusable.

It’s worth pointing out that it’s not just evil purposes that lead your SQN to get out of sync. It happens when you’ve got subscriber data split across multiple HSSes, for example. There’s a mechanism to securely catch the HSS’s SQN counter up with the SQN counter in the SIM without exposing any secrets, but it only ever ticks the HSS’s SQN up. It never rolls back the SQN in the SIM.

The Flaw – Draining the Pool

The Authentication Information Request is used by a cellular network to authenticate a subscriber, and the Authentication Information Answer is sent back by the HSS containing the challenges (vectors).

When we send this request, we can specify how many authentication challenges (vectors) we want the HSS to generate for us, so how many vectors can you generate?

TS 129 272 says the Number-Of-Requested-Vectors AVP is an Unsigned32, which allows values up to 4,294,967,295. This means it would be legal / valid to send an Authentication Information Request asking for 4.2 billion vectors.

It’s worth noting that that won’t give us the whole pool.

Sequence numbers (SQN) shall have a length of 48 bits.
TS 133 102

As the SQN in the SIM is 48 bits, the odometer can hold 281,474,976,710,656 (2^48) values before it “ticks over”.

If we were to send 65,536 Authentication-Information-Requests asking for 4,294,967,295 vectors apiece, we’d have enough vectors to serve the sub for life.
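Checking that arithmetic in Python, using the 48-bit SQN from TS 33.102 and the Unsigned32 ceiling on Number-Of-Requested-Vectors:

```python
SQN_SPACE = 2 ** 48          # 48-bit SQN
MAX_PER_AIR = 2 ** 32 - 1    # largest Unsigned32 value
covered = 65_536 * MAX_PER_AIR

print(f"SQN values:          {SQN_SPACE:,}")
print(f"65,536 maxed AIRs:   {covered:,}")
print(f"left uncovered:      {SQN_SPACE - covered:,}")
```

Because each request is one short of 2^32, 65,536 maxed-out requests fall exactly 65,536 values short of the full 2^48. At that scale, that’s academic.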

The standard doesn’t cap the number of vectors that can be requested (beyond the Unsigned32 limit). That would allow us to “drain the pool” from an HSS and capture every combination of SQN. We’d then have a high degree of certainty that the SQN we present to a SIM is far enough ahead of its current SQN that the SIM won’t reject the challenge.

Can we do this?

Our lab has access to HSSes from several major vendors.

Out of the gate, the Oracle HSS does not allow more than 32 vectors to be requested at once, so props to them. The same is not true of the others, all from major HSS vendors (I won’t name them publicly here).

The other 3 HSSes we tried from big vendors all eventually timed out when asked for 4.2 billion vectors (don’t know why that would be *shrug*), but the request didn’t get rejected.

This is a lab so monitoring isn’t great but I did see a CPU spike on at least one of the HSSes which suggests maybe it was actually trying to generate this.

Of course, we’ve got PyHSS, the greatest open source HSS out there, and how did this handle the request?

Well, being standards compliant, it did what it was asked. I’ll admit I tested with 1024 vectors first, and on my little laptop it did take a while. But lo, it worked, spewing forth 1024 vectors to use.

So with that working, I tried with 4,294,967,295…

And I waited. And waited.

And after pegging my CPU for a good while, I had to get back to real life work, and killed the request on the HSS.

Part of this is that PyHSS writes back to a database each time the SQN is incremented, which is costly in terms of resources, but generating Milenage vectors in LTE also involves some pretty heavy cryptographic lifting.

The Risk

Dumping a complete set of vectors with every possible SQN would allow an attacker to spoof base stations, and the subscriber would attach without issue.

Historically this has been very difficult to do for LTE, due to the mutual network authentication, however this would be bypassed in this scenario.

The UE would try for a resync if the SQN is too far forward, which mitigates this somewhat.

Cryptographically, I don’t know enough about the Milenage auth to know if a complete set of possible vectors would widen the attack surface to try and learn something about the keys.

Mitigations / Protections

So how can operators protect themselves against this kind of attack?

Different commercial HSS vendors handle this differently. Oracle limits this to 32 vectors, and that’s what I’ve updated PyHSS to do, but another big HSS vendor (who I won’t publicly shame) accepts the full 4,294,967,295 vectors, which crashes that thread, or at least times it out after a period.

If you’ve got a decent Diameter Routing Agent in place you can set your DRA to check to see if someone is using this exploit against your network, and to rewrite the number of requested vectors to a lower number, alert you, or drop the request entirely.
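A DRA policy of that kind can be sketched as a small decision function. This is a hypothetical sketch, not any vendor’s DRA configuration: the 32-vector cap mirrors the Oracle / PyHSS behaviour above, while the drop threshold of 65,536, the `Action` values and the function name are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import Enum

MAX_VECTORS = 32  # mirrors the 32-vector cap described above


class Action(Enum):
    FORWARD = "forward"   # pass the AIR through unchanged
    REWRITE = "rewrite"   # clamp Number-of-Requested-Vectors, then forward
    DROP = "drop"         # reject the AIR outright


@dataclass
class Verdict:
    action: Action
    vectors: int          # value to place in Number-of-Requested-Vectors
    alert: bool           # raise an alarm for the NOC


def police_air(requested: int, drop_threshold: int = 2**16) -> Verdict:
    """Hypothetical DRA policy for the Number-of-Requested-Vectors AVP in an S6a AIR."""
    if not 0 <= requested <= 2**32 - 1:
        raise ValueError("Number-of-Requested-Vectors is an Unsigned32")
    if requested <= MAX_VECTORS:
        return Verdict(Action.FORWARD, requested, alert=False)
    if requested >= drop_threshold:
        # Assumed threshold: requests this large have no legitimate use.
        return Verdict(Action.DROP, 0, alert=True)
    # Oversized but not absurd: clamp to the cap and alert.
    return Verdict(Action.REWRITE, MAX_VECTORS, alert=True)
```

Clamping rather than dropping keeps a misconfigured (but honest) MME working, while the alert still tells you someone is asking for far more than they need.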

Having common OP keys is dumb, and I advocate to all our operator customers to use OP keys that are unique to each SIM, and to use the derived OPc key anyway. This means if one SIM spills its keys, the blast radius doesn’t extend beyond that card.

In the long term, it’d be good to see 3GPP limit the practical size of the Number-of-Requested-Vectors AVP.

2G/3G Impact

Full disclosure – I don’t really work with 2G/3G stacks much these days, and have not tested this.

MAP is generally pretty bandwidth constrained, and transferring 280 trillion vectors might raise some eyebrows, burn out some STPs and take a long time…

But the MAP “Send Authentication Info” message functions much the same as the Authentication Information Request in Diameter, and 3GPP TS 29.002 shows we can set the number of vectors we want:

5GC Vulnerability

This only impacts LTE and 5G NSA subscribers.

TS 29.509 outlines the schema for the Nausf reference point, used for requesting vectors, and there is no option to request multiple vectors.

Summary

If you’ve got baddies with access to your HSS / HLR, you’ve got some problems.

But, with enough time, your pool could be drained, one subscriber at a time.

This isn’t going to get the master OP Key or plaintext Ki values, but this could potentially weaken the Milenage security of your system.

Transport Keys & A4 / K4 Keys in EPC & 5GC Networks

12/07/2024 | 5G SA, EPC, LTE, Mobile Networks, RFCs & Standards, SDM, Security | A4, Diameter, HSS, K4, SIM, SIM Card, Transport Keys | Nick

If you’re working with the larger SIM vendors, there’s a good chance the key material they send you won’t actually contain the raw Ki values for each card; if it fell into the wrong hands you’d be in big trouble.

Instead, what is more likely is that the SIM vendor shares the Ki encrypted with a transport key, so what you receive is not the plaintext Ki but a ciphered version of it.

But as long as you and the SIM vendor have agreed beforehand on the cipher to use and the secret to protect it with, you can read the data as needed.

This is a tricky topic to broach, as transport key implementation is not covered by 3GPP; instead it’s a quasi-standard commonly used by SIM vendors and HSS vendors alike: the A4 / K4 Transport Encryption Algorithm.

It’s made up of a few components:

K2 is our plaintext key data (Ki or OP)
K4 is the secret key used to cipher the Ki value.
K7 is the algorithm used (Usually AES128 or AES256).

I won’t explain too much about the crypto, but here’s an example from IoT Connectivity’s KiOpcGenerator tool:

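import binascii
from Crypto.Cipher import AES  # pycryptodome

IV = bytes(16)  # assumption: all-zero IV; agree the actual IV with your SIM vendor


def aes_128_cbc_encrypt(key, text):
    """
    AES-CBC encrypt hex-encoded `text` (e.g. a Ki or OP) with the hex-encoded
    transport key `key`; returns the ciphertext as uppercase hex.
    """
    keyb = binascii.unhexlify(key)    # K4 transport key (16 bytes for AES-128)
    textb = binascii.unhexlify(text)  # plaintext key data, a multiple of 16 bytes
    encryptor = AES.new(keyb, AES.MODE_CBC, iv=IV)
    ciphertext = encryptor.encrypt(textb)
    return ciphertext.hex().upper()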

When defining your electrical profile and the required parameters, it’s important to make sure the operator, HSS vendor and SIM vendor are all on the same page regarding whether transport keys will be used, which cipher will be used, and the keys for each batch of SIMs.

Here’s an example from a Huawei HSS with SIMs from G&D:

%%LST K4: HLRSN=1;%%
RETCODE = 0 SUCCESS0001:Operation is successful

        "K4SNO" "ALGTYPE"     "K7SNO" "KEYNAME"
        1       AES128        NONE    G+D

We’re using AES128, and any SIMs produced by G&D for this batch will use that transport key (transport key ID 1 in the HSS), so when adding new SIMs we’ll need to specify what transport key to use.

NB-IoT Flows for NIDD

05/07/2024 | EPC, EUTRAN, LTE, Mobile Networks, RFCs & Standards, SDM | Diameter, IoT, LTE, NB-IoT | Nick

In our last post we covered the basics of NB-IoT Non-IP Data Delivery (NIDD), and if that acronym soup wasn’t enough for you, we’re going to take a deep dive into the flows for attaching, sending, receiving and closing a NIDD session.

The attach for NIDD is very similar to the standard attach for wideband LTE, except the MME establishes a connection on the T6a Diameter interface toward the SCEF, to indicate the sub is online and available.

The NIDD Attach

The SCEF is now able to send/receive NIDD traffic from the subscriber on the T6a interface, but in reality developers don’t / won’t interact with Diameter, so the SCEF exposes the T8 API: an abstraction layer developers can use to reach the SCEF, and through it, the UE.

If you’re wondering what the status of Open Source SCEF implementations are, then you may have already guessed we’re working on one! PyHSS should have support for NB-IoT SCEF features in the future.

NB-IoT provides support for Non-IP Data Delivery (NIDD) over 3GPP networks, but to handle this, some new network elements are introduced; in a home network scenario that’s the SCEF and the SCS/AS.

On the 3GPP side, the SCEF communicates with the MME via the T6a interface, which is based upon Diameter.

On the side towards our IoT Service Consumers (referred to in the standards as the “SCS/AS” or “Service Capability Server / Application Server”, catchy names as always), it communicates via the RESTful HTTP-based T8 interface.

I’ve written about Non-IP Data support in 5G for transporting Ethernet, but there’s another non-IP use case in 3GPP networks – This time for NB-IoT services.

Procedures

S1 Attach

The start of the S1 Attach procedure is very similar to a regular S1 attach.

The initial PDN Connectivity Request, carried in the ESM Message Container, indicates that the PDN Type is Non IP.

PDN Connectivity Request from attach procedure

Other than that, the initial attach procedure looks very similar to the regular S1 attach procedure.

On the S6a interface the Update Location Request from the MME to the HSS indicates that this is an EUTRAN-NB-IoT Radio Access Type.

And the Update Location Answer APN Configuration contains some additional AVPs on the APN to indicate that the APN supports Non-IP-PDN-Type and that the SCEF is used for Data Delivery.

The SCEF-ID (Diameter Host) and SCEF-Realm (Diameter realm) to serve this user is also specified in the APN Configuration in the Update Location Answer.

This is how our MME determines where to send the T6a traffic.
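The MME-side decision can be sketched as follows. The AVP names (Non-IP-PDN-Type-Indicator, Non-IP-Data-Delivery-Mechanism, SCEF-ID, SCEF-Realm) are the ones in the APN Configuration described above; representing the decoded APN Configuration as a Python dict, the helper name and the example hostnames are assumptions. The enum values (0 = SGi-based, 1 = SCEF-based delivery) are my reading of TS 29.272, so check them against your release.

```python
from typing import Optional, Tuple

# Assumed Non-IP-Data-Delivery-Mechanism values (verify against TS 29.272).
SGI_BASED_DATA_DELIVERY = 0
SCEF_BASED_DATA_DELIVERY = 1


def t6a_destination(apn_config: dict) -> Optional[Tuple[str, str]]:
    """Return (Destination-Host, Destination-Realm) for the T6a CMR,
    or None if this APN isn't served via an SCEF.

    `apn_config` is a hypothetical decoded APN-Configuration keyed by AVP name.
    """
    if not apn_config.get("Non-IP-PDN-Type-Indicator"):
        return None  # regular IP PDN, no T6a involved
    if apn_config.get("Non-IP-Data-Delivery-Mechanism") != SCEF_BASED_DATA_DELIVERY:
        return None  # non-IP, but delivered over SGi rather than via the SCEF
    try:
        return apn_config["SCEF-ID"], apn_config["SCEF-Realm"]
    except KeyError as missing:
        raise ValueError(f"SCEF-based APN without {missing}") from None


if __name__ == "__main__":
    apn = {
        "Service-Selection": "non-ip",
        "Non-IP-PDN-Type-Indicator": 1,
        "Non-IP-Data-Delivery-Mechanism": SCEF_BASED_DATA_DELIVERY,
        "SCEF-ID": "scef01.example.net",     # hypothetical host
        "SCEF-Realm": "example.net",         # hypothetical realm
    }
    print(t6a_destination(apn))
```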

With this, the MME sends a Connection Management Request (CMR) towards the SCEF specified in the SCEF-ID returned by the HSS.

The Connection Management Request / Response

The MME now sends a Diameter T6a Connection Management Request to the SCEF specified in the Update Location Answer.

In it we have a Session-Id, which continues for the life of our NIDD session, the service-selection which contains our APN (In our case “non-ip”) and the User-Identifier AVP which contains the MSISDN and/or IMSI of the subscriber.
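For anyone building this by hand, the two simpler AVPs in the CMR are easy to encode. This sketch follows the generic AVP layout from RFC 6733 (code, flags, 24-bit length, optional Vendor-ID, data padded to a 4-byte boundary). Session-Id (263, RFC 6733) and Service-Selection (493, RFC 5778) are standard codes; the session-id string is made up, and the grouped User-Identifier AVP is left out to keep it short.

```python
import struct
from typing import Optional

FLAG_VENDOR = 0x80      # V bit: Vendor-ID field present
FLAG_MANDATORY = 0x40   # M bit

SESSION_ID = 263         # RFC 6733
SERVICE_SELECTION = 493  # RFC 5778, carries the APN


def encode_avp(code: int, data: bytes, flags: int = FLAG_MANDATORY,
               vendor_id: Optional[int] = None) -> bytes:
    """Encode one Diameter AVP (RFC 6733 section 4.1), padded to 32 bits."""
    if vendor_id is not None:
        flags |= FLAG_VENDOR
    header_len = 12 if vendor_id is not None else 8
    length = header_len + len(data)            # AVP Length excludes padding
    header = struct.pack("!II", code, (flags << 24) | length)
    if vendor_id is not None:
        header += struct.pack("!I", vendor_id)
    padding = b"\x00" * (-length % 4)
    return header + data + padding


if __name__ == "__main__":
    session = "mme01.example.net;1234;5678"    # hypothetical Session-Id
    avps = (encode_avp(SESSION_ID, session.encode())
            + encode_avp(SERVICE_SELECTION, b"non-ip"))
    print(avps.hex())
```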

To accept this, the SCEF sends back a Connection-Management-Answer to confirm we’re all good to go:

At this point our SCEF knows about the subscriber who’s just attached to our network, and correlates it with the APN and the Session-Id.

On the S1 side the connection is confirmed and we’re ready to roll.

Mobile Originated Data Request / Response

When the UE wants to send NIDD it’s carried in NAS messaging, so we see an Uplink NAS transport from the UE and inside the NAS payload itself is our HEX data.

Our MME grabs this out and sends it in the form of a Mobile-Originated-Data-Request (MODR) to the SCEF, along with the same Session-Id that was set up earlier:

At this stage our Non-IP Data is exposed over the T8 RESTful API, which we won’t cover in this post.
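On the SCEF side, the key piece of state is the mapping from the T6a Session-Id (learned in the CMR) to the subscriber and APN, so a later MODR can be attributed to the right UE. A minimal sketch, assuming an in-memory dict and hypothetical class and method names; a real SCEF would persist this and hand the payload on to its T8 layer.

```python
from dataclasses import dataclass, field


@dataclass
class NiddSession:
    imsi: str
    msisdn: str
    apn: str


@dataclass
class ScefSessions:
    """Hypothetical in-memory T6a session store for an SCEF."""
    by_session_id: dict[str, NiddSession] = field(default_factory=dict)

    def on_cmr(self, session_id: str, imsi: str, msisdn: str, apn: str) -> None:
        # Connection-Management-Request: remember who this Session-Id belongs to.
        self.by_session_id[session_id] = NiddSession(imsi, msisdn, apn)

    def on_modr(self, session_id: str, payload: bytes) -> tuple[NiddSession, bytes]:
        # Mobile-Originated-Data-Request: attribute the payload to a subscriber.
        session = self.by_session_id.get(session_id)
        if session is None:
            raise KeyError(f"unknown T6a Session-Id {session_id!r}")
        return session, payload  # in practice, pass on towards the SCS/AS over T8

    def on_session_end(self, session_id: str) -> None:
        # e.g. when the NIDD connection is released
        self.by_session_id.pop(session_id, None)
```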

eMBMS Architecture in LTE EPC

28/06/2024 | EPC, LTE, Mobile Networks | Broadcast, eMBMS, MBMS, Multicast | Nick

Note: I’m lazily posting this as it’s been in my drafts folder for an exceedingly long time. Before going too much further, it’s worth pointing out that eMBMS never really made it anywhere; no production networks of note use eMBMS. I started researching it and my interest petered out once I discovered I couldn’t get any UEs or hardware that supported eMBMS.

Mobile networks are designed as point-to-point; all traffic is unicast.

But multicast and broadcast traffic is real, and becoming more common in some applications.

In areas where users stream the same radio program, or TV show, live, each of them is consuming the same data stream, but each one gets sent a unique copy of the data, on a resource block allocated to them for reception of the data.

If we have 10 users on a cell, each streaming a 5Mbps live video, that’s 50Mbps of capacity taken up on the radio / air interface.
If that stream was moved onto a eMBMS service, only 5Mbps of capacity would be used, regardless of how many people on the cell are consuming it.
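The saving scales with audience size. A trivial sketch of the comparison above, which (as an assumption) ignores differences in coding rate and overhead between unicast and broadcast transmission:

```python
def air_interface_load_mbps(users: int, stream_mbps: float, multicast: bool) -> float:
    """Capacity consumed on one cell by `users` watching the same live stream.

    Unicast sends one copy per user; multicast/broadcast sends a single copy
    regardless of audience size (coding-rate differences ignored).
    """
    if users <= 0:
        return 0.0
    return stream_mbps if multicast else users * stream_mbps


if __name__ == "__main__":
    for users in (1, 10, 100):
        unicast = air_interface_load_mbps(users, 5, multicast=False)
        broadcast = air_interface_load_mbps(users, 5, multicast=True)
        print(f"{users:>3} users: unicast {unicast} Mbps, eMBMS {broadcast} Mbps")
```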

For Mission Critical Push to Talk applications, the lack of broadcast/multicast support was highlighted again. For a PTT app with 10 users in a talk group, you’d need to schedule resource blocks for all 10 users, allocating radio resources 10 times and sending GTP packets 10 times, all to deliver the same data to 10 people.

So enter eMBMS – The Evolved Multimedia Broadcast and Multicast Service, providing multicast service for LTE.

Overall Architecture

eMBMS introduces a few changes to the RAN side to handle support for a shared data channel, which is sent by the eNodeB and that UEs can listen on to get data. (More on admission control later)

From a core perspective, two new network elements are introduced: the Broadcast/Multicast Service Center (BM-SC) and the Multimedia Broadcast Multicast Services Gateway (MBMS GW). These elements function in much the same way as the P-GW and S-GW respectively, but for multicast services.

Like so many 3GPP specs before it, MBMS relies on GTP for transporting the data to be distributed, and relies on GTPv2-C for control plane data.

BM-SC – Broadcast Multicast Service Centre

The Broadcast Multicast Service Centre acts as the gateway between content providers (providing streams of data to be distributed) and the EPC.

The BM-SC sets up eMBMS sessions and pulls broadcast data from the content providers and collects receipts from subscribers of some streams to charge / track consumption of the services.

In this regard the BM-SC is akin to the P-GW, which acts as the border between the EPC and external networks, except the BM-SC is largely unidirectional.

MBMS Gateway

The MBMS Gateway (MBMS-GW) takes the broadcast data stream from the BM-SC and encapsulates it into GTP packets to be distributed to eNBs across the network.

The MBMS-GW allocates a multicast transport address for each broadcast data stream.

MME Interaction

For this a new interface is introduced on the MME – the Sm interface, which interconnects the MME and the MBMS-Gateways assigned to it.

Key Interfaces / Reference Points

Sm Interface (MME <-> MBMS GW)

MBMS Session Start Request/Response
MBMS Session Update Request/Response
MBMS Session Stop Request/Response

SGmb Interface (MBMS GW <-> BM-SC)

Control plane signaling

SGimb Interface (MBMS GW <-> BM-SC)

User plane traffic (media)

DNS’ role in S8-Home Routing Roaming

09/04/2024 | EPC, IMS / VoLTE, LTE, Mobile Networks, Notes, RFCs & Standards | DNS, EPC, LTE, Roaming, S8, S8-HR | Nick

S8 Home Routing is a really simple concept: the traffic goes from the SGW in the visited PLMN to the PGW in the home PLMN, so the PCRF, OCS/OFCS, IMS, IP addresses, etc, are all in the home network, and this avoids huge amounts of complexity.

But in order for this to work, the visited network MME needs to find the PGW of the home network, and with over 700 roaming networks in commercial use, each one with potentially hundreds of unique APNs each routing to a different PGW, this is a tricky proposition.

If you’ve configured your PGW peers statically on your MME, that’s fine, but it doesn’t scale very well – and if you add an MVNO who wants their own PGW for serving their APN, well, you’ll be adding some complexity there too, so what to do?

Well, the answer is DNS.

By taking the APN to be served, the home PLMN and the interface type desired, with some funky DNS queries, our MME can determine which PGW should be selected for a request.

Let’s take a look, for a UE from MNC XXX MCC YYY roaming into our network, trying to access the “IMS” APN.

Our MME knows the network code of the roaming subscriber from the IMSI is MNC XXX, MCC YYY, and that the UE is requesting the IMS APN.
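Building the query name from those two inputs is mechanical: the APN Network Identifier, then `apn.epc`, then the home PLMN, with a two-digit MNC zero-padded to three digits as TS 23.003 specifies. A minimal sketch; the function name and the test PLMN values are just examples.

```python
def apn_fqdn(apn_ni: str, mcc: str, mnc: str) -> str:
    """Build the APN-FQDN used for PGW selection,
    e.g. ims.apn.epc.mnc001.mcc001.3gppnetwork.org."""
    if not (mcc.isdigit() and len(mcc) == 3):
        raise ValueError("MCC must be 3 digits")
    if not (mnc.isdigit() and len(mnc) in (2, 3)):
        raise ValueError("MNC must be 2 or 3 digits")
    # DNS names are case-insensitive; lower-case the APN for tidiness.
    return f"{apn_ni.lower()}.apn.epc.mnc{mnc.zfill(3)}.mcc{mcc}.3gppnetwork.org"


if __name__ == "__main__":
    print(apn_fqdn("IMS", "001", "01"))
```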

So our MME crafts a NAPTR query for ims.apn.epc.mncXXX.mccYYY.3gppnetwork.org:

Because the domain is epc.mncXXX.mccYYY.3gppnetwork.org it’s routed to the authoritative DNS server in the home network, which sends back the response:

We’ve got a few peers to pick from, so we need to filter this list of Answers to only those that are relevant to us.

First we filter by the Service tag, which for each listed peer shows what services that peer supports.

Since we’re looking for S8, we need to find a peer whose “Service” tag string contains:

x-3gpp-pgw:x-s8-gtp

We’re looking for two bits of info here, the presence of x-3gpp-pgw in the Service to indicate that this peer is a PGW and x-s8-gtp to indicate that this peer supports the S8 interface.

A service string like this:

x-3gpp-pgw:x-s5-gtp

Would be excluded as it only supports S5 not S8 (Even though they are largely the same interface, S8 is used in roaming).

It’s also not uncommon to see both services indicated as supported, in which case that peer could be selected too:

x-3gpp-pgw:x-s5-gtp:x-s8-gtp

(The answers in the screenshot include :x-gp which means the PGWs advertised are also co-located with a GGSN)

So with our answers whittled down to only those that meet our needs, we next use the Order and the Preference to pick our best candidate; this is the same as regular NAPTR selection logic (lowest Order first, then lowest Preference).
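The filter-then-sort step can be sketched as below. The `Naptr` tuples are hypothetical stand-ins for a parsed NAPTR answer (in practice they’d come from your resolver library) and the hostnames are made up; the service-string matching follows the `x-3gpp-pgw:x-s8-gtp` convention described above, and the ordering follows RFC 3403 (Order, then Preference, lowest first).

```python
from typing import NamedTuple


class Naptr(NamedTuple):
    order: int
    preference: int
    flags: str
    service: str
    regexp: str
    replacement: str


def pgw_candidates(answers: list[Naptr], interface: str = "x-s8-gtp") -> list[Naptr]:
    """Keep PGW records supporting `interface`, best (lowest order, then preference) first."""
    def supports(rec: Naptr) -> bool:
        app, *protocols = rec.service.lower().split(":")
        return app == "x-3gpp-pgw" and interface in protocols

    return sorted((r for r in answers if supports(r)),
                  key=lambda r: (r.order, r.preference))
```

A record advertising only `x-3gpp-pgw:x-s5-gtp` drops out, while one advertising `x-3gpp-pgw:x-s5-gtp:x-s8-gtp:x-gp` stays in the running.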

From our candidate we’ve also got the Regex Replacement, which rewrites our original DNS query name so it points at a single peer.

In our answer, we see the original request ims.apn.epc.mncXXX.mccYYY.3gppnetwork.org is to be re-written to topon.lb1.pgw01.epc.mncXXX.mccYYY.3gppnetwork.org.
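Depending on how the home network’s DNS is set up, the target may be expressed in the NAPTR Regexp field or simply placed in the Replacement field, so this sketch handles both, following the RFC 3403 substitution-expression form `<delim>ERE<delim>replacement<delim>[flags]`. The function name is hypothetical, and escaped delimiters inside the expression aren’t handled.

```python
import re


def naptr_target(query_name: str, regexp: str, replacement: str) -> str:
    """Work out the next name to look up from a NAPTR record (RFC 3403).

    If the Regexp field is set, apply it to the original query name;
    otherwise fall back to the Replacement field.
    """
    if not regexp:
        return replacement.rstrip(".")
    delim = regexp[0]
    parts = regexp[1:].split(delim)
    if len(parts) != 3:
        raise ValueError(f"malformed NAPTR regexp: {regexp!r}")
    pattern, repl, flags = parts
    re_flags = re.IGNORECASE if "i" in flags else 0
    # NAPTR back-references (\1..\9) use the same syntax as re.sub.
    return re.sub(pattern, repl, query_name, count=1, flags=re_flags)


if __name__ == "__main__":
    q = "ims.apn.epc.mnc001.mcc001.3gppnetwork.org"
    print(naptr_target(q, "!^.*$!topon.lb1.pgw01.epc.mnc001.mcc001.3gppnetwork.org!", ""))
```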

This is the FQDN of the PGW we should use.

Now we know the FQDN we should use, we just do an A record lookup (or AAAA record lookup if it is IPv6) for the peer we are targeting, to turn that FQDN into an IP address we can use.

And then in comes the response:

So now that our MME knows the IP of the PGW, it can craft a Create Session Request where the F-TEID for the S8 interface carries the PGW IP we selected.

For more info on this, TS 129 303 (Domain Name System Procedures) is the definitive doc, but the GSMA’s IR.88 “LTE and EPC Roaming Guidelines” provides a handy reference.

The meaning of 3GPP-Charging-Characteristics

26/03/2024 | EPC, GSM, LTE, Mobile Networks, SDM | AVP, Charging, Diameter, EPC | Nick

How does one encode / interpret the value of this AVP / IE was the question I set out to answer.

TS 29.274 says:

For the encoding of this information element see 3GPP TS 32.298

TS 32.298 says:

The functional requirements for the Charging Characteristics as well as the profile and behaviour bits are further defined in normative Annex A of TS 32.251

TS 32.251 Annex A says:

The Charging Characteristics parameter consists of a string of 16 bits designated as Behaviours (B), freely defined by Operators, as shown in TS 32.298 [51]. Each bit corresponds to a specific charging behaviour which is defined on a per operator basis, configured within the PCN and pointed when bit is set to “1” value.

After a few circular references I found this is imported from 32.298.

Finally we find some solid answers hidden away in TS 132 215, under the Charging Characteristics Profile index.

Charging Characteristics consists of a string of 16 bits designated as Profile (P) and Behaviour (B), shown in Figure 4.
The first four bits (P) shall be used to select different charging trigger profiles, where each profile consists of the following trigger sets:

S-CDR: activate/deactivate CDRs, time limit, volume limit, maximum number of charging conditions, tariff times;

G-CDR: same as SGSN, plus maximum number of SGSN changes;

M-CDR: activate/deactivate CDRs, time limit, and maximum number of mobility changes;

SMS-MO-CDR: activate/deactivate CDRs;

SMS-MT-CDR: active/deactivate CDRs.

The Charging Characteristics field allows the operator to apply different kind of charging methods in the CDRs.
A subscriber may have Charging Characteristics assigned to his subscription. These characteristics can be supplied by the HLR to the SGSN as part of the subscription information, and, upon activation of a PDP context, the SGSN forwards the charging characteristics to the GGSN on the Gn / Gp reference point according to the rules specified in Annex A of TS 32.251 [11].

This information can be used by the GSNs to activate CDR generation and control the closure of the CDR or the traffic volume containers (see clause 5.1.2.2.23) and is included in CDRs transmitted to nodes handling the CDRs via the Ga reference point. It can also be used in nodes handling the CDRs (e.g., the CGF or the billing system) to influence the CDR processing priority and routing.

These functions are accomplished by specifying the charging characteristics as sets of charging profiles and the expected behaviour associated with each profile.

The interpretations of the profiles and their associated behaviours can be different for each PLMN operator and are not subject to standardisation. In the present document only the charging characteristic formats and selection modes are specified.

The functional requirements for the Charging Characteristics as well as the profile and behaviour bits are further defined in normative Annex A of TS 32.251 [11], including the definitions of the trigger profiles associated with each CDR type.

The format of charging characteristics field is depicted in Figure 4. Px (x =0..3) refers to the Charging Characteristics Profile index. Bits classified with a “B” may be used by the operator for non-standardised behaviour (see Annex A of TS 32.251 [11]).
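To make the format concrete, here’s a minimal decoding sketch. Two assumptions to verify against Figure 4 of TS 32.298 for your release: that the Diameter 3GPP-Charging-Characteristics value is carried as four hex characters (e.g. “0800”), and that the four profile bits P0..P3 sit in bits 8 to 11 of the 16-bit value, which lines up with commonly seen values such as 0800 and 0400. The remaining bits are treated as operator-defined behaviour bits.

```python
PROFILE_MASK = 0x0F00    # assumption: P0..P3 occupy bits 8-11 (check TS 32.298 Figure 4)
BEHAVIOUR_MASK = 0xF0FF  # the remaining 12 bits: operator-defined behaviours


def decode_charging_characteristics(value: str) -> tuple[int, list[int]]:
    """Split a hex-string Charging Characteristics value into
    (profile index, positions of set behaviour bits within the 16-bit value)."""
    cc = int(value, 16)
    if not 0 <= cc <= 0xFFFF:
        raise ValueError("Charging Characteristics is 16 bits")
    profile = (cc & PROFILE_MASK) >> 8
    behaviours = [bit for bit in range(16) if (cc & BEHAVIOUR_MASK) >> bit & 1]
    return profile, behaviours


if __name__ == "__main__":
    print(decode_charging_characteristics("0800"))
```

What each profile index and behaviour bit actually means is, as the spec says, up to each operator.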

Right, well hopefully next time someone goes looking for this info you’ll find it a bit more easily than I did!

SMS over Diameter for Roaming SMS

12/03/2024 | EPC, GSM, LTE, Mobile Networks, RFCs & Standards | Diameter, LTE, NB-IoT, NTN, SMS | Nick

I know what you’re thinking, again with the SMS transport talk Nick? Ha! As if we’re done talking about SMS. Recently we did something kinda cool – The world’s first SMS sent over NB-IoT (Satellite).

But to do this we weren’t using IMS; it’s too heavy (I’ve written about NB-IoT’s NIDD functions in the past).

SGs-AP which is used for CSFB & SMS doesn’t span network borders (you can’t roam with SGs-AP), and with SMSoIP out of the question, that gave us the option of MAP or Diameter, so we picked Diameter.

This introduces the S6c and SGd Diameter interfaces, in the diagrams below Orange is the Home Network (HPMN) and the Green is the Visited Network (VPMN).

The S6c interface is used between the SMSc and the HSS to retrieve the routing information. This is like SRI-for-SM in MAP.

The SGd interface is used between the MME serving the UE and the SMSc, and is used for actual delivery of the MO/MT messages.

I haven’t shown the Diameter Routing Agents in these diagrams, but in reality there would be a DRA on the VPLMN and a DRA on the HPMN, and probably a DRA in the IPX between them too.

The Attach

The attach looks like a regular roaming attach, the MME in the Visited PMN sends an Update Location Request to the HSS, so the HSS knows the MME that is serving the subscriber.

S6a Update Location Request to indicate the MME serving the Subscriber

The Mobile Terminated SMS Flow

Now we introduce the S6c interface and the SGd interfaces.

When the Home SMSc has a message to send to the subscriber (Mobile Terminated SMS), it runs a Send-Routing-Info-for-SM-Request (SRR) dialog to the HSS.

The Send-Routing-Info-for-SM-Answer (SRA) back from the HSS contains the info on the MME Diameter Host name and Diameter Realm serving the subscriber.

S6c – Send-Routing-Info-for-SM Request to get the MME serving the subscriber

With this info, we can now craft a Diameter Request that will get sent to the MME serving the subscriber, containing the SMS PDU to send to the UE.

SGd MT-Forward-Short-Message to deliver Mobile Terminated SMS to the serving MME

We make sure it’s sent to the correct MME by setting the Destination-Host and Destination-Realm in the Diameter request.
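Turning the SRA result into the routing AVPs of the MT-Forward-Short-Message-Request can be sketched as below. The AVP names (Serving-Node, MME-Name, MME-Realm, Destination-Host, Destination-Realm, User-Name, SC-Address, SM-RP-UI) are as I understand TS 29.338; representing the messages as Python dicts, the function name, and the hostnames and numbers in the example are hypothetical.

```python
def build_tfr(sra: dict, imsi: str, sc_address: str, tpdu: bytes) -> dict:
    """Build the routing-relevant AVPs of an SGd MT-Forward-Short-Message-Request.

    `sra` is a hypothetical decoded S6c Send-Routing-Info-for-SM-Answer.
    """
    serving_node = sra.get("Serving-Node", {})
    try:
        mme_host = serving_node["MME-Name"]
        mme_realm = serving_node["MME-Realm"]
    except KeyError:
        raise LookupError("no MME in Serving-Node; can't deliver over SGd") from None
    return {
        "Destination-Host": mme_host,    # the exact MME serving the UE
        "Destination-Realm": mme_realm,  # lets the DRAs / IPX route between networks
        "User-Name": imsi,
        "SC-Address": sc_address,
        "SM-RP-UI": tpdu,                # the SMS TPDU itself
    }


if __name__ == "__main__":
    sra = {"Serving-Node": {"MME-Name": "mme01.epc.mnc001.mcc001.3gppnetwork.org",
                            "MME-Realm": "epc.mnc001.mcc001.3gppnetwork.org"}}
    print(build_tfr(sra, "001010123456789", "61400000000", b"\x04\x00"))
```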

Here’s how the request looks from the SMSc towards our DRA:

As you can see, the Destination-Realm and Destination-Host are set, as is the User-Name, which carries the IMSI of the UE we want to send the message to.

And down the bottom you can see the SMS-TPDU, the same as it’s been all the way back since GSM days.

The Mobile Originated SMS Flow

The Mobile Originated flow is even simpler, because we don’t need to look up where to route it to.

The MME receives the MO SMS from the UE, and shoves it into a Diameter message with Application ID set to SGd and Destination-Realm set to the HPMN Realm.
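Working out the HPMN realm for the MO request amounts to deriving the home network realm from the IMSI, in the TS 23.003 form `epc.mnc<MNC>.mcc<MCC>.3gppnetwork.org`. The catch is that the IMSI alone doesn’t say whether the MNC is two or three digits, so this hypothetical helper takes the MNC length as a parameter (in a real MME, that comes from configuration).

```python
def home_realm_from_imsi(imsi: str, mnc_digits: int) -> str:
    """Derive the home network Diameter realm from an IMSI (TS 23.003 style)."""
    if mnc_digits not in (2, 3):
        raise ValueError("MNC is 2 or 3 digits")
    if not imsi.isdigit() or not 3 + mnc_digits < len(imsi) <= 15:
        raise ValueError("IMSI must be all digits, at most 15 long")
    mcc = imsi[:3]
    mnc = imsi[3:3 + mnc_digits].zfill(3)  # two-digit MNCs are zero-padded
    return f"epc.mnc{mnc}.mcc{mcc}.3gppnetwork.org"


if __name__ == "__main__":
    print(home_realm_from_imsi("505570123456789", mnc_digits=2))
```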

When the message reaches the DRA in the HPMN it forwards the request to an SMSc and then the Home SMSc has the message ready to roll.

So that’s it, pretty straightforward to set up!

Uncomfortable Questions to ask about 5G Standalone at MWC – Part 2 – Has this Cash cow got Milk?

21/02/2024 | 5G SA, EPC, Mobile Networks, RFCs & Standards, Software | 5G, 5G Core, 5G-SA, 5GC, Mobile Networks, Standalone | Nick

This is the second post of 3 presenting the argument against introducing 5G-SA.

There’s an old adage that businesses spend money for one of three reasons:

To Save Money (Which I covered yesterday)
To make more Money (This post, congratulations, you’re reading it!)
Because they have to (Regulatory compliance, insurance, taxes, etc) – That’s the next post

So let’s look at SA in this context.

5G-SA can drive new revenue streams

We (as an industry) suck at this.

Last year on the Telecoms.com podcast, Scott Bicheno made the point that if operators had taken all the money they’d gambled (and lost) on sports rights, involvement in media companies, building their own streaming apps, attempts at bundling other utilities, digital identity, etc, and just left the cash in the bank and operated the network, they’d be better off.

Uber, Spotify, “OTTs”, etc, utilize MNOs to enable their services, but operators don’t see this extra revenue.
While some operators may talk of “fair share” the truth is, these companies add value to our product (connectivity) which as an industry, we’ve failed to add ourselves.

Last year at MWC we saw vendors were still beating the drum about 5G being critical for the “Metaverse”, just weeks before Meta announced they were moving away from the Metaverse.

Today the only device getting any attention from consumers is Apple’s Vision Pro, a very pricey, currently niche offering, which has no SIM card or cellular connectivity.

If the Metaverse does turn out to be a cash cow, it is unlikely the telecommunications industry will be the ones milking it.

Claim: Customers are willing to pay more for 5G-SA

This myth seems to be fairly persistent, despite minimal data to support it.

While BSS vendors talk about “5G Monetization”, the truth is, people use their MNO to provide them connectivity. If the coverage is adequate, and the speed enough to do what they need to do, few would be willing to pay any additional cash each month to see higher numbers on a speedtest result (enabled by 5G-NSA) and even fewer would pay extra cash for, well, whatever those features only enabled by 5G-Standalone are?

With most consumers now also holding onto their mobile devices for longer periods of time, and with interest rates reining in consumer spending across the board, we are seeing the rise of a more cost conscious consumer than ever before. If we want to see higher ARPUs, we need to give the consumer a compelling reason to care and spend their cash, beyond a speed test result.

We talk a little about APIs lower down in the post.

Claim: Users want Ultra-Low Latency / High Reliability Comms that only 5G-SA delivers

Wanting to offer a product to the market, is not the same as the market wanting a product to consume.

Telecom operators want customers to want these services, but customer take up rates tell a different story. For a product like this to be viable, it must have a wide enough addressable market to justify the investment.

Reliability

The URLLC standards focus on preventing packet loss, but the world has moved on from needing zero packet loss.

The telecom industry has a habit of deciding what customers want without actually listening.
When a customer talks about wanting “reliable” comms, they aren’t saying they want zero packet loss, but rather fewer dropouts or service flaps.
For us to give the customer what they are actually asking for involves us expanding RAN footprint and adding transmission diversity, not 5G-SA.

The “protocols of the internet” (TCP/IP) have been around for more than 50 years now.

These protocols have always flowed over transport links with varied reliability and levels of packet loss.

Thanks to the error detection and retransmission techniques built into these protocols, a lost packet will not interrupt the stream. If your nuclear command and control network were carried over TCP/IP over the public internet (please don’t do this), a missing packet wouldn’t lead to worldwide annihilation; rather, the sender would see that the receiver never acknowledged the packet and resend it, end of.

If you walk into a hospital today, you’ll find patient monitoring devices, tracking the vital signs for patients and alerting hospital staff if a patient’s vital signs change. It is hard to think of more important services for reliability than this.

And yet they use WiFi, and have done for a long time. If a packet is lost on WiFi (as happens regularly), it’s just retransmitted and the end user never knows.

Autonomous cars are unlikely to ever rely on a 5G connection to operate, for the simple reason that coverage will never be 100%. If your car stops because you’re in a not-spot, you won’t be a happy customer. Plenty of cars do have cellular modems in them, but these are used to upload telemetry data back to the manufacturer, not to drive the car.

One example of wireless controlled vehicles in the wild is autonomous haul trucks in mines. Historically, these have used WiFi for their comms. Mine sites are often a good fit for Private LTE, but there’s nothing inherent in the 5G Standalone standard that means it’s the only tool for the job here.

Slicing

Slicing is available in LTE (4G), with an architecture designed to allow access to others. It failed to gain traction, but is in networks today.

See: Pre-5G Network Slicing.

What is different this time?

Low Latency

The RAN is a piece of the latency puzzle here, but it is just one piece.

If we look at the flow a packet takes from the user’s device to the server they want to talk to we’ve got:

Time it takes the UE to craft the packet
Time it takes for the packet to be transmitted over the air to the base station
Time it takes for the packet to get through the RAN transmission network to the core
Time it takes the packet to traverse the packet core
Time it takes for the packet to get out to transit/peering
Time it takes to get the packet from the edge of the operators network to the edge of the network hosting the server
Time it takes the packet through the network the server is on
Time it takes the server to process the request

The “low latency” bit of the 5G puzzle only involves two of these elements.

If you’ve got to get from point A to point B along a series of roads, and the speed limit on two of the roads you traverse (already short sections) is increased, the overall travel time is not drastically reduced.

I’m lucky, I have access to a well kitted out lab which allows me to put all of these latency figures to the test and provide side by side metrics. If this is of interest to anyone, let me know. Otherwise in the meantime you’ll just have to accept some conjecture and opinion.

You could rebut this by talking about Edge Compute, and having the datacenter at the base of the tower, but for a number of fairly well documented reasons, I think this is unlikely to attract widespread deployment in established carrier networks, and Intel’s recent yearly earnings specifically called this out.

Claim: Customers want APIs and these need 5G SA

Companies like Twilio have made it easy to interact with the carrier network via their APIs, but yet again, it’s these companies producing the additional value on a service operated by the MNOs.

My coffee machine does not have an API, and I’m OK with this because I don’t have a want or need to interact with it programmatically.

By far, the most common APIs used by businesses involving telco markets are APIs to enable sending an SMS to a user.

These have been around for a long time, and the A2P market is pretty well established, and the good news is, operators already get a chunk of this pie, by charging for the SMS.

Imagine a company that makes medical booking software. They’re a tech company, so they want their stack to work anywhere in the world, and they want to be able to send reminder SMS to end users.

They could get an account manager with each of the telcos in each of the markets they work in, then onboard and integrate with the arcane complexities of each operator’s wholesale SMS system, or they could use Twilio or a similar service, which gives them global reach.

Services like Twilio are often cheaper than working directly with the carriers in each market, and even if they are marginally more expensive, the savings from not having to deal with dozens of carriers or integrate into dozens of systems far outweigh this.

GSMA’s OpenGateway Initiative has sought to rectify this, but it lacks support for the use case we just discussed.

While it’s a great idea, in the context of 5G Standalone and APIs it’s worth noting that none of the use cases in OpenGateway require 5G Standalone (except possibly Edge Discovery, and even that is debatable).

Even Slicing existed before in LTE.

Critically, from a developer experience perspective:

I can sign up to services like Twilio without a credit card and start using the service right away, with examples in my programming language of choice; the developer user experience is fantastic.

Jump on the OpenGateway website today and see if you can even find a way to sign up to use the service.

Claim: Fixed Wireless works best with 5G-SA

Of all the touted use cases and applications for 5G, Fixed Wireless (FWA) has been the most successful.

The great thing about FWA on cellular networks is that you can use the same infrastructure you use for your mobile customers, then sell excess capacity in the network to deliver Fixed Wireless Access services, better utilizing an asset (great!).

But again, this does not require Standalone 5G. If you deploy your FWA network using 5G SA, then you won’t be able to sweat that same asset for both mobile subscribers and FWA subscribers.

Today, at least, very few handsets other than this generation of flagship phones support 5G SA. Even the phones sold as supporting 5G over the past few years almost all support only 5G-NSA, so if you rolled out your FWA network as Standalone, you can’t better utilize the asset by sharing it with your existing LTE/5G-NSA customers.

Claim: The Killer App is coming for 5G and it needs 5G SA

This space is reserved for the killer app that requires 5G Standalone.

Whenever that comes?

Anyone?

I’m not paying to build a marina berth for my mega yacht, mostly because I don’t have one. Ditto this.

Could you explain to everyone on an investor call that you’re investing in something where the vessel of the payoff isn’t even known to exist? Telecom is “blue chip”, hardly speculative.

The Future for Revenue Growth?

Maybe there isn’t one.

I know it’s an unthinkable thought for a lot of operators, but let’s look at it rationally; in the developed world, everyone who wants a mobile service already has one.

This leaves operators with two options: gaining market share from their competitors, or selling more/higher-priced services to existing customers.

You don’t steal customers away from other operators by offering a higher-priced product, and with reduced consumer spending, people aren’t queuing up to spend more each month.

But there is a silver lining: if you can’t grow revenues, you can still shrink expenditure, which gets the same result at the end of the quarter – more cash.

Simplify your operations, focus on what you do really well (mobile services), apply the whole 80/20 rule, get better at self-service, all that guff.

There’s no shortage of consumer pain points that telecom operators could address to make the customer experience better, but few of them include the word Slicing.

Uncomfortable Questions to ask about 5G Standalone at MWC – Part 1 – Does $tandalone save $$$?

20/02/2024 | 5G SA, EPC, EUTRAN, Mobile Networks, RFCs & Standards | 5G, 5G-SA, 5GC, EPC, Standalone | Nick

No one spends marketing dollars talking about the problems with a tech and vendors aren’t out there promoting sweating existing assets. But understanding your options as an operator is more important now than ever before.

Sidebar: this post got really long, so I’m splitting it into 3…

We’re often asked to help define a 5G strategy for operators; while every case is different, there are a lot of vendors pushing MNOs to move towards 5G Standalone, or 5G-SA.

I’m always a fan of playing “devil’s advocate”, and with so many articles and press releases singing the praises of standalone 5G/5G-SA, in this post I’ll make the counter-case against the narrative vendors present to operators: that the “right” way to do 5G is to introduce 5G Standalone, and that they should all be “upgrading” to Standalone 5G.

With Mobile World Congress around the corner, now seems like a good time to put forward the argument against introducing 5G Standalone, rebutting some common claims about 5G Standalone that operators will be told. I’ll counter these arguments and put forward the case for not jumping onto the 5G-SA bandwagon – just yet.

On a personal note, I do like 5G SA, it has some real advantages and some cool features, which are well documented, including on this blog. I’m not looking to beat up on any vendors, marketing hype or events, but just to provide the “other side” of the equation that operators should consider when making decisions and may not be aware of otherwise. It’s also all opinion of course (cited where possible), but if you’re going to build your network based on a blog post (even one as good as this) you should probably reconsider your life choices.

Some Arcane Detail: 5G Non-Standalone (NSA) vs Standalone (SA)

5G NSA (Non-Standalone) uses LTE (4G) with an additional layer “bolted on” that uses 5G on the radio interface to provide “5G” speeds to users, while reusing the existing LTE core (the Evolved Packet Core) and VoLTE for voice / SMS.

From an operator perspective there is almost no change required in the network to support NSA 5G, other than in the RAN, and almost all the 5G networks in commercial use today use 5G NSA.

5G NSA is great: it gives users whose phones support it 5G speeds, with no changes needed to the rest of the network.

Standalone 5G, on the other hand, requires a completely new core network with all the trimmings.

While it is possible to hand over / interwork with LTE/4G (Inter-RAT Handovers), this is like 3G/4G interworking, where each generation has a different core network. Introducing 5G Standalone touches every element of the network: you need new nodes supporting the new standards for charging, policy, user plane, IMS, etc.

Scope

There’s an old adage that businesses spend money for one of three reasons:

To Save Money (Which we’ll cover in this post)
To make more Money (Covered next – Will link when published)
Because they have to (Regulatory compliance, insurance, taxes, etc)

Let’s look at 5G Standalone in each of these contexts:

5G Cost Savings – Counterpoint: The cost-benefit doesn’t stack up

As an operator with an existing deployed 4G LTE network, deploying a new 5G standalone network will not save you money.

From a capital perspective this is pretty obvious: you’re going to need to invest in a new RAN and a new core to support this, but what about from an opex perspective?

Claim: 5G RAN is more efficient than 4G (LTE) RAN

Spectrum is both finite and expensive, so MNOs must find the most efficient way to use that spectrum, to squeeze the most possible value out of it.

Let’s look at some numbers:

In the case of 3G vs 4G (LTE) there was a strong cost-saving case to be made: a single 5MHz UMTS (3G) cell could carry a total of 14Mbps, while if that same 5MHz channel was refarmed / shifted to a 4×4 LTE (4G) carrier, we hit 75Mbps of downlink data.

In rough numbers, moving from 3G to 4G gives us about 5x the spectral efficiency: we can carry more than five times as much traffic (75 ÷ 14 ≈ 5.4) in the same spectrum on 4G as on 3G – a very compelling reason to upgrade.

The like-for-like spectral efficiency of 5G is not significantly greater than that of LTE.

In numbers: the same 5MHz of spectrum we refarmed from UMTS (3G) to 4G (LTE), which gave a 5x gain in efficiency to deliver 75Mbps on LTE, would provide 80Mbps if refarmed to 5G-NR.

Refarming spectrum from 4G (LTE) to 5G (NR) only provides a 6% increase in spectral efficiency.
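The arithmetic here can be reproduced with a small Python sketch. It uses only the peak downlink figures quoted in the text for a 5MHz channel (14Mbps UMTS, 75Mbps 4×4 LTE, 80Mbps NR) and treats peak throughput ÷ bandwidth as a rough proxy for spectral efficiency. That proxy is a simplifying assumption, since real-world efficiency depends on load, radio conditions and configuration. Note that the exact 4G-to-5G ratio works out to about 6.7%, which the text quotes as 6%.

```python
# Peak downlink figures quoted in the text for the same 5 MHz channel, in Mbps.
CHANNEL_MHZ = 5.0
PEAK_MBPS = {
    "3G": 14.0,  # UMTS
    "4G": 75.0,  # LTE, 4x4 MIMO
    "5G": 80.0,  # NR
}


def spectral_efficiency(peak_mbps: float, bandwidth_mhz: float = CHANNEL_MHZ) -> float:
    """Peak spectral efficiency in bit/s/Hz (Mbps / MHz, the 10^6 factors cancel)."""
    return peak_mbps / bandwidth_mhz


def efficiency_gain(old: str, new: str) -> float:
    """How many times more data the new technology carries in the same spectrum."""
    return spectral_efficiency(PEAK_MBPS[new]) / spectral_efficiency(PEAK_MBPS[old])


if __name__ == "__main__":
    for tech, mbps in PEAK_MBPS.items():
        print(f"{tech}: {spectral_efficiency(mbps):.1f} bit/s/Hz")
    print(f"3G -> 4G: {efficiency_gain('3G', '4G'):.2f}x")
    print(f"4G -> 5G: +{(efficiency_gain('4G', '5G') - 1) * 100:.1f}%")
```

Run as-is, this prints 2.8, 15.0 and 16.0 bit/s/Hz, a 5.36x gain from 3G to 4G, and a +6.7% gain from 4G to 5G.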

While 6% is not nothing, once the spectrum is refarmed to a 5G standalone network it can no longer be used by LTE-only devices (unless Dynamic Spectrum Sharing is used, which brings its own efficiency losses). That in itself reduces efficiency and adds load to the other layers.

The crazy speeds demonstrated by 5G are not due to meaningful increases in efficiency, but rather to the ability to use more spectrum: spectrum that operators need to buy at auction, buy equipment to use, and pay to run.

Claim: 5G Standalone Core is Cheaper to operate as it is “Cloud Native”

It has been widely claimed that the shift of the 5G Core architecture to being “Cloud Native” can provide cost savings.

Operators should regard this with skepticism; after all, we’ve been here before.

Did moving from big-iron to VNFs provide the promised cost savings to operators?

For many operators the shift from hardware to software added additional complexity to the network and increased the headcount to support this.

What were once big-iron appliances dedicated to one job, that sat in the corner and chugged away, are now virtual machines (VNFs).
Many operators have naturally found themselves needing a larger team to manage the virtual environment, compared to the size of the team they needed just to plug power and data into a big box in an exchange before everything was virtualized.

Introducing a “Cloud Native” Kubernetes layer on top of the VNF / virtualization layer, on top of the compute layer, leaves us with a whole lot of layers, all of which require resources to maintain, troubleshoot and keep running, with each layer carrying its own costs for staffing, licensing and support.

Many mid-size enterprises rushed into “the cloud” for the promised cost savings, only to sheepishly admit it cost more than expected.

Almost none of the operators are talking about running these workloads in the public cloud, but rather “Private Clouds” built on-premises, using “Cloud Native” best practices.

One of the central arguments for the cloud revolves around “elastic scaling”, where the network can automatically scale to match demand; think extra instances spun up at times of peak demand and shut down when demand drops.

I explain elastic scaling to clients as having to move people from one place to another. Most of the time I’m just moving myself, so a push bike is fine, or I’ve got a 4-seater car, but occasionally I’ll need to move 25 people, and for that I’d need a bus.

If I provide the transportation myself, I need to own a bike, a car and a bus.

But if I use the cloud, I can start with the push bike, and as I need to move more people, the “cloud” will provide the vehicle I need at that moment. I just pay for the time I need the bus, and when I’m done with it, I drop back to the (cheaper) push bike.

This is a really compelling argument, and telecom operators regularly announce partnerships with the hyperscalers, except these are always for non-core-network workloads.

Because telecom operators are going to provide the servers to run this themselves in an “on-prem cloud”, they need to dimension for the maximum possible load. This means they need to own a bike, a car and a bus, even if they’re not using them most of the time, and there’s no cost saving in leaving the bus parked when you own it rather than hiring it by the hour.
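The bike/car/bus point can be put into rough numbers with a minimal Python sketch. The hourly demand profile below is entirely hypothetical. The sketch compares capacity provisioned for the busiest hour around the clock (what an on-prem build has to do) with capacity billed hour by hour as demand changes (the elastic model). It counts instance-hours rather than money, so no prices are assumed.

```python
# Hypothetical demand, in instances needed, for each hour of one day (illustrative only).
HOURLY_DEMAND = [2, 2, 1, 1, 1, 2, 4, 8, 10, 9, 8, 8,
                 9, 8, 8, 9, 10, 12, 14, 12, 8, 5, 3, 2]


def on_prem_instance_hours(demand: list[int]) -> int:
    """On-prem capacity must be dimensioned for the peak hour and is owned all day."""
    return max(demand) * len(demand)


def elastic_instance_hours(demand: list[int]) -> int:
    """Pay-per-use capacity only bills for what each hour actually needs."""
    return sum(demand)


def average_utilisation(demand: list[int]) -> float:
    """Fraction of the owned (peak-dimensioned) capacity that is actually used."""
    return elastic_instance_hours(demand) / on_prem_instance_hours(demand)


if __name__ == "__main__":
    print("on-prem instance-hours:", on_prem_instance_hours(HOURLY_DEMAND))
    print("elastic instance-hours:", elastic_instance_hours(HOURLY_DEMAND))
    print(f"average utilisation of on-prem capacity: {average_utilisation(HOURLY_DEMAND):.1%}")
```

With this made-up profile, the on-prem build owns 336 instance-hours to serve 156, about 46.4% utilisation. The elastic model only saves money if someone else owns the idle bus.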

Infrastructure aside, introducing a Standalone 5G Core adds another core network to maintain. Alongside the 2G/3G core (the circuit-switched MSC plus the packet-switched SGSN/GGSN) serving 2G/3G subscribers, and the Evolved Packet Core serving 4G (LTE) and 5G-NSA subscribers, adding a 5G Standalone Core for the 5G-SA subscribers served by the 5G SA cells is going to be more work (and therefore cost).

While the majority of operators have yet to turn off their 2G/3G core networks, introducing another core network to run in parallel is unlikely to lead to any cost savings.

Claim: Upgrading now can save money in the Future / Future Proofing

Life cycles in telecommunications are twofold: one is the equipment/platform life cycle (like the RAN components or core network software used to deliver the service), and the other is the technology life cycle (the generation of technology being used).

Technology life cycles in telecommunications are vastly longer than those for regular tech.

GSM (2G) was introduced into the UK in 1991 and will be phased out starting in 2033: a 42-year technology life cycle.

No vendor today could reasonably expect the 5G hardware you deploy in 2024 to still be in production in 2066 – The platform/equipment life cycle is a lot shorter than the technology life cycle.

Operators will continue relying on LTE (4G) well into the late 2030s.

I’d wager that there is not a single piece of equipment in the Vodafone UK GSM network today that was there in 1991.
I’d go even further and say that no piece of equipment in the network today even directly replaced the 1991 equipment; it is probably 3 or 4 generations removed from the network built in 1991.

For most operators, RAN replacements happen every 4 to 7 years, often with targeted augmentation / expansion in between, in the form of extra layers / sectors as needed.

The question operators should be asking is therefore not what will I need to get me through to 2066, but rather what will I need to get to 2030?
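A small Python sketch of the two life cycles makes the planning horizon explicit. It uses only the figures from the text: GSM in the UK from 1991 to 2033, and a 4 to 7 year RAN refresh interval. Equipment deployed in 2024 is due for replacement somewhere between 2028 and 2031, long before the technology itself retires. Treating the refresh interval as a fixed range is a simplifying assumption.

```python
# Figures from the text: GSM in the UK launched in 1991 with phase-out starting in 2033,
# and RAN equipment is typically replaced every 4 to 7 years.
TECH_START, TECH_END = 1991, 2033
REFRESH_YEARS = (4, 7)


def technology_lifecycle_years(start: int = TECH_START, end: int = TECH_END) -> int:
    """Length of the technology (generation) life cycle in years."""
    return end - start


def replacement_window(deploy_year: int, refresh: tuple[int, int] = REFRESH_YEARS) -> tuple[int, int]:
    """Earliest and latest year the deployed equipment is likely to be swapped out."""
    shortest, longest = refresh
    return deploy_year + shortest, deploy_year + longest


def platform_generations(start: int = TECH_START, end: int = TECH_END,
                         refresh: tuple[int, int] = REFRESH_YEARS) -> tuple[int, int]:
    """Full equipment refreshes that fit in one technology life cycle (fewest, most)."""
    span = technology_lifecycle_years(start, end)
    shortest, longest = refresh
    # Longer refresh intervals mean fewer generations of equipment, and vice versa.
    return span // longest, span // shortest


if __name__ == "__main__":
    print("technology life cycle:", technology_lifecycle_years(), "years")
    print("2024 equipment replaced between:", replacement_window(2024))
    print("equipment generations per technology life cycle:", platform_generations())
```

This prints a 42-year technology life cycle, a (2028, 2031) replacement window for 2024 equipment, and (6, 10) equipment generations over the life of the technology.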

The majority of operators outside the US today still operate a 2G or 3G network, generally with minimal bandwidth to support legacy handsets and devices, while the 4G (LTE) network does most of the heavy lifting for carrying user traffic. This is often with the aid of an additional 5G-NSA (Non-Standalone) layer to provide additional capacity.

Is there a cost saving angle to adding support for 5G-Standalone in addition to 2G/3G/4G (LTE) and 5G (Non-Standalone) into your RAN?

A logical stance would be that removing layers / technologies (such as 2G/3G sunsetting) would lead to cost savings, and adding a 5G Standalone layer would increase cost.

All of the RAN solutions on the market today from the major vendors support both Standalone and Non-Standalone 5G, but the feature licensing for Non-Standalone 5G is generally cheaper than that for Standalone 5G.

The question operators should be asking is on what timescale do I need Standalone 5G?

If you’ve rolled out 5G-NSA today, then when are you looking to sunset your LTE network?
If the answer is “I hope to have long since retired by that time”, then you’ve just answered that question and you don’t need to licence / deploy 5G-SA in this hardware refresh cycle.

Other Cost Factors

Roaming: The majority of roaming traffic today relies on 2G/3G for voice. VoLTE roaming is (finally) starting to establish a foothold, but we are a long way from ubiquitous global roaming for LTE and VoLTE, and even further away for 5G-SA roaming. Focusing on 5G roaming will open your network to roaming use by a minuscule number of operators, compared with LTE/VoLTE roaming, which covers the majority of the operators in the developed world that could make use of your network.

I decided to split this into 3 posts: next I’ll post the “5G can make us more money” post, and finally a “5G because we have to” post. I’ll share them on LinkedIn / Twitter / the mailing list, so stick around, and feel free to trash me in the comments.