Category Archives: RFCs & Standards

Looking inside the MMS Exchange (With call flow and PCAP)

So you want to send an MMS?

We’ve covered SMS in the past, but MMS is a different kettle of fish.

Let’s look at how the call flow goes, when Bob wants to send a picture to Alice.

Before Bob sends the MMS, his phone will have to be setup with the correct settings to send MMS.
Sometimes this is done manually, for others it’s done through the Carrier provisioning SMS that preloads the settings, and for others it’s baked in based on the Android Carrier settings XML,

APN settings for Telstra in Australia for MMS

It’s made up of the APN to send MMS traffic over, the MMSC address (Multimedia Message Switching Center) and often an MMS proxy and port combination for where the traffic will actually go.

Message Flow – Bob to MMSC (Mobile Originated MMS)

Bob opens his phone, creates a new message to Alice, selects the picture (or other multimedia filetype) to send to her and hits the send button.

For starters, MMS has a file size limit, like MTU it’s not advertised, so you don’t know if you’ve hit it, so rather like MTU is a “lowest has the highest success of getting through” rule. So Bob’s phone will most likely scale the image down to fit inside 300K.

Next Bob’s phone knows it has an MMS to send, for this is opens up a new bearer on the MMS APN, typically called MMS, but configured in the phone by Bob.

Why use a separate APN for sending 300K of MMS traffic?
Once upon a time mobile data was expensive.
By having a separate APN just for MMS traffic (An APN that could do nothing except send / receive MMS) allowed easier billing / tariffing of data, as MMS traffic was sent over a APN which was unmetered.

After the bearer is setup on the MMS APN, Bob’s phone begins crafting a HTTP 1.1 Post to be sent to the MMSC.
The content type of this request will be application/vnd.wap.mms-message and the body of the HTTP post will be made up of MMS Message Encapsulation, with the body containing the picture he wants to send to Alice.

Note: Historically Wireless Session Protocol (WSP) was used in lieu of HTTP. These clients would now need a WAP gateway to translate into HTTP.

This HTTP Post is then sent to the MMSC Address, or, if present, the MMSC Proxy address.
This traffic is sent over the MMS APN that we just brought up.

HTTP POST Headers for the MO MMS Message

MMS Message Encapsulation from MO MMS Message

The MMSC receives this information, and then, if all was successful, responds with a 200 OK,

200 OK response to MO MMS Message

So now the MMSC has the information from Bob, let’s flip over to Alice.

Message Flow – MMSC to Alice (Mobile Terminated MMS)

For the purposes of simplicity, we’re going to rule out the MMSC from doing clever things like converting the media, accepting email (SMPP) as MMS, etc, etc. Instead we’re going to assume Alice and Bob are on the same Network, and our MMSC is just doing store-and-forward.

The MMSC will look at the To address in the MMS Message Encapsulation of the request Bob sent, to determine that this message is destined for Alice.

The MMSC will load the media content (photo) sent by Bob destined for Alice and serve it via HTTP. The MMSC generates a random URL to serve it this particular file on, with each MMS the MMSC handles being assigned a random URL containing the media content.

Next the MMSC will need to tell Alice’s phone, that she has an MMS waiting for her. This is done by generating an SMS to send to Alice’s phone,

The user-data of this SMS is the Wireless Session Protocol with the method PUSH – Aka WAP Push.

SMS alerting the user of an MMS waiting for delivery

This specially encoded SMS is parsed by the Alice’s phone, which tells the her there is an MMS message waiting for her.

On some operating systems this is pulled automatically, on others, users need to select “Download” to actually get the file.

The UE then just runs an HTTP get to the address in the X-Mms-Content-Location: Header to pull the multimedia content that Bob sent.

HTTP GET from Alice’s Phone / UE to retrieve MMS sent by Bob (MT-MMS)

All going well the URL is valid and Alice’s phone retrieves the message, getting a 200 OK back from the server with the message content.

HTTP Response (200 OK) for MT-MMS, sent by the MMSC to Alice’s phone with the MMS Body

So now Alice’s phone has the MMS content and renders it on the screen, Alice can see the Photo Bob sent her.

Lastly Alice’s phone sends a HTTP POST again to the MMSC, this time indicating the message status is “Retrieved”,

And to close everything off the MMSC confirms receipt of the Retrieved status with a 200 OK, and we are done.

What didn’t we cover?

So that’s a basic MMS message flow, but there’s a few parts we didn’t cover.

The overall architecture beyond just the store-and forward behaviour, charging and authentication we didn’t cover. So let’s look at each of these points.

Overall Architecture

What we just covered what what’s defined as the MM1 interface.

There’s obviously a stack of other interfaces, such as for charging, messaging between MMSC/Carriers, subscriber locating / user database, etc.

Charging

MMSCs would typically have a connection to trigger charging events / credit-control events prior to processing the message.

For online charging the Ro interface can be used, as you would for IMS charging events.

3GPP 3GPP TS 32.270 covers the charging architecture for online/offline charging for MMS.

Authentication

Unfortunately authentication was a bit of an afterthought for the MMS standard, and can be done several different ways.

The most common is to correlate the IP Address on the MMS APN against a subscriber.

Telephony binary-coded decimal (TBCD) in Python with Examples

Chances are if you’re reading this, you’re trying to work out what Telephony Binary-Coded Decimal encoding is. I got you.

Again I found myself staring at encoding trying to guess how it worked, reading references that looped into other references, in this case I was encoding MSISDN AVPs in Diameter.

How to Encode a number using Telephony Binary-Coded Decimal encoding?

First, Group all the numbers into pairs, and reverse each pair.

So a phone number of 123456, becomes:

214365

Because 1 & 2 are swapped to become 21, 3 & 4 are swapped to become 34, 5 & 6 become 65, that’s how we get that result.

TBCD Encoding of numbers with an Odd Length?

If we’ve got an odd-number of digits, we add an F on the end and still flip the digits,

For example 789, we add the F to the end to pad it to an even length, and then flip each pair of digits, so it becomes:

87F9

That’s the abbreviated version of it. If you’re only encoding numbers that’s all you’ll need to know.

Detail Overload

Because the numbers 0-9 can be encoded using only 4 bits, the need for a whole 8 bit byte to store this information is considered excessive.

For example 1 represented as a binary 8-bit byte would be 00000001, while 9 would be 00001001, so even with our largest number, the first 4 bits would always going to be 0000 – we’d only use half the available space.

So TBCD encoding stores two numbers in each Byte (1 number in the first 4 bits, one number in the second 4 bits).

To go back to our previous example, 1 represented as a binary 4-bit word would be 0001, while 9 would be 1001. These are then swapped and concatenated, so the number 19 becomes 1001 0001 which is hex 0x91.

Let’s do another example, 82, so 8 represented as a 4-bit word is 1000 and 2 as a 4-bit word is 0010. We then swap the order and concatenate to get 00101000 which is hex 0x28 from our inputted 82.

Final example will be a 3 digit number, 123. As we saw earlier we’ll add an F to the end for padding, and then encode as we would any other number,

F is encoded as 1111.

1 becomes 0001, 2 becomes 0010, 3 becomes 0011 and F becomes 1111. Reverse each pair and concatenate 00100001 11110011 or hex 0x21 0xF3.

Special Symbols (#, * and friends)

Because TBCD Encoding was designed for use in Telephony networks, the # and * symbols are also present, as they are on a telephone keypad.

Astute readers may have noticed that so far we’ve covered 0-9 and F, which still doesn’t use all the available space in the 4 bit area.

The extended DTMF keys of A, B & C are also valid in TBCD (The D key was sacrificed to get the F in).

Symbol4 Bit Word
*1 0 1 0
#1 0 1 1
a1 1 0 0
b1 1 0 1
c1 1 1 0

So let’s run through some more examples,

*21 is an odd length, so we’ll slap an F on the end (*21F), and then encoded each pair of values into bytes, so * becomes 1010, 2 becomes 0010. Swap them and concatenate for our first byte of 00101010 (Hex 0x2A). F our second byte 1F, 1 becomes 0001 and F becomes 1111. Swap and concatenate to get 11110001 (Hex 0xF1). So *21 becomes 0x2A 0xF1.

And as promised, some Python code from PyHSS that does it for you:

    def TBCD_special_chars(self, input):
        if input == "*":
            return "1010"
        elif input == "#":
            return "1011"
        elif input == "a":
            return "1100"
        elif input == "b":
            return "1101"
        elif input == "c":
            return "1100"      
        else:
            print("input " + str(input) + " is not a special char, converting to bin ")
            return ("{:04b}".format(int(input)))


    def TBCD_encode(self, input):
        print("TBCD_encode input value is " + str(input))
        offset = 0
        output = ''
        matches = ['*', '#', 'a', 'b', 'c']
        while offset < len(input):
            if len(input[offset:offset+2]) == 2:
                bit = input[offset:offset+2]    #Get two digits at a time
                bit = bit[::-1]                 #Reverse them
                #Check if *, #, a, b or c
                if any(x in bit for x in matches):
                    new_bit = ''
                    new_bit = new_bit + str(TBCD_special_chars(bit[0]))
                    new_bit = new_bit + str(TBCD_special_chars(bit[1]))    
                    bit = str(int(new_bit, 2))
                output = output + bit
                offset = offset + 2
            else:
                bit = "f" + str(input[offset:offset+2])
                output = output + bit
                print("TBCD_encode output value is " + str(output))
                return output
    

    def TBCD_decode(self, input):
        print("TBCD_decode Input value is " + str(input))
        offset = 0
        output = ''
        while offset < len(input):
            if "f" not in input[offset:offset+2]:
                bit = input[offset:offset+2]    #Get two digits at a time
                bit = bit[::-1]                 #Reverse them
                output = output + bit
                offset = offset + 2
            else:   #If f in bit strip it
                bit = input[offset:offset+2]
                output = output + bit[1]
                print("TBCD_decode output value is " + str(output))
                return output

The PLMN Problem for Private LTE / 5G

So it’s the not to distant future and the pundits vision of private LTE and 5G Networks was proved correct, and private networks are plentiful.

But what PLMN do they use?

The PLMN (Public Land Mobile Network) ID is made up of a Mobile Country Code + Mobile Network Code. MCCs are 3 digits and MNCs are 2-3 digits. It’s how your phone knows to connect to a tower belonging to your carrier, and not one of their competitors.

For example in Australia (Mobile Country Code 505) the three operators each have their own MCC. Telstra as the first licenced Mobile Network were assigned 505/01, Optus got 505/02 and VHA / TPG got 505/03.

Each carrier was assigned a PLMN when they started operating their network. But the problem is, there’s not much space in this range.

The PLMN can be thought of as the SSID in WiFi terms, but with a restriction as to the size of the pool available for PLMNs, we’re facing an IPv4 exhaustion problem from the start if we’re facing an explosion of growth in the space.

Let’s look at some ways this could be approached.

Everyone gets a PLMN

If every private network were to be assigned a PLMN, we’d very quickly run out of space in the range. Best case you’ve got 3 digits, so only space for 1,000 networks.

In certain countries this might work, but in other areas these PLMNs may get gobbled up fast, and when they do, there’s no more. New operators will be locked out of the market.

Loaner PLMNs

Carriers already have their own PLMNs, they’ve been using for years, some kit vendors have been assigned their own as well.

If you’re buying a private network from an existing carrier, they may permit you to use their PLMN,

Or if you’re buying kit from an existing vendor you may be able to use their PLMN too.

But what happens then if you want to move to a different kit vendor or another service provider? Do you have to rebuild your towers, reconfigure your SIMs?

Are you contractually allowed to continue using the PLMN of a third party like a hardware vendor, even if you’re no longer purchasing hardware from them? What happens if they change their mind and no longer want others to use their PLMN?

Everyone uses 999 / 99

The ITU have tried to preempt this problem by reallocating 999/99 for use in Private Networks.

The problem here is if you’ve got multiple private networks in close proximity, especially if you’re using CBRS or in close proximity to other networks, you may find your devices attempting to attach to another network with the same PLMN but that isn’t part of your network,

Mobile Country or Geographical Area Codes
Note from TSB
Following the agreement on the Appendix to Recommendation ITU-T E.212 on “shared E.212 MCC 999 for internal use within a private network” at the closing plenary of ITU-T SG2 meeting of 4 to 13 July 2018, upon the advice of ITU-T Study Group 2, the Director of TSB has assigned the Mobile Country Code (MCC) “999” for internal use within a private network. 

Mobile Network Codes (MNCs) under this MCC are not subject to assignment and therefore may not be globally unique. No interaction with ITU is required for using a MNC value under this MCC for internal use within a private network. Any MNC value under this MCC used in a network has
significance only within that network. 

The MNCs under this MCC are not routable between networks. The MNCs under this MCC shall not be used for roaming. For purposes of testing and examples using this MCC, it is encouraged to use MNC value 99 or 999. MNCs under this MCC cannot be used outside of the network for which they apply. MNCs under this MCC may be 2- or 3-digit.

(Recommendation ITU-T E.212 (09/2016))

The Crystal Ball?

My bet is we’ll see the ITU allocate an MCC – or a range of MCCs – for private networks, allowing for a pool of PLMNs to use.

When deploying networks, Private network operators can try and pick something that’s not in use at the area from a pool of a few thousand options.

The major problem here is that there still won’t be an easy way to identify the operator of a particular network; the SPN is local only to the SIM and the Network Name is only present in the NAS messaging on an attach, and only after authentication.

If you’ve got a problem network, there’s no easy way to identify who’s operating it.

But as eSIMs become more prevalent and BIP / RFM on SIMs will hopefully allow operators to shift PLMNs without too much headache.

SCTP Parameter Tuning

There’s a lot to like about SCTP. No head of line blocking, MTU issues, sequenced, acknowledged delivery of messages, not to mention Multi-Homing and message bundling.

But if you really want to get the most bang for your buck, you’ll need to tune your SCTP parameters to match the network conditions.

While tuning the parameters per-association would be time consuming, most SCTP stacks allow you to set templates for SCTP parameters, for example you would have a different set of parameters for the SCTP stacks inside your network, compared to SCTP stacks for say a roaming scenario or across microwave links.

IETF kindly provides a table with their recommended starting values for SCTP parameter tuning:

RTO.Initial3 seconds
RTO.Min1 second
RTO.Max60 seconds
Max.Burst4
RTO.Alpha1/8
RTO.Beta1/4
Valid.Cookie.Life60 seconds
Association.Max.Retrans10 attempts
Path.Max.Retrans5 attempts (per destination address)
Max.Init.Retransmits8 attempts
HB.interval30 seconds
HB.Max.Burst1
IETF – RFC4960: SCTP – Suggested Protocol Parameter Values

But by adjusting the Max Retrans and Retransmission Timeout (RTO) values, we can detect failures on the network more quickly, and reduce the number of packets we’ll loose should we have a failure.

We begin with the engineered round-trip time (RTT) – that is made up of the time it takes to traverse the link, processing time for the remote SCTP stack and time for the response to traverse the link again. For the examples below we’ll take an imaginary engineered RTT of 200ms.

RTO.min is the minimum retransmission timeout.
If this value is set too low then before the other side has had time to receive the request, process it and send a response, we’ve already retransmitted it.

This should be set to the round trip delay plus processing needed to send and acknowledge a packet plus some allowance for variability due to jitter; a value of 1.15 times the Engineered RTT is often chosen

So for us, 200 * 1.15 = 230ms RTO.min value.

RTO.max is the maximum amount of time we should wait before transmitting a request.
Typically three times the Engineered RTT.

So for us, 200 * 3 = 600ms RTO.min value.

Path.Max.Retransmissions is the maximum number of retransmissions to be sent down a path before the path is considered to be failed.
For example if we loose a transmission path on a multi-homed server, how many retransmissions along that path should we send until we consider it to be down?

Values set are dependant on if you’re multi-homing or not (you can be more picky if you are) and the level of acceptable packet loss in your transmission link.

Typical values are 4 Retransmissions (per destination address) for a Single-Homed association, and 2 Retransmissions (per destination address) for a Multi-Homed association.

Association.Max.Retransmissions is the maximum number of retransmissions for an association. If a transmission link in a multi-homed SCTP scenario were to go down, we would pass the Path.Max.Retransmissions value and the SCTP stack would stop sending traffic out that path, and try another, but what if the remote side is down? In that scenario all our paths would fail, so we need another counter – Path.Max.Retransmissions to count the total number of retransmissions to an association / destination. When the Association.Max.Retransmissions is reached the association is considered down.

In practice this value would be the number of paths, multiplied by the Path.Max.Retransmissions.

Pre-5G Network Slicing

Network Slicing, is a new 5G Technology. Or is it?

Pre 3GPP Release 16 the capability to “Slice” a network already existed, in fact the functionality was introduced way back at the advent of GPRS, so what is so new about 5G’s Network Slicing?

Network Slice: A logical network that provides specific network capabilities and network characteristics

3GPP TS 123 501 / 3 Definitions and Abbreviations

Let’s look at the old and the new ways, of slicing up networks, pre release 16, on LTE, UMTS and GSM.

Old Ways: APN Separation

The APN or “Access Point Name” is used so the SGSN / MME knows which gateway to that subscriber’s traffic should be terminated on when setting up the session.

APN separation is used heavily by MVNOs where the MVNO operates their own P-GW / GGSN.
This allows the MNVO can handle their own rating / billing / subscriber management when it comes to data.
A network operator just needs to setup their SGSN / MME to point all requests to setup a bearer on the MVNO’s APN to the MNVO’s gateways, and presoto, it’s no longer their problem.

Later as customers wanted MPLS solutions extended over mobile (Typically LTE), MNOs were able to offer “private APNs”.
An enterprise could be allocated an APN by the MNO that would ensure traffic on that APN would be routed into the enterprise’s MPLS VRF.
The MNO handles the P-GW / GGSN side of things, adding the APN configuration onto it and ensuring the traffic on that APN is routed into the enterprise’s VRF.

Different QCI values can be assigned to each APN, to allow some to have higher priority than others, but by slicing at an APN level you lock all traffic to those QoS characteristics (Typically mobile devices only support one primary APN used for routing all traffic), and don’t have the flexibility to steer which networks which traffic from a subscriber goes to.

It’s not really practical for everyone to have their own APNs, due in part to the namespace limitations, the architecture of how this is usually done limits this, and the simple fact of everyone having to populate an APN unique to them would be a real headache.

5G replaces APNs with “DNNs” – Data Network Names, but the functionality is otherwise the same.

In Summary:
APN separation slices all traffic from a subscriber using a special APN and provide a bearer with QoS/QCI values set for that APN, but does not allow granular slicing of individual traffic flows, it’s an all-or-nothing approach and all traffic in the APN is treated equally.

The old Ways: Dedicated Bearers

Dedicated bearers allow traffic matching a set rule to be provided a lower QCI value than the default bearer. This allows certain traffic to/from a UE to use GBR or Non-GBR bearers for traffic matching the rule.

The rule itself is known as a “TFT” (Traffic Flow Template) and is made up of a 5 value Tuple consisting of IP Source, IP Destination, Source Port, Destination Port & Protocol Number. Both the UE and core network need to be aware of these TFTs, so the traffic matching the TFT can get the QCI allocated to it.

This can be done a variety of different ways, in LTE this ranges from rules defined in a PCRF or an external interface like those of an IMS network using the Rx interface to request a dedicated bearers matching the specified TFTs via the PCRF.

Unlike with 5G network slicing, dedicated bearers still traverse the same network elements, the same MME, S-GW & P-GW is used for this traffic. This means you can’t “locally break out” certain traffic.

In Summary:
Dedicated bearers allow you to treat certain traffic to/from subscribers with different precedence & priority, but the traffic still takes the same path to it’s ultimate destination.

Old Ways: MOCN

Multi-Operator Core Network (MOCN) allows multiple MNOs to share the same active (tower) infrastructure.

This means one eNodeB can broadcast more than one PLMN and server more than one mobile network.

This slicing is very coarse – it allows two operators to share the same eNodeBs, but going beyond a handful of PLMNs on one eNB isn’t practical, and the PLMN space is quite limited (1000 PLMNs per country code max).

In Summary:
MOCN allows slicing of the RAN on a very coarse level, to slice traffic from different operators/PLMNs sharing the same RAN.

Its use is focused on sharing RAN rather than slicing traffic for users.

Diameter Droplets – The Flow-Description AVP and IPFilterRules

When it comes to setting up dedicated bearers, the Flow-Description AVP is perhaps the most important,

The specially encoded string (IPFilterRule) in the FlowDescription AVP is what our P-GW (Ok, our PCEF) uses to create Traffic Flow Templates to steer certain types of traffic down Dedicated Bearers.

So let’s take a look at how we can lovingly craft an artisanal Flow-Description.

The contents of the AVP are technically not a string, but a IPFilterRule.

IPFilterRules are actually defined in the Diameter Base Protocol (IETF RFC 6733), where we can learn the basics of encoding them,

Which are in turn based loosely off the ipfw utility in BSD.

They take the format:

action dir proto from src to dst

The action is fairly simple, for all our Dedicated Bearer needs, and the Flow-Description AVP, the action is going to be permit. We’re not blocking here.

The direction (dir) in our case is either in or out, from the perspective of the UE.

Next up is the protocol number (proto), as defined by IANA, but chances are you’ll be using 17 (UDP) or 6 (TCP) in most scenarios.

The from value is followed by an IP address with an optional subnet mask in CIDR format, for example from 10.45.0.0/16 would match everything in the 10.45.0.0/16 network.
Following from you can also specify the port you want the rule to apply to, or, a range of ports,
For example to match a single port you could use 10.45.0.0/16 1234 to match anything on port 1234, but we can also specify ranges of ports like 10.45.0.0/16 0 – 4069 or even mix and match lists and single ports, like 10.45.0.0/16 5060, 1000-2000

Protip: using any is the same as 0.0.0.0/0

Like the from, the to is encoded in the same way, with either a single IP, or a subnet, and optional ports specified.

And that’s it!

Keep in mind that Flow-Descriptions are typically sent in pairs as a minimum, as you want to match the traffic into and out of the network (not just one way), but often there can be quite a few sent, in order to match all the possible traffic that needs to be matched that may be across multiple different subnets, etc.

There is an optional Options parameter that allows you to set things like to only apply the rule to open TCP sessions, fragmentation, etc, although I’ve not seen this implemented in the wild.

Example IP filter Rules

permit in 6 from 10.98.254.0/24 5061 to 10.98.0.0/24 5060
permit out 6 from 10.98.254.0/24 5060 to 10.98.0.0/24 5061

permit in 6 from any 80 to 172.16.1.1 80
permit out 6 from 172.16.1.1 80 to any 80

permit in 17 from 10.98.254.0/24 50000-60100 to 10.98.0.0/24 50000-60100
permit out 17 from 10.98.254.0/24 50000-60100 to 10.98.0.0/24 50000-60100

permit in 17 from 10.98.254.0/24 5061, 5064 to 10.98.0.0/24  5061, 5064
permit out 17 from 10.98.254.0/24 5061, 5064 to 10.98.0.0/24  5061, 5064

permit in 17 from 172.16.0.0/16 50000-60100, 5061, 5064 to 172.16.0.0/16  50000-60100, 5061, 5064
permit out 17 from 172.16.0.0/16 50000-60100, 5061, 5064 to 172.16.0.0/16  50000-60100, 5061, 5064

For more info see:

RFC 6773 – Diameter Base Protocol – IP Filter Rule

3GPP TS 29.214 section 5.3.8 Flow-Description AVP

The Surprisingly Complicated world of MO SMS in IMS/VoLTE

Since the beginning of time, SIP has used the 2xx responses to confirm all went OK.

If you thought sending an SMS in a VoLTE/IMS network would see a 2xx OK response and then that’s the end of it, you’d be wrong.

So let’s take a look into sending SMS over VoLTE/IMS networks!

So our story starts with the Subscriber sending an SMS, which generate a SIP MESSAGE.

The Content-Type of this SIP MESSAGE is set to application/vnd.3gpp.sms rather than Text, and that’s because SMS over IMS uses the Short Message Transfer Protocol (SM-TP) inherited from GSM.

The Short Message Transfer Protocol (SM-TP) (Not related to Simple Message Transfer Protocol used in Email clients) is made up of Transfer Protocol Data Units (TPDU) that contain our message information, even though we have the Destination in our SIP headers, it’s again defined in the SM-TP body.

At first this may seem like a bit of duplication, but this allows older SMS Switching Centers (SMSc) to add support for IMS networks without any major changes, just what the SM-TP payload is wrapped up in changes.

SIP MESSAGE Request Body encoded in SM-TP

So back to our SIP MESSAGE request, typed out by the Subscriber, the UE sends this a SIP MESSAGE onto our IMS Network.

The IMS network follows it’s IFCs and routing rules, and makes it to the termination points for SMS traffic – the SMSc.

The SMSc sends back either a 200 OK or a 202 Accepted, and you’d think that’s the end of it, but no.

Our Subscriber still sees “Sending” on the screen, and the SMS is not shown as sent yet.

Instead, when the SMS has been delivered or buffered, relayed, etc, the SMSc generates a new SIP request, (as in new Call-ID / Dialog) with the request type MESSAGE, addressed to the Subscriber.

The payload of this request is another application/vnd.3gpp.sms encoded request body, again, containing SM-TP encoded data.

When the UE receives this, it will then consider the message delivered.

SM-TP encoded Delivery Report

Of course things change slightly when delivery reports are enabled, but that’s another story!

5Gethernet? – Transporting Non-IP data in 5G

I wrote not too long ago about how LTE access is not liked WiFi, after a lot of confusion amongst new Open5Gs users coming to LTE for the first time and expecting it to act like a Layer 2 network.

But 5G brings a new feature that changes that;

PDU Session Type: The type of PDU Session which can be IPv4, IPv6, IPv4v6, Ethernet or Unstructured

ETSI TS 123 501 – System Architecture for the 5G System

No longer are we limited to just IP transport, meaning at long last I can transport my Token Ring traffic over 5G, or in reality, customers can extend Layer 2 networks (Ethernet) over 3GPP technologies, without resorting to overlay networking, and much more importantly, fixed line networks, typically run at Layer 2, can leverage the 5G core architecture.

How does this work?

With TFTs and the N6 interfaces relying on the 5 value tuple with IPs/Ports/Protocol #s to make decisions, transporting Ethernet or Non-IP Data over 5G networks presents a problem.

But with fixed (aka Wireline) networks being able to leverage the 5G core (“Wireline Convergence”), we need a mechanism to handle Ethernet.

For starters in the PDU Session Establishment Request the UE indicates which PDN types, historically this was IPv4/6, but now if supported by the UE, Ethernet or Unstructured are available as PDU types.

We’ll focus on Ethernet as that’s the most defined so far,

Once an Ethernet PDU session has been setup, the N6 interface looks a bit different, for starters how does it know where, or how, to route unstructured traffic?

As far as 3GPP is concerned, that’s your problem:

Regardless of addressing scheme used from the UPF to the DN, the UPF shall be able to map the address used between the UPF and the DN to the PDU Session.

5.6.10.3 Support of Unstructured PDU Session type

In short, the UPF will need to be able to make the routing decisions to support this, and that’s up to the implementer of the UPF.

In the Ethernet scenario, the UPF would need to learn the MAC addresses behind the UE, handle ARP and use this to determine which traffic to send to which UE, encapsulate it into trusty old GTP, fill in the correct TEID and then send it to the gNodeB serving that user (if they are indeed on a RAN not a fixed network).

So where does this leave QoS? Without IPs to apply with TFTs and Packet Filter Sets to, how is this handled? In short, it’s not – Only the default QoS rule exist for a PDU Session of Type Unstructured. The QoS control for Unstructured PDUs is performed at the PDU Session level, meaning you can set the QFI when the PDU session is set up, but not based on traffic through that bearer.

Does this mean 5G RAN can transport Ethernet?

Well, it remains to be seen.

The specifications don’t cover if this is just for wireline scenarios or if it can be used on RAN.

The 5G PDU Creation signaling has a field to indicate if the traffic is Ethernet, but to work over a RAN we would need UE support as well as support on the Core.

And for E-UTRAN?

For the foreseeable future we’re going to be relying on LTE/E-UTRAN as well as 5G. So if you’re mobile with a non-IP PDU, and you enter an area only served by LTE, what happens?

PDU Session types “Ethernet” and “Unstructured” are transferred to EPC as “non-IP” PDN type (when supported by UE and network).

It is assumed that if a UE supports Ethernet PDU Session type and/or Unstructured PDU Session type in 5GS it will also support non-IP PDN type in EPS.

5.17.2 Interworking with EPC

If you were not aware of support in the EPC for Non-IP PDNs, I don’t blame you – So far support the CIoT EPS optimizations were initially for Non-IP PDN type has been for NB-IoT to supporting Non-IP Data Delivery (NIDD) for lightweight LwM2M traffic.

So why is this? Well, it may have to do with WO 2017/032399 Al which is a patent held by Ericsson, regarding “COMMUNICATION OF NON-IP DATA OVER PACKET DATA NETWORKS” which may be restricting wide scale deployment of this,

Diff + Wireshark

Supposedly Archimedes once said:

Give me a lever long enough and a fulcrum on which to place it, and I shall move the world.

For me, the equivalent would be:

Give me a packet capture of the problem occurring and a standards document against which to compare it, and I shall debug the networking world.

And if you’re like me, there’s a good chance when things are really not going your way, you roll up your sleeves, break out Wireshark and the standards docs and begin the painstaking process of trying to work out what’s not right.

Today’s problem involved a side by side comparison between a pcap of a known good call, and one which is failing, so I just had to compare the two, which is slow and fairly error-prone,

So I started looking for something to diff PCAPs easily. The data I was working with was ASN.1 encoded so I couldn’t export as text like you can with HTTP or SIP based protocols and compare it that way.

In the end I stumbled across something even better to compare frames from packet captures side by side, with the decoding intact!

Turns out yo ucan copy the values including decoding from within Wireshark, which means you can then just paste the contents into a diff tool (I’m using the fabulous Meld on Linux, but any diff tool will do including diff itself) and off you go, side-by-side comparison.

Select the first packet/frame you’re interested in (or even just the section), expand the subkeys, right click, copy “All Visible items”. This copy contains all the decoded data, not just the raw bytes, which is what makes it so great.

Next paste it into your diff tool of choice, repeat with the one to compare against, scroll past the data you know is going to be different (session IDs, IPs, etc) and ta-da, there’s the differences.

Video of the whole process below:

MSISDN Encoding - Brought to you by the letter F

MSISDN Encoding in Diameter AVPs – Brought to you by the letter F

So this one knocked me for six the other day,

MSISDN AVP 700 / vendor ID 10415, used to advertise the subscriber’s MSISDN in signaling.

I formatted the data as an Octet String, with the MSISDN from the database and moved on my merry way.

Not so fast…

The MSISDN AVP is of type OctetString.

This AVP contains an MSISDN, in international number format as described in ITU-T Rec E.164 [8], encoded as a TBCD-string, i.e. digits from 0 through 9 are encoded 0000 to 1001;

1111 is used as a filler when there is an odd number of digits; bits 8 to 5 of octet n encode digit 2n; bits 4 to 1 of octet n encode digit 2(n-1)+1.

ETSI TS 129 329 / 6.3.2 MSISDN AVP

Come again?

In practice this means if you have an odd lengthed MSISDN value, we need to add some padding to round it out to an even-lengthed value.

This padding happens between the last and second last digit of the MSISDN (because if we added it at the start we’d break the Country Code, etc) and as MSISDNs are variable length subscriber numbers.

1111 in octet string is best known as the letter F,

Not that complicated, just kind of confusing.

And the call was coming from… INSIDE THE HOUSE. A look at finding UE Locations in LTE

Opening Tirade

Ok, admittedly I haven’t actually seen “When a Stranger Calls”, or the less popular sequel “When a stranger Redials” (Ok may have made the last one up).

But the premise (as I read Wikipedia) is that the babysitter gets the call on the landline, and the police trace the call as originating from the landline.

But you can’t phone yourself, that’s not how local loops work – When the murderer goes off hook it loops the circuit, which busys it. You could apply ring current to the line I guess externally but unless our murder has a Ring generator or has setup a PBX inside the house, the call probably isn’t coming from inside the house.

On Topic – The GMLC

The GMLC (Gateway Mobile Location Centre) is a central server that’s used to locate subscribers within the network on different RATs (GSM/UMTS/LTE/NR).

The GMLC typically has interfaces to each of the radio access technologies, there is a link between the GMLC and the CS network elements (used for GSM/UMTS) such as the HLR, MSC & SGSN via Lh & Lg interfaces, and a link to the PS network elements (LTE/NR) via Diameter based SLh and SLg interfaces with the MME and HSS.

The GMLC’s tentacles run out to each of these network elements so it can query them as to a subscriber’s location,

LTE Call Flow

To find a subscriber’s location in LTE Diameter based signaling is used, to query the MME which in turn queries, the eNodeB to find the location.

But which MME to query?

The SLh Diameter interface is used to query the HSS to find out which MME is serving a particular Subscriber (identified by IMSI or MSISDN).

The LCS-Routing-Info-Request is sent by the GMLC to the HSS with the subscriber identifier, and the LCS-Routing-Info-Response is returned by the HSS to the GMLC with the details of the MME serving the subscriber.

Now we’ve got the serving MME, we can use the SLg Diameter interface to query the MME to the location of that particular subscriber.

The MME can report locations to the GMLC periodically, or the GMLC can request the MME provide a location at that point.
For the GMLC to request a subscriber’s current location a Provide-Location-Request is set by the GMLC to the MME with the subscriber’s IMSI, and the MME responds after querying the eNodeB and optionally the UE, with the location info in the Provide-Location-Response.

(I’m in the process of adding support for these interfaces to PyHSS and all going well will release some software shortly to act at a GMLC so people can use this.)

Finding the actual Location

There are a few different ways the actual location of the UE is determined,

At the most basic level, Cell Global Identity (CGI) gives the identity of the eNodeB serving a user.
If you’ve got a 3 sector site each sector typically has its own Cell Global Identity, so you can determine to a certain extent, with the known radiation pattern, bearing and location of the sector, in which direction a subscriber is. This happens on the network side and doesn’t require any input from the UE.
But if we query the UE’s signal strength, this can then be combined with existing RF models and the signal strength reported by the UE to further pinpoint the user with a bit more accuracy. (Uplink and downlink cell coverage based positioning methods)
Barometric pressure and humidity can also be reported by the base station as these factors will impact resulting signal strengths.

Timing Advance (TA) and Time of Arrival (TOA) both rely on timing signals to/from a UE to determine it’s distance from the eNodeB. If the UE is only served by a single cell this gives you a distance from the cell and potentially an angle inside which the subscriber is. This becomes far more useful with 3 or more eNodeBs in working range of the UE, where you can “triangulate” the UE’s location. This part happens on the network side with no interaction with the UE.
If the UE supports it, EUTRAN can uses Enhanced Observed Time Difference (E-OTD) positioning method, which does TOD calcuation does this in conjunction with the UE.

GPS Assisted (A-GPS) positioning gives good accuracy but requires the devices to get it’s current location using the GPS, which isn’t part of the baseband typically, so isn’t commonly implimented.

Uplink Time Difference of Arrival (UTDOA) can also be used, which is done by the network.

So why do we need to get Subscriber Locations?

The first (and most noble) use case that springs to mind is finding the location of a subscriber making a call to emergency services. Often upon calling an emergency services number the GMLC is triggered to get the subscriber’s location in case the call is cut off, battery dies, etc.

But GMLCs can also be used for lots of other purposes, marketing purposes (track a user’s location and send targeted ads), surveillance (track movements of people) and network analytics (look at subscriber movement / behavior in a specific area for capacity planning).

Different countries have different laws regulating access to the subscriber location functions.

Hack to disable Location Reporting on Mobile Networks

If you’re wondering how you can disable this functionality, you can try the below hack to ensure that your phone does not report your location.

  1. Press the power button on your phone
  2. Turn it off

In reality, no magic super stealth SIM cards, special phones or fancy firmware will prevent the GMLC from finding your location.
So far none of the “privacy” products I’ve looked at have actually done anything special at the Baseband level. Most are just snakeoil.

For as long as your device is connected to the network, the passive ways of determining location, such as Uplink Time Difference of Arrival (UTDOA) and the CGI are going to report your location.

Dr StrangeEncoding or: How I learned to stop worrying and love ASN.1

Australia is a strange country; As a kid I was scared of dogs, and in response, our family got a dog.

This year started off with adventures working with ASN.1 encoded data, and after a week of banging my head against the table, I was scared of ASN.1 encoding.

But now I love dogs, and slowly, I’m learning to embrace ASN.1 encoding.

What is ASN.1?

ASN.1 is an encoding scheme.

The best analogy I can give is to image a sheet of paper with a form on it, the form has fields for all the different bits of data it needs,

Each of the fields on the form has a data type, and the box is sized to restrict input, and some fields are mandatory.

Now imagine you take this form and cut a hole where each of the text boxes would be.

We’ve made a key that can be laid on top of a blank sheet of paper, then we can fill the details through the key onto the blank paper and reuse the key over and over again to fill the data out many times.

When we remove the key off the top of our paper, and what we have left on the paper below is the data from the form. Without the key on top this data doesn’t make much sense, but we can always add the key back and presto it’s back to making sense.

While this may seem kind of pointless let’s look at the advantages of this method;

The data is validated by the key – People can’t put a name wherever, and country code anywhere, it’s got to be structured as per our requirements. And if we tried to enter a birthday through the key form onto the paper below, we couldn’t.

The data is as small as can be – Without all the metadata on the key above, such as the name of the field, the paper below contains only the pertinent information, and if a field is left blank it doesn’t take up any space at all on the paper.

It’s these two things, rigidly defined data structures (no room for errors or misinterpretation) and the minimal size on the wire (saves bandwidth), that led to 3GPP selecting ASN.1 encoding for many of it’s protocols, such as S1, NAS, SBc, X2, etc.

It’s also these two things that make ASN.1 kind of a jerk; If the data structure you’re feeding into your ASN.1 compiler does not match it will flat-out refuse to compile, and there’s no way to make sense of the data in its raw form.

I wrote a post covering the very basics of working with ASN.1 in Python here.

But working with a super simple ASN.1 definition you’ve created is one thing, using the 3GPP defined ASN.1 definitions is another,

With the aid of the fantastic PyCrate library, which is where the real magic happens, and this was the nut I cracked this week, compiling a 3GPP ASN.1 definition and communicating a standards-based protocol with it.

Watch this space for more fun with ASN.1!

SCTP Multihoming

One of the key advantages of SCTP over TCP is the support for Multihoming,

From an application perspective, this enables one “socket”, to be shared across multiple IP Addresses, allowing multiple IP paths and physical NICs.

Through multihoming we can protect against failures in IP Routing and physical links, from a transport layer protocol.

So let’s take a look at how this actually works,

For starters there’s a few ways multihoming can be implemented, ideally we have multiple IPs on both ends (“client” and “server”), but this isn’t always achievable, so SCTP supports partial multi-homing, where for example the client has only one IP but can contact the server on multiple IP Addresses, and visa-versa.

The below image (Courtesy of Wikimedia) shows the ideal scenario, where both the client and the server have multiple IPs they can be reached on.

Arkrishna, CC BY-SA 4.0 <https://creativecommons.org/licenses/by-sa/4.0>, via Wikimedia Commons

This would mean a failure of any one of the IP Addresses or the routing between them, would see the other secondary IP Addresses used for Transport, and the application not even necessarily aware of the interruption to the primary IP Path.

The Process

For starters, our SCTP Client/Server will each need to be aware of the IPs that can be used,

This is advertised in the INIT message, sent by the “client” to a “server” when the SCTP session is established.

SCTP INIT sent by the client at 10.0.1.185, but advertising two IPs

In the above screenshot we can see the two IPs for SCTP to use, the primary IP is the first one (10.0.1.185) and also the from IP, and there is just one additional IP (10.0.1.187) although there could be more.

In a production environment you’d want to ensure each of your IPs is in a different subnet, with different paths, hardware and routes.

So the INIT is then responded to by the client with an INIT_ACK, and this time the server advertises it’s IP addresses, the primary IP is the From IP address (10.0.1.252) and there is just one additional IP of 10.0.1.99,

SCTP INIT ACK showing Server’s Multi-homed IP Options

It’s worth noting that according to RFC 4960 Multi-homing is Optional and so is the IP Address Header, if it’s not advertised the sender is single-homed.

Next up we have the cookie exchange, which is used to protect against synchronization attacks, and then our SCTP session is up.

So what happens at this point? How do we know if a path is up and working?

Well the answer is heartbeat messages,

Sent from each of the IPs on the client to each of the IPs on the server, to make sure that there’s a path from every IP, to every other IP.

SCTP Heartbeats from each local IP to each remote IP

This means the SCTP stacks knows if a path fails, for example if the route to IP 10.0.1.252 on the server were to fail, the SCTP stack knows it has another option, 10.0.1.99, which it’s been monitoring.

So that’s multi-homed SCTP in action – While a lot of work has historically been done with LACP for aggregating multiple NICs together, and VRRP for ensuring a host is alive, SCTP handles this in a clean and efficient way.

I’ve attached a PCAP showing multi-homing on a Diameter S6a (HSS) interface between an MME and a HSS.

Getting the GTP-U Packets flowing Fast – DPDK & SR-IOV

So dedicated appliances are dead and all our network functions are VMs or Containers, but there’s a performance hit when going virtual as the L2 processing has to be handled by the Hypervisor before being passed onto the relevant VM / Container.

If we have a 10Gb NIC in our server, we want to achieve a 10Gbps “Line Speed” on the Network Element / VNF we’re running on.

When we talked about appliances if you purchased an P-GW with 10Gbps NIC, it was a given you could get 10Gbps through it (without DPI, etc), but when we talk about virtualized network functions / network elements there’s a very real chance you won’t achieve the “line speed” of your interfaces without some help.

When you’ve got a Network Element like a S-GW, P-GW or UPF, you want to forward packets as quickly as possible – bottlenecks here would impact the user’s achievable speeds on the network.

To speed things up there are two technologies, that if supported by your software stack and hardware, allows you to significantly increase throughput on network interfaces, DPDK & SR-IOV.

DPDK – Data Plane Development Kit

Usually *Nix OSs handle packet processing on the Kernel level. As I type this the packets being sent to this WordPress server by Firefox are being handled by the Linux 5.8.0-36-generic kernel running on my machine.

The problem is the kernel has other things to do (interrupts), meaning increased delay in processing (due to waiting for processing capability) and decreased capacity.

DPDK shunts this processing to the “user space” meaning your application (the actual magic of the VNF / Network Element) controls it.

To go back to me writing this – If Firefox and my laptop supported DPDK, then the packets wouldn’t traverse the Linux kernel at all, and Firefox would be talking directly to my NIC. (Obviously this isn’t the case…)

So DPDK increases network performance by shifting the processing of packets to the application, bypassing the kernel altogether. You are still limited by the CPU and Memory available, but with enough of each you should reach very near to line speed.

SR-IOV – Single Root Input Output Virtualization

Going back to the me writing this analogy I’m running Linux on my laptop, but let’s imagine I’m running a VM running Firefox under Linux to write this.

If that’s the case then we have an even more convolted packet processing chain!

I type the post into Firefox which sends the packets to the Linux kernel, which waits to be scheduled resources by the hypervisor, which then process the packets in the hypervisor kernel before finally making it onto the NIC.

We could add DPDK which skips some of these steps, but we’d still have the bottleneck of the hypervisor.

With PCIe passthrough we could pass the NIC directly to the VM running the Firefox browser window I’m typing this, but then we have a problem, no other VMs can access these resources.

SR-IOV provides an interface to passthrough PCIe to VMs by slicing the PCIe interface up and then passing it through.

My VM would be able to access the PCIe side of the NIC, but so would other VMs.

So that’s the short of it, SR-IOR and DPDK enable better packet forwarding speeds on VNFs.

Information stored on USIM / SIM Card for LTE / EUTRAN / EPC - K key, OP/OPc key and SQN Sequence Number

Confidentiality Algorithms in 3GPP Networks: MILENAGE, XOR & Comp128

We’ve covered a fair bit on authentication in 3GPP networks, SIM cards, HSS / AuC, etc, but never actually looked at the Confidentiality Algorithms in use,

LTE USIM Authentication - Mutual Authentication of the Network and Subscriber

While we’ve already covered the inputs required by the authentication elements of the core network (The HSS in LTE/4G, the AuC in UMTS/3G and the AUSF in 5G) to generate an output, it’s worth noting that the Confidentiality Algorithms used in the process determines the output.

This means the Authentication Vector (Also known as an F1 and F1*) generated for a subscriber using Milenage Confidentiality Algorithms will generate a different output to that of Confidentiality Algorithms XOR or Comp128.

To put it another way – given the same input of K key, OPc Key (or OP key), SQN & RAND (Random) a run with Milenage (F1 and F1* algorithm) would yield totally different result (AUTN & XRES) to the same inputs run with a simple XOR.

Technically, as operators control the network element that generates the challenges, and the USIM that responds to them, it is an option for an operator to implement their own Confidentiality Algorithms (Beyond just Milenage or XOR) so long as it produced the same number of outputs. But rolling your own cryptographic anything is almost always a terrible idea.

So what are the differences between the Confidentiality Algorithms and which one to use?
Spoiler alert, the answer is Milenage.

Milenage

Milenage is based on AES (Originally called Rijndael) and is (compared to a lot of other crypto implimentations) fairly easy to understand,

AES is very well studied and understood and unlike Comp128 variants, is open for anyone to study/analyse/break, although AES is not without shortcomings, it’s problems are at this stage, fairly well understood and mitigated.

There are a few clean open source examples of Milenage implementations, such as this C example from FreeBSD.

XOR

It took me a while to find the specifications for the XOR algorithm – it turns out XOR is available as an alternate to Milenage available on some SIM cards for testing only, and the mechanism for XOR Confidentiality Algorithm is only employed in testing scenarios, not designed for production.

Instead of using AES under the hood like Milenage, it’s just plan old XOR of the keys.

Osmocom have an implementation of this in their CN code, you can find here.

Defined under 3GPP TS 34.108 8.1.2.1

Comp128

Comp128 was originally a closed source algorithm, with the maths behind it not publicly available to scrutinise. It is used in GSM A3 and A5 functions, akin to the F1 and F1* in later releases.

Due to its secretive nature it wasn’t able to be studied or analysed prior to deployment, with the idea that if you never said how your crypto worked no one would be able to break it. Spoiler alert; public weaknesses became exposed as far back as 1998, which led to Toll Fraud, SIM cloning and eventually the development of two additional variants, with the original Comp128 renamed Comp128-1, and Comp128-2 (stronger algorithm than the original addressing a few of its flaws) and Comp128-3 (Same as Comp128-2 but with a 64 bit long key generated).

Into the Future & 5G later releases

As options beyond just USIM authentication become available for authentication in 5G SA networks, additional algorithms can be used beyond EAP and AKA, but at the time of writing only TLS has been added. 5G adds SUCI and SUPI which provide a mechanism to keep the private identifier (IMSI) away from prying eyes (or antenna), which I’ve detailed in this post.

Power cables feeding Ericsson RBS rack

Cell Broadcast in LTE

Recently I’ve been wrapping my head around Cell Broadcast in LTE, and thought I’d share my notes on 3GPP TS 38.413.

The interface between the MME and the Cell Broadcast Center (CBC) is the SBc interface, which as two types of “Elementary Procedures”:

  • Class 1 Procedures are of the request – response nature (Request followed by a Success or Failure response)
  • Class 2 Procedures do not get a response, and are informational one-way. (Acked by SCTP but not an additional SBc message).

SCTP is used as the transport layer, with the CBC establishing a point to point connection to the MME over SCTP (Unicast only) on port 29168 with SCTP Payload Protocol Identifier 24.

The SCTP associations between the MME and the CBC should normally remain up – meaning the SCTP association / transport connection is up all the time, and not just brought up when needed.

Elementary Procedures

Write-Replace Warning (Class 1 Procedure)

The purpose of Write-Replace Warning procedure is to start, overwrite the broadcasting of warning message, as defined in 3GPP TS 23.041 [14].

Write-Replace Warning procedure, initiated by WRITE-REPLACE WARNING REQUEST sent by the CBC to the MMEs contains the emergency message to be broadcast and the parameters such as TAC to broadcast to, severity level, etc.

A WRITE-REPLACE WARNING RESPONSE is sent back by the MME to the MME, if successful, along with information as to where it was sent out. CBC messages are unacknowledged by UEs, meaning it’s not possible to confirm if a UE has actually received the message.

The request includes the message identifier and serial number, list of TAIs, repetition period, number of broadcasts requested, warning type, and of course, the warning message contents.

Stop Warning Procedure (Class 1 Procedure)

Stop Warning Procedure, initiated by STOP WARNING REQUEST and answered with a STOP WARNING RESPONSE, requests the MME inform the eNodeBs to stop broadcasting the CBC in their SIBs.

Includes TAIs of cells this should apply to and the message identifier,

Error Indication (Class 2)

The ERROR INDICATION is used to indicate an error (duh). Contains a Cause and Criticality IEs and can be sent by the MME or CBC.

Write Replace Warning (Class 2)

The WRITE REPLACE WARNING INDICATION is used to indicate warning scenarios for some instead of a WRITE-REPLACE WARNING RESPONSE,

PWS Restart (Class 2)

The PWS RESTART INDICATION is used to list the eNodeBs / cells, that have become available or have restarted, since the CBC message and have no warning message data – for example eNodeBs that have just come back online during the period when all the other cells are sending Cell Broadcast messages.

Returns a the Restarted-Cell-List IE, containing the Global eNB ID IE and List of TAI, of the restarted / reconnected cells.

PWS Failure Indication (Class 2)

The PWS FAILURE INDICATION is essentially the reverse of PWS RESTART INDICATION, indicating which eNodeBs are no longer available. These cells may continue to send Cell Broadcast messages as the MME has essentially not been able to tell it to stop.

Contains a list of Failed cells (eNodeBs) with the Global-eNodeB-ID of each.

5GC: The Network Function Repository Function

The Problem

Mobile networks are designed to be redundant and resilient, with N+1 for everything.

Every network element connects to multiple other network elements.

The idea being the network is architected so a failure of any one network element will not impact service.

To take an LTE/EPC example, your eNodeBs connect to multiple MMEs, which in turn connect to multiple HSSs, multiple S-GWs, multiple EIRs, etc.
The problem is when each eNodeB connects to 3 MMEs, and you want to add a 4th MME, you have to go and reconfigure all the eNodeBs to point to the new MME, and all the HSSs to accept that MME as a new Diameter Peer, for example.

The more redundant you make the network, the harder it becomes to change.

This led to development of network elements like Diameter Routing Agents (DRAs) and DNS SRV for service discovery, but ultimately adding and removing network elements in previous generations of mobile core, involved changing a lot of config on a lot of different boxes.

The Solution

The NRF – Network Repository Function serves as a central repository for Network Functions (NFs) on the network.

In practice this means when you bring a new Network Function / Network Element online, you only need to point it at the NRF, which will tell it about other Network Functions on the network, register the new Network Function and let every other interested Network Function know about the new guy.

Take for example adding a new AMF to the network, after bringing it online the only bit of information the AMF really needs to start placing itself in the network, is the details of the NRF, so it can find everything it needs to know.

Our new AMF will register itself to the NRF, advertising what Network Functions it can offer (ie AMF service), and it’ll in turn be able to learn about what Network Functions it can consume – for example our AMF would need to know about the UDMs it can query data from.

It is one of the really cool design patterns usually seen in modern software, that 3GPP have adopted as part of the 5GC.

In Practice

Let’s go into a bit more detail and look at how it looks.

The NRF uses HTTP and JSON to communicate (anything not using ASN.1 is a winner), and looks familiar to anyone used to dealing with RESTful APIs.

Let’s take a look at how an AMF looks when registering to a NRF,

NF Register – Providing the NRF a profile for each NF

In order for the NRF to function it has to know about the presence of all the Network Functions on the network, and what they support. So when a new Network Function comes online, it’s got to introduce itself to the NRF.

It does this by providing a “Profile” containing information about the Network Functions it supports, IP Addresses, versions, etc.

Going back to our AMF example, the AMF sends a HTTP PUT request to our NRF, with a JSON payload describing the functions and capabilities of the AMF, so other Network Functions will be able to find it.

Let’s take a look at what’s in the JSON payload used for the NF Profile.

  • Each Network Function is identified by a UUID – nfInstanceId, in this example it’s value is “f2b2a934-1b06-41eb-8b8b-cb1a09f099af”
  • The nfType (Network Function type) is an AMF, and it’s IP Address is 10.0.1.7
  • The heartBeatTimer sets how often the network function (in this case AMF) sends messages to the NRF to indicate it’s still alive. This prevents a device registering to an NRF and then going offline, and the NRF not knowing.

The nfServices key contains an array of services and details of those services, in the below example the key feature is the serviceName which is namf-comm which means the Namf_Communication Service offered by the AMF.

The NRF files this info away for anyone who requests it (more on that later) and in response to this our NRF will indicate (hopefully) that it’s successfully created the entry in its internal database of Network Functions for our AMF, resulting in a HTTP 201 “Created” response back from the NRF to the AMF.

NRF StatusSubscribe – Subscribe & Notify

Simply telling the NRF about the presence of NFs is one thing, but it’s not much use if nothing is done with that data.

A Network Function can subscribe to the NRF to get updates when certain types of NFs enter/leave the network.

Subscribing is done by sending a HTTP POST with a JSON payload indicating which NFs we’re interested in.

Contents of a Subscription message to be notified of all AMFs joining the network

Whenever a Network Function registers on the NRF that related to the type that has been subscribed to, a HTTP POST is sent to each subscriber to let them know.

For example when a UDM registers to the network, our AMF gets a Notification with information about the UDM that’s just joined.

NRF Update – Updating NRF Profiles & Heartbeat

If our AMF wants to update its profile in the NRF – for example a new IP is added to our AMF, a HTTP PATCH request is sent with a JSON payload with the updated details, to the NRF.

The same mechanism is used as the Heartbeat / keepalive mechanism, to indicate the NRF is still there and working.

Summary

The NRF acts as a central repository used for discovery of neighboring network functions.

5GC for EPC Folks – Control Plane Signalling

As the standardisation for 5G-SA has been completed and the first roll outs are happening, I thought I’d cover the basic architecture of the 5G Core Network, for people with a background in EPC/SAE networks for 4G/LTE, covering the key differences, what’s the same and what’s new.

The AMFAuthentication & Mobility Function, serves much the same role as the MME in LTE/EPC/SAE networks.

Like the MME, the AMF only handles Control Plane traffic, and serves as the gatekeeper to the services on the network, connecting the RAN to the core, authenticating subscribers and starting data / PDN connections for the UEs.

While the MME connects to eNodeBs for RAN connectivity, the AMF connects to gNodeBs for RAN.

The Authentication Functions

In EPC the HSS had two functions; it was a database of all subscribers’ profile information and also the authentication centre for generating authentication vectors.

5GC splits this back into two network elements (Akin to the AuC and HLR in 2G/3G).

The UDM (Unified Data Management) provides the AMF with the subscriber profile information (allowed / barred services / networks, etc),

The AUSF (Authentication Server Function) provides the AMF with the authentication vectors for authenticating subscribers.

Like in UMTS/LTE USIMs are used to authenticate subscribers when connecting to the network, again using AKA (Authentication and Key Agreement) for mutual subscriber & network authentication.

Other authentication methods may be implemented, R16 defines 3 suporrted methods, 5G-AKA, EAP-AKA’, and EAP-TLS.

This opens the door for the 5GC to be used for non-mobile usage. There has been early talk of using the 5G architecture for fixed line connectivity as well as mobile, hence supporting a variety of authentication methods beyond classic AKA & USIMs. (For more info about Non-3GPP Access interworking look into the N3IWF)

The Mobility Functions

When a user connects to the network the AMF selects a SMF (Session Management Function) akin to a P-GW-C in EPC CUPS architecture and requests the SMF setup a connection for the UE.

This is similar to the S11 interface in EPC, however there is no S-GW used in 5GC, so would be more like if S11 were instead sent to the P-GW-C.

The SMF selects a UPF (Akin to the P-GW-C selecting a P-GW-U in EPC), which will handle this user’s traffic, as the UPF bridges external data networks (DNs) to the gNodeB serving the UE.

More info on how the UPF functions compared to it’s EPC counterparts can be found in this post.

Moving between cells / gNodeBs is handled in much the same way as done previously, with the path the UPF sends traffic to (N3 interface) updated to point to the IP of the new gNodeB.

Mobility between EPC & 5GC is covered in this post.

Connection Overview

When a UE attempts to connect to the network their signalling traffic (Using the N1 reference point between the UE and the AMF), is sent to the AMF.

an authentication challenge is issued as in previous generations.

Upon successful authentication the AMF signals the SMF to setup a session for the UE. The SMF selects a UPF to handle the user plane forwarding to the gNodeB serving the UE.

Key Differences

  • Functions handled by the MME in EPC now handled by AMF in 5GC
  • Functions of HSS now in two Network Functions – The UDM (Unified Data Management) and AUSF (Authentication Server Function)
  • Setting up data connections “flatter” (more info on the User Plane differences can be found here)
  • Non 3GPP access (Potentially used for fixed-line / non mobile networks)

See also: 5GC for EPC Folks – User Plane Traffic

5GC for EPC Folks – User Plane Traffic

As the standardisation for 5G-SA has been completed and the first roll outs are happening, I thought I’d cover the basic architecture of the 5G Core Network, for people with a background in EPC/SAE networks for 4G/LTE, covering the key differences, what’s the same and what’s new.

This posts focuses on the User Plane side of things, there’s a similar post I’ve written here on the Control Plane side of things.

UPF – User Plane Forwarding

The UPF bridges the external networks (DNs) to the gNodeB serving the UE by encapsulating the traffic into GTP, which has been used in every network since GSM.

Like the P-GW the UPF takes traffic destined to/from external networks and passes it to/from subscribers.

In 5GC these external networks are now referred to as “DN” – Data Networks, instead of by the SGi reference point.

In EPC the Serving-Gateway’s intermediate function of routing traffic to the correct eNB is removed and instead this is all handled by the UPF, along with buffering packets for a subscriber in idle mode.

The idea behind this, is that by removing the S-GW removes extra hops / latency in the network, and allows users to be connected to the best UPF for their needs, typically one located close to the user.

However, there are often scenarios where an intermediate function is required – for example wanting to anchor a session to keep an IP Address allocated to one UPF associated with a user, while they move around the network. In this scenario a UPF can act as an “Session Anchor” (Akin to a P-GW), and pass through “Intermediate UPFs” (Like S-GWs).

Unlike the EPCs architecture, there is no limit to how many I-UPFs can be chained together between the Session Anchoring UPF and the gNB, and this chaining of UPFs allows for some funky routing options.

The UPF is dumb by design. The primary purpose is just to encapsulate traffic destined from external networks to subscribers into GTP-U packets and forward them onto the gNodeB serving that subscriber, and the same in reverse. Do one thing and do it well.

SMF – Session Management Function

So with dumb UPFs we need something smarter to tell them what to do.

Control of the UPFs is handled by the SMF – Session Management Function, which signals using PFCP down to the UPFs to tell them what to do in terms of setting up connections.

While GTP-U is used for handling user traffic, control plane traffic no longer uses GTPv2-C. Instead 5GC uses PFCP – Packet Forwarding Control Protocol. To get everyone warmed up to Control & User Plane separation 3GPP introduced as seen in CUPS into the EPC architecture in Release 14.

This means the interface between the SMF and UPF (the N4 interface) is more or less the same as the interface between a P-GW-C and a P-GW-U seen in CUPS.

When a subscriber connects to the network and has been authenticated, the AMF (For more info on the AMF see the sister post to this topic covering Control Plane traffic) requests the SMF to setup a connection for the subscriber.

Interworking with EPC

For deployments with an EPC and 5GC interworking between the two is of course required.

This is achieved first through the implementation of CUPS (Control & User Plane Separation) on the EPC, specifically splitting the P-GW into a P-GW-C for handing the Control Plane signalling (GTPv2c) and a P-GW-U for the User Plane traffic encapsulated into GTP.

The P-GW-C and P-GW-U communications using PFCP are essentially the same as the N4 interface (between the SMF and the UPF) so the P-GW-U is able to act as a UPF.

This means handovers between the two RATs / Cores is seamless as when moving from an LTE RAT and EPC to a 5G RAT and 5G Core, the same UPF/P-GW-U is used, and only the Control Plane signalling around it changes.

When moving from LTE to 5G RAT, the P-GW-C is replaced by the SMF,
When moving from 5G RAT to LTE, the SMF is replaced by the P-GW-C.
In both scenarios user plane traffic takes the same exit point to external Data Networks (SGi interface in EPC / N6 interface in 5GC).

Interfaces / Reference Points

N3 Interface

N3 interface connects the gNodeB user plane to the UPF, to transport GTP-U packets.

This is a User Plane interface, and only transports user plane traffic.

This is akin to the S1-UP interface in EPC.

N4 Interface

N4 interface connects the Session Management Function (SMF) control plane to the UPF, to setup, modify and delete UPF sessions.

It is a control plane interface, and does not transport User Plane traffic.

This interface relies on PFCP – Packet Forwarding Control Protocol.

This is akin to the SxB interface in EPC with CUPS.

N6 Interface

N6 interface connects the UPF to External Data Networks (DNs), taking packets destined for Subscribers and encapsulating them into GTP-U packets.

This is a User Plane interface, and only transports user plane traffic.

This is akin to the SGi interface in EPC.

N9 Interface

When Session Anchoring is used, and Intermediate-UPFs are used, the connection between these UPFs uses the N9 interface.

This is only used in certain scenarios – the preference is generally to avoid unnecessary hops, so Intermediate-UPF usage is to be avoided where possible.

As this is a User Plane interface, it only transports user plane traffic.

When used this would be akin to the S5 interface in EPC.

N11 Interface

SMFs need to be told when to setup / tear down connections, this information comes from the AMF via the N11 interface.

As this is a Control Plane interface, it only transports control plane traffic.

This is similar to the S11 interface between the MME and the S-GW in EPC, however it would be more like the S11 if the S11 terminated on the P-GW.

Meta: 5G on this Blog

I’ve tried to steer well clear of the fever-pitched hype over 5G recently (I mean, this year I wrote 22 posts about building GSM networks with Osmocom to avoid talking about 5G), but the time has come to start sharing more about 5G.

But before we do…

My promise to you:
No talk of Augmented Reality, Holographic Video calls, 5G connected cities, sales buzzwords.

Just info for nerds about real (actually standardized) 5G.

As always I’ll reference specs where possible.

Let’s get building!

Link to all 5G tagged posts.