Category Archives: RFCs & Standards

Information overload on NBN FTTH

At long last, more and more Australians are going to have access to fibre based access to the NBN, and this seemed like as good an excuse than any to take a deep dive into how NBN’s GPON based fibre services are delivered to homes.

We’ve looked at NBN FTTN architecture, and NBN FTTC architecture and Skymuster Satellite architecture, so now let’s talk about how FTTH actually looks.

Let’s start in your local exchange where you’ll likely find a Nokia (Well, probably Alcatel-Lucent branded) 7210 SAS-R access aggregation switch, which is where NBN’s transmission network ends, and the access network begins.

It in turn spits out a 10 gig interface to feed the Optical Line Terminal (OLT), which provides the GPON services, each port on the OLT is split out and can feed 32 subscribers.

In NBN’s case, Nokia (Alcatel-Lucent) 7302, and rather than calling it an OLT, they call it a “FAN” or “Fibre Access Node” – Seemingly because they like the word node.

Each of the Nokia 7302s has at least one NGLT-A line card, which has 8 GPON ports. Each of the 8 ports on these cards can service 32 customers, and is fed by 2x 10Gbps uplinks to two 7210 SAS-R aggregation switches.

The chassis supports up to 16 cards, 8 ports each, 32 subs per port, giving us 4096 subscribers per FAN.

In some areas, FANs/OLTs aren’t located in an exchange but rather in a street cabinet, called a Temporary Fibre Access Node – Although it seems they’re very permanent.

To support Greenfields sites where a FAN site has not yet been established a cabinetised OLT solution is deployed, known as a Temporary FAN (TFAN).

In reality, each port on the OLT/FAN goes out Distribution Fibre Network or DFN which links the ports on the OLTs to a distribution cabinet in the street, known as as a Fibre Distribution Hub, or FDH.

If you look in FTTH areas, you’ll see the FDH cabinets.
The FDH is essentially a roadside optical distribution frame, used to cross connect cables from the Distribution Fibre Network (DFN) to the Local Fibre Network (LFN), and in a way, you can think of it as the GPON equivalent of a pillar, except this is where we have our optical splitters.

Remember when we were talking about the FAN/OLT how one port could serve 32 subscribers? We do that with a splitter, that takes one fibre from the DFN that runs to the FAN, and gives us 32 fibres we can could connect to an ONT onto to get service.

The FDH cabinets are made by Corning (OptiTect 576 fibre pad mounted cabinets) and you can see in the top right the Aqua cables go to the Distribution Fibre Network, and hanging below it on the right are the optical splitters themselves, which split the one fibre to the FAN into 32 fibres each on SC connectors.

These are then patched to the Local Fibre Network on the left hand side of the cabinet, where there’s up to 576 ports running across the suburb, and a “Parking” panel at the bottom where the unused ports from the splitter can be left until you patch the to the DFN ports above.

The FDH cabinets also offer “passthrough” allowing a fibre to from the FAN to be patched through to the DFN without passing through the GPON splitter, although I’m not clear if NBN uses this capability to deliver the NBN Business services.

But having each port in the FDH going to one home would be too simple; you’d have to bring 576 individually sheathed cables to the FDH and you’d lose too much flexibility in how the cable plant can be structured, so instead we’ve got a few more joints to go before we make it to your house.

From the FDH cabinet we go out into the Local Fibre Network, but NBN has two variants of LFN – LFN and Skinny LFN.
The traditional LFN uses high-density ribbon fibres, which offer a higher fibre count but is a bit tricker to splice/work with.
The Skinny LFN uses lower fibre count cables with stranded fibres, and is the current preferred option.

The original LFN cables are ribbon fibres and range from 72 to 288 fibre counts, but I believe 144 is the most common.

These LFN cables run down streets and close to homes, but not directly to lead in cables and customer houses.

These run to “Transition Closures” (Older NBN) or “Flexibility Joint Locations” (FJLs – Newer NBN)

While researching this I saw references to “Breakout Joint Locations” (BJLs) which are used in FTTC deployments, and are a Tenio B6 enclosure for 2x 12 Fibers and 4x 1 Fibers with a 1×4 splitter.

The FJLs are TE Systems’ (Now Commscope) Tenio range of fibre splice closures, and they’re use to splice the high fibre count cable from from the FDH cabinets into smaller 12 fibre count cables that run to multiple “Splitter Multi Ports” or “SMPs” in pits outside houses, and can contain splitters factory installed.

The splitters, referred to as “Multiports” or “SMPs” are Corning’s OptiSheath MultiPort Terminals, and they’re designed and laid out in such a way that the tech can activate a service, without needing to use a fusion splicer.

Due to the difficulty/cost in splicing fibre in pits for a service activation, NBNco opted to go from the FJL to the SMPs, where a field tech can just screw in a weatherproof fibre connector lead in to the customer’s premises.

During installation / activation callouts, the tech is assigned an SMP in the pit near the customer’s house, and a port on it. This in turn goes to the FJL and onto FDH cabinet as we just covered, but that patching/splicing for that is already done, so the tech doesn’t need to worry about that.

The tech just plugs in a pre-terminated lead in cable with a weatherproof fibre end, and screws it into the allocated port on the SMP, then hauls the other end of the lead in cable to the Premises Connection Device (Made by Madison or Tyco), located on the wall of the customer’s house.

The customer end of the lead in cable may be a pre terminated SC connector, or may get mechanically spliced onto a premade SC pigtail. In either case, they both terminate onto an SC male connector, which goes into an SC-SC female coupler inside the PCD.

Next is the customer’s internal wiring, again, preterm cable is used, to run between the PCD and the First Fibre Wall Outlet inside the house. This preterm cable join the lead in cable inside the PCD on the SC-SC female coupler, to join to the lead in.

Inside the house we have the “Network Termination Device” (NTD), which is a GPON ONT, is where the fibre from the street terminates and is turned into an Ethernet handoff to the customer. NBN has been through a few models of NTD, but the majority support 2x ATA ports for analog phones, and the option for an external battery backup unit to keep the device powered if mains power is lost.

Phew! That’s what I’ve been able to piece together from publicly available documentation, some of this may be out of date, and I can see there’s been several revisions to the LFN / DFN architectures over the years, if there’s anything I have incorrect here, please let me know!

Mobile IPv6 Tax?

Recently a Tweet from Dean Bubly got me thinking about how data is charged in cellular:

In the cellular world, subscribers are charged for data from the IP, transport and applications layers; this means you pay for the IP header, you pay for the TCP/UDP header, and you pay for the contents (the cat videos it contains).

This also means if an operator moves mobile subscribers from IPv4 to IPv6, there’s an extra 20 bytes the customer is charged for for every packet sent / received, which the customer is charged for – This is because the IPv6 header is longer than the IPv4 header.

Source: ServerFault - https://serverfault.com/questions/547768/ipv4-header-vs-ipv6-header-size

In most cases, mobile subs don’t get a choice as to if their connection is IPv4 or IPv6, but on a like for like basis, we can say that if a customer moves is on IPv6 every packet sent/received will have an extra 20 bytes of data consumed compared to IPv4.

This means subscribers use more data on IPv6, and this means they get charged for more data on IPv6.

For IoT applications, light users and PAYG users, this extra 20 bytes per packet could add up to something significant – But how much?

We can quantify this, but we’d need to know the number of packets sent on average, and the quantity of the data transferred, because the number of packets is the multiplier here.

So for starters I’ve left a phone on the desk, it’s registered to the network but just sitting in Idle mode – This is an engineering phone from an OEM, it’s just used for testing so doesn’t have anything loaded onto it in terms of apps, it’s not signed into any applications, or checking in the background, so I thought I’d try something more realistic.

So to get a clearer picture, I chucked a SIM in my regular everyday phone I use personally, registered it to the cellular lab I have here. For the next hour I sniffed the GTP traffic for the phone while it was sitting on my desk, not touching the phone, and here’s what I’ve got:

Overall the PCAP includes 6,417,732 bytes of data, but this includes the transport and GTP headers, meaning we can drop everything above it in our traffic calculations.

Everything except the data encapsulated in GTP can be dropped

For this I’ve got 14 bytes of ethernet, 20 bytes IP, 8 bytes UDP and 5 bytes for TZSP (this is to copy the traffic from the eNB to my local machine), then we’ve got the transport from the eNB to the SGW, 14 bytes of ethernet again, 20 bytes of IP , 8 bytes of UDP and 8 bytes of GTP then the payload itself. Phew.
All this means we can drop 97 bytes off every packet.

We have 16,889 packets, 6,417,732 bytes in total, minus 97 bytes from each gives us 1,638,233 of headers to drop (~1.6MB) giving us a total of 4.556 MB traffic to/from the phone itself.

This means my Android phone consumes 4.5 MB of cellular data in an hour while sitting on the desk, with 16,889 packets in/out.

Okay, now we’re getting somewhere!

So now we can answer the question, if each of these 16k packets was IPv6, rather than IPv4, we’d be adding another 20 bytes to each of them, 20 bytes x 16,889 packets gives 337,780 bytes (~0.3MB) to add to the total.

If this traffic was transferred via IPv6, rather than IPv4, we’d be looking at adding 20 bytes to each of the 16,889 packets, which would equate to 0.3MB extra, or about 7% overhead compared to IPv4.

But before you go on about what an outrage this IPv6 transport is, being charged for those extra bytes, that’s only one part of the picture.

There’s a reason operators are finally embracing IPv6, and it’s not to put an extra 7% of traffic on the network (I think if you asked most capacity planners, they’d say they want data savings, not growth).

IPv6 is, for lack of a better term, less rubbish than IPv4.

There’s a lot of drivers for IPv6, and some of these will reduce data consumption.
IPv6 is actually your stuff talking directly to the remote stuff, this means that we don’t need to rely on NAT, so no need to do NAT keepalives, and opening new sessions, which is going to save you data. If you’re running apps that need to keep a connection to somewhere alive, these data savings could negate your IPv6 overhead costs.

Will these potential data savings when using IPv6 outweigh the costs?

That’s going to depend on your use case.

If you’ve extremely bandwidth / data constrained, for example, you have an IoT device on an NTN / satellite connection, that was having to Push data every X hours via IPv4 because you couldn’t pull data from it as it had no public IP, then moving it to IPv6 so you can pull the data on the public IP, on demand, will save you data. That’s a win with IPv6.

If you’re a mobile user, watching YouTube, getting push notifications and using your phone like a normal human, probably not, but if you’re using data like a normal user, you’ve probably got a sizable data allowance that you don’t end up fully consuming, and the extra 20 bytes per packet will be nothing in comparison to the data used to watch a 2k video on your small phone screen.

DNS – TCP or UDP?

Ask someone with headphones and a lanyard in the halls of a datacenter what transport does DNS use, there’s a good chance the answer you’d get back is UDP Port 53.

But not always!

In scenarios where the DNS response is large (beyond 512 bytes) a DNS query will shift over to TCP for delivery.

How does the client know when to shift the request to TCP – After all, the DNS server knows how big the response is, but the client doesn’t.

The answer is the Truncated flag, in the response.

The DNS server sends back a response, but with the Truncated bit set, as per RFC 1035:

TC TrunCation – specifies that this message was truncated due to length greater than that permitted on the transmission channel.

RFC 1035

Here’s an example of the truncated bit being set in the DNS response.

The DNS client, upon receiving a response with the truncated bit set, should run the query again, this time using TCP for the transport.

One prime example of this is DNS NAPTR records used for DNS in roaming scenarios, where the response can quite often be quite large.

If it didn’t move these responses to TCP, you’d run the risk of MTU mismatches dropping DNS. In that half of my life has been spent debugging DNS issues, and the other half of my life debugging MTU issues, if I had MTU and DNS issues together, I’d be looking for a career change…

SMS Transport Wars?

There’s old joke about standards that the great thing about standards there’s so many to choose from.

SMS wasn’t there from the start of GSM, but within a year of the inception of 2G we had SMS, and we’ve had SMS, almost totally unchanged, ever since.

In a recent Twitter exchange, I was asked, what’s the best way to transport SMS?
As always the answer is “it depends” so let’s take a look together at where we’ve come from, where we are now, and how we should move forward.

How we got Here

Between 2G and 3G SMS didn’t change at all, but the introduction of 4G (LTE) caused a bit of a rethink regarding SMS transport.

Early builders of LTE (4G) networks launched their 4G offerings without 4G Voice support (VoLTE), with the idea that networks would “fall back” to using 2G/3G for voice calls.

This meant users got fast data, but to make or receive a call they relied on falling back to the circuit switched (2G/3G) network – Hence the name Circuit Switched Fallback.

Falling back to the 2G/3G network for a call was one thing, but some smart minds realised that if a phone had to fall back to a 2G/3G network every time a subscriber sent a text (not just calls) – And keep in mind this was ~2010 when SMS traffic was crazy high; then that would put a huge amount of strain on the 2G/3G layers as subs constantly flip-flopped between them.

To address this the SGs-AP interface was introduced, linking the 4G core (MME) with the 2G/3G core (MSC) to support this stage where you had 4G/LTE but only for data, SMS and calls still relied on the 2G/3G core (MSC).

The SGs-AP interface has two purposes;
One, It can tell a phone on 4G to fallback to 2G/3G when it’s got an incoming call, and two; it can send and receive SMS.

SMS traffic over this interface is sometimes described as SMS-over-NAS, as it’s transported over a signaling channel to the UE.

This also worked when roaming, as the MSC from the 2G/3G network was still used, so SMS delivery worked the same when roaming as if you were in the home 2G/3G network.

Enter VoLTE & IMS

Of course when VoLTE entered the scene, it also came with it’s own option for delivering SMS to users, using IP, rather than the NAS signaling. This removed the reliance on a link to a 2G/3G core (MSC) to make calls and send texts.

This was great because it allowed operators to build networks without any 2G/3G network elements and build a fully standalone LTE only network, like Jio, Rakuten, etc.

VoLTE didn’t change anything about the GSM 2G/3G SMS PDU, it just bundled it up in an SIP message body, this is often referred to as SMS-over-IP.

SMS-over-IP doesn’t address any of the limitations from 2G/3G, including limiting multipart messages to send payloads above 160 characters, and carries all the same limitations in order to be backward compatible, but it is over IP, and it doesn’t need 2G or 3G.

In roaming scenarios, S8 Home Routing for VoLTE enabled SMS to be handled when roaming the same way as voice calls, which made SMS roaming a doddle.

4G SMS: SMS over IP vs SMS over NAS

So if you’re operating a 4G network, should you deliver your SMS traffic using SMS-over-IP or SMS-over-NAS?

Generally, if you’ve been evolving your network over the years, you’ve got an MSC and a 2G/3G network, you still may do CSFB so you’ve probably ended up using SMS over NAS using the SGs-AP interface.
This method still relies on “the old ways” to work, which is fine until a discussion starts around sunsetting the 2G/3G networks, when you’d need to move calling to VoLTE, and SMS over NAS is a bit of a mess when it comes to roaming.

Greenfield operators generally opt for SMS over IP from the start, but this has its own limitations; SMS over IP is has awful efficiency which makes it unsuitable for use with NB-IoT applications which are bandwidth constrained, support for SMS over IP is generally limited to more expensive chipsets, so the bargain basement chips used for IoT often don’t support SMS over IP either, and integration of VoLTE comes with its own set of challenges regarding VoLTE enablement.

5G enters the scene (Nsmsf_SMService)

5G rolled onto the scene with the opportunity to remove the SMS over NAS option, and rely purely on SMS over IP (IMS); forcing the industry to standardise on an option alas this did not happen.

Instead 5GC introduces another delivery mechanism for SMS, just for 5GC without VoNR, the SMSf which can still send messages over the 5G NAS messaging.

This added another option for SMS delivery dependent on the access network used, and the Nsmsf_SMService interface does not support roaming.

Of course if you are using Voice over NR (VoNR) then like VoLTE, SMS is carried in a SIP message to the IMS, so this negates the need for the Nsmsf_SMService.

2G/3G Shutdown – Diameter to replace SGs-AP (SGd)

With the 2G/3G shutdown in the US operators who had up until this point been relying on SMS-over-NAS using the SGs-AP interface back to their MSCs were forced to make a decision on how to route SMS traffic, after the MSCs were shut down.

This landed with SMS-over-Diameter, where the 4G core (MME) communicates over Diameter with the SMSc.

The advantage of this approach is the Diameter protocol stack is the backbone of 4G roaming, and it’s not a stretch to get existing Diameter Routing Agents to start flicking SMS over Diameter messages between operators.

This has adoption by all the US operators, but we’re not seeing it so widely deployed in the rest of the world.

State of Play

OptionConditionsNotes
MAP2G/3G OnlyRelies on SS7 signaling and is very old
Supports roaming
SGs-AP (SMS-over-NAS)4G only relies on 2G/3GNeeds an MSC to be present in the network (generally because you have a 2G/3G network and have not deployed VoLTE)
Supports limited roaming
SMS over IP (IMS)4G / 5GNot supported on 2G/3G networks
Relies on a IMS enabled handset and network
Supports roaming in all S8 Home Routed scenarios
Device support limited, especially for IoT devices
Diameter SGd4G only / 5G NSAOnly works on 4G or 5G NSA
Better device support than 4G/5G
Supports roaming in some scenarios
Nsmsf_SMService5G standalone onlyOnly works on 5GC
Doesn’t support roaming
The convoluted world of SMS delivery options

A Way Forward:

While the SMS payload hasn’t changed in the past 31 years, how it is transported has opened up a lot of potential options for operators to use, with no clear winner, while SMS revenues and traffic volumes have continued to fall.

For better or worse, the industry needs to accept that SMS over NAS is an option to use when there is no IMS, and that in order to decommission 2G/3G networks, IMS needs to be embraced, and so SMS over IP (IMS) supported in all future networks, seems like the simple logical answer to move forward.

And with that clear path forward, we add in another wildcard…

Direct to device Satellite messes everything up…

Remember way back in this post when I said SMS over IP using IMS is a really really inefficient way of getting data? Well that hasn’t been a problem as we progressed up the generations of cellular tech as with each “G” we had more and more bandwidth than the last.

To throw a spanner in the works, let’s introduce NB-IoT and Non-Terrestrial Networks which rely on Non-IP-Data-Delivery.

These offer the ability to cover the globe with a low bandwidth / high latency service, that would ensure a subscriber is always just a message away, we’re seeing real world examples of these networks getting deployed for messaging applications already.

But, when you’ve only got a finite resource of bandwidth, and massive latencies to contend with, the all-IP architecture of IMS (VoLTE / VoNR) and it’s woeful inefficiency starts to really sting.

Of course there are potential workarounds here, Robust Header Correction (ROHC) can shrink this down, but it’s still going to rely on the 3 way handshake of TCP, TCP keepalive timers and IMS registrations, which in turn can starve the radio resources of the satellite link.

For NTN (Satelite) networks the case is being heavily made to rely on Non-IP-Data-Delivery, so the logical answer for these applications is to move the traffic back to SMS over NAS.

End Note

Even with SMS over 30 years old, we can still expect it to be a part of networks for years to come, even as WhatsApp / iMessage, etc, offer enhanced services. As to how it’s transported and the myriad of options here, I’m expecting that we’ll keep seeing a multi-transport mix long into the future.

For simple, cut-and-dried 4G/5G only network, IMS and SMS over IP makes the most sense, but for anything outside of that, you’ve got a toolbox of options for use to make a solution that best meets your needs.

Number Pads – Calculator or Phone?

If you’re typing on a full size keyboard there’s a good chance that to your right, there’s a number pad.

The number 5 is in the middle – That’s to be expected, but is 1 in the top left or bottom left?

Being derived from an adding machine keypad, the number pad on a keyboard has a 1 will be in the bottom left, however in the 1950s when telephone keypads were being introduced, only folks who worked in accounting had adding machines.

So when it came time to work out the best layout, the result we have today was a determined through a stack of research and testing by Human Factors Engineering Department of Bell Labs who studied the most efficient layout of keys, and tested focus groups to find the layout that provided the best level of speed and accuracy.

That landed with the 1 in the top left, and that’s what we still have today.

Oddly ATM and Card terminals opted to use the telephone layout, rather than the adding machine layout, while number pads use the adding machine layout.

A few exceptions to this exist, for example the Telecom ComputerPhone (Aka the Merlin Tonto in the UK, or the New Zealand Post Office Computerphone, or the ICL One Per Desk) which is the keyboard as envisioned by the telephone company.

Cisco ITP STP – Network Appearance

Short one,
The other day I needed to add a Network Appearance on an SS7/SS7 M3UA linkset.

Network Appearances on M3UA links are kinda like a port number, in that they allow you to distinguish traffic to the same point code, but handled by different logical entities.

When I added the NA parameter on the Linkset nothing happened.

If you’re facing the same you’ll need to set:

cs7 multi-instance

In the global config (this is the part I missed).

Then select the M3UA linkset you want to change and add the network-appearance parameter:

network-appearance 10

And bingo, you’ll start seeing it in your M3UA traffic:

BSF Addresses

The Binding Support Function is used in 4G and 5G networks to allow applications to authenticate against the network, it’s what we use to authenticate for XCAP and for an Entitlement Server.

Rather irritatingly, there are two BSF addresses in use:

If the ISIM is used for bootstrapping the FQDN to use is:

bsf.ims.mncXXX.mccYYY.pub.3gppnetwork.org

But if the USIM is used for bootstrapping the FQDN is

bsf.mncXXX.mccYYY.pub.3gppnetwork.org

You can override this by setting the 6FDA EF_GBANL (GBA NAF List) on the USIM or equivalent on the ISIM, however not all devices honour this from my testing.

Will 5GC be used in Wireline Access? No. Here’s why.

One of the hyped benefits of a 5G Core Networks is that 5GC can be used for wired networks (think DSL or GPON) – In marketing terms this is called “Wireless Wireline Convergence” (5G WWC) meaning DSL operators, cable operators and fibre network operators can all get in on this sweet 5GC action and use this sexy 5G Core Network tech.

This is something that’s in the standards, and that the big kit vendors are pushing heavily in their marketing materials. But will it take off? And should operators of wireline networks (fixed networks) be looking to embrace 5GC?

Comparing 5GC with current wireline network technologies isn’t comparing apples to apples, it’s apples to oranges, and they’re different fruits.

At its heart, the 3GPP Core Networks (including 5G Core) address one particular use cases of the cellular industry: Subscriber mobility – Allowing a customer to move around the network, being served by different kit (gNodeBs) while keeping the same IP Address.

The most important function of 5GC is subscriber mobility.

This is achieved through the use of encapsulating all the subscriber’s IP data into a GTP (A protocol that’s been around since 2G first added data).

Do I need a 5GC for my Fixed Network?

Wireline networks are fixed. Subscribers don’t constantly move around the network. A GPON customer doesn’t need to move their OLT every 30 minutes to a new location.

Encapsulating a fixed subscriber’s traffic in GTP adds significant processing overhead, for almost no gain – The needs of a wireline network operator, are vastly different to the needs of a cellular core.

Today, you can take a /24 IPv4 block, route it to a DSLAM, OLT or CMTS, and give an IP to 254 customers – No cellular core needed, just a router and your access device and you’re done, and this has been possible for decades.
Because there’s no mobility the GTP encapsulation that is the bedrock for cellular, is not needed.

Rather than routing directly to Access Network kit, most fixed operators deploy BRAS systems used for fixed access. Like the cellular packet core, BRAS has been around for a very long time, with a massive install base and a sea of engineering experience in house, it meets the needs of the wireline industry who define its functions and roles along with kit vendors of wireline kit; the fixed industry working groups defined the BRAS in the same way the 3GPP and cellular industry working groups defined 5G Core.

I don’t forsee that we’ll see large scale replacement of BRAS by 5GC, for the same reason a wireless operator won’t replace their mobile core with a BRAS and PPPoE – They’re designed to meet different needs.

All the other features that have been added to the 3GPP Core Network functionality, like limiting speed, guaranteed throughput bearers, 5QI / QCI values, etc, are addons – nice-to-haves. All of these capabilities could be implemented in wireline networks today – if the business case and customer demand was there.

But what about slicing?

With dropping ARPUs across the board, additional services relating to QoS (“Network Slicing”) are being held up as the saving grace of revenues for cellular operators and 5G as a whole, however this has yet to be realized and early indications suggest this is not going to be anywhere near as lucrative as previously hoped.

What about cost savings?

In terms of cost-per-bit of throughput, the existing install base wireline operators have of heavy-metal kit capable of terabit switching and routing has been around for some time in fixed world, and is what most 5G Cores will connect to as their upstream anyway, so there won’t be any significant savings on equipment, power consumption or footprint to be gained.

Fixed networks transport the majority of the world’s data today – Wireline access still accounts for the majority of traffic volumes, so wireline kit handles a higher magnitude of throughput than it’s Packet Core / 5GC cousins already.

Cutting down the number of parts in the network is good though right?

If you’re operating both a Packet Core for Cellular, and a fixed network today, then you might think if you moved from the traditional BRAS architecture fore the wired network to 5GC, you could drop all those pesky routers and switches clogging up your CO, Exchanges and Data Centers.

The problem is that you still need all of those after the 5GC to be able to get the traffic anywhere users want to go. So the 5GC will still need all of that kit, all your border routers and peering routers will remain unchanged, as well as domestic transmission, MPLS and transport.

The parts required for operating fixed networks is actually pretty darn small in comparison to that of 5GC.

TL;DR?

While cellular vendors would love to sell their 5GC platform into fixed operators, the premise that they are willing to replace existing BRAS architectures with 5GC, is as unlikely in my view as 5GC being replaced by BRAS.

IMS iFC – SPT Session Cases

Mostly just reference material for me:

Possible values:

  • 0 (ORIGINATING_SESSION)
  • 1 TERMINATING_REGISTERED
  • 2 (TERMINATING_UNREGISTERED)
  • 3 (ORIGINATING_UNREGISTERED

In the past I had my iFCs setup to look for the P-Access-Network-Info header to know if the call was coming from the IMS, but it wasn’t foolproof – Fixed line IMS subs didn’t have this header.

            <TriggerPoint>
                <ConditionTypeCNF>1</ConditionTypeCNF>
                <SPT>
                    <ConditionNegated>0</ConditionNegated>
                    <Group>0</Group>
                    <Method>INVITE</Method>
                    <Extension></Extension>
                </SPT>
                <SPT>
                    <ConditionNegated>0</ConditionNegated>
                    <Group>1</Group>
                    <SIPHeader>
                      <Header>P-Access-Network-Info</Header>
                    </SIPHeader>
                </SPT>                
            </TriggerPoint>

But now I’m using the Session Cases to know if the call is coming from a registered IMS user:

        <!-- SIP INVITE Traffic from Registered Sub-->
        <InitialFilterCriteria>
            <Priority>30</Priority>
            <TriggerPoint>
                <ConditionTypeCNF>1</ConditionTypeCNF>
                <SPT>
                    <ConditionNegated>0</ConditionNegated>
                    <Group>0</Group>
                    <Method>INVITE</Method>
                    <Extension></Extension>
                </SPT>
                <SPT>
                    <Group>0</Group>
                    <SessionCase>0</SessionCase>
                </SPT>             
            </TriggerPoint>

SQN Sync in IMS Auth

So the issue was a head scratcher.

Everything was working on the IMS, then I go to bed, the next morning I fire up the test device and it just won’t authenticate to the IMS – The S-CSCF generated a 401 in response to the REGISTER, but the next REGISTER wouldn’t pass.

Wireshark just shows me this loop:

UE -> IMS: REGISTER
IMS -> UE: 401 Unauthorized (With Challenge)
UE -> IMS: REGISTER with response
IMS -> UE: 401 Unauthorized (With Challenge)
UE -> IMS: REGISTER with response
IMS -> UE: 401 Unauthorized (With Challenge)
UE -> IMS: REGISTER with response
IMS -> UE: 401 Unauthorized (With Challenge)

So what’s going on here?

IMS uses AKAv1-MD5 for Authentication, this is slightly different to the standard AKA auth used in cellular, but if you’re curious, we’ve covered by IMS Authentication and standard AKA based SIM Authentication in cellular networks before.

When we generate the vectors (for IMS auth and standard auth) one of the inputs to generate the vectors is the Sequence Number or SQN.

This SQN ticks over like an odometer for the number of times the SIM / HSS authentication process has been performed.

There is some leeway in the SQN – It may not always match between the SIM and the HSS and that’s to be expected.
When the MME sends an Authentication-Information-Request it can ask for multiple vectors so it’s got some in reserve for the next time the subscriber attaches, and that’s allowed.

Information stored on USIM / SIM Card for LTE / EUTRAN / EPC - K key, OP/OPc key and SQN Sequence Number

But there are limits to how far out our SQN can be, and for good reason – One of the key purposes for the SQN is to protect against replay attacks, where the same vector is replayed to the UE. So the SQN on the HSS can be ahead of the SIM (within reason), but it can’t be behind – Odometers don’t go backwards.

So the issue was with the SQN on the SIM being out of Sync with the SQN in the IMS, how do we know this is the case, and how do we fix this?

Well there is a resync mechanism so the SIM can securely tell the HSS what the current SQN it is using, so the HSS can update it’s SQN.

When verifying the AUTN, the client may detect that the sequence numbers between the client and the server have fallen out of sync.
In this case, the client produces a synchronization parameter AUTS, using the shared secret K and the client sequence number SQN.
The AUTS parameter is delivered to the network in the authentication response, and the authentication can be tried again based on authentication vectors generated with the synchronized sequence number.

RFC 3110: HTTP Digest Authentication using AKA

In our example we can tell the sub is out of sync as in our Multimedia Authentication Request we see the SIP-Authorization AVP, which contains the AUTS (client synchronization parameter) which the SIM generated and the UE sent back to the S-CSCF. Our HSS can use the AUTS value to determine the correct SQN.

SIP-Authorization AVP in the Multimedia Authentication Request means the SQN is out of Sync and this AVP contains the RAND and AUTN required to Resync

Note: The SIP-Authorization AVP actually contains both the RAND and the AUTN concatenated together, so in the above example the first 32 bytes are the AUTN value, and the last 32 bytes are the RAND value.

So the HSS gets the AUTS and from it is able to calculate the correct SQN to use.

Then the HSS just generates a new Multimedia Authentication Answer with a new vector using the correct SQN, sends it back to the IMS and presto, the UE can respond to the challenge normally.

This feature is now fully implemented in PyHSS for anyone wanting to have a play with it and see how it all works.

And that friends, is how we do SQN resync in IMS!

Getting to know the PCRF for traffic Policy, Rules & Rating

Misunderstood, under appreciated and more capable than people give it credit for, is our PCRF.

But what does it do?

Most folks describe the PCRF in hand wavy-terms – “it does policy and charging” is the answer you’ll get, but that doesn’t really tell you anything.

So let’s answer it in a way that hopefully makes some practical sense, starting with the acronym “PCRF” itself, it stands for Policy and Charging Rules Function, which is kind of two functions, one for policy and one for rules, so let’s take a look at both.

Policy

In cellular world, as in law, policy is the rules.

For us some examples of policy could be a “fair use policy” to limit customer usage to acceptable levels, but it can also be promotional packages, services like “free Spotify” packages, “Voice call priority” or “unmetered access to Nick’s Blog and maximum priority” packages, can be offered to customers.

All of these are examples of policy, and to make them work we need to target which subscribers and traffic we want to apply the policy to, and then apply the policy.

Charging Rules

Charging Rules are where the policy actually gets applied and the magic happens.

It’s where we take our policy and turn it into actionable stuff for the cellular world.

Let’s take an example of “unmetered access to Nick’s Blog and maximum priority” as something we want to offer in all our cellular plans, to provide access that doesn’t come out of your regular usage, as well as provide QCI 5 (Highest non dedicated QoS) to this traffic.

To achieve this we need to do 3 things:

  • Profile the traffic going to this website (so we capture this traffic and not regular other internet traffic)
  • Charge it differently – So it’s not coming from the subscriber’s regular balance
  • Up the QoS (QCI) on this traffic to ensure it’s high priority compared to the other traffic on the network

So how do we do that?

Profiling Traffic

So the first step we need to take in providing free access to this website is to filter out traffic to this website, from the traffic not going to this website.

Let’s imagine that this website is hosted on a single machine with the IP 1.2.3.4, and it serves traffic on TCP port 443. This is where IPFilterRules (aka TFTs or “Traffic Flow Templates”) and the Flow-Description AVP come into play. We’ve covered this in the past here, but let’s recap:

IPFilterRules are defined in the Diameter Base Protocol (IETF RFC 6733), where we can learn the basics of encoding them,

They take the format:

action dir proto from src to dst

The action is fairly simple, for all our Dedicated Bearer needs, and the Flow-Description AVP, the action is going to be permit. We’re not blocking here.

The direction (dir) in our case is either in or out, from the perspective of the UE.

Next up is the protocol number (proto), as defined by IANA, but chances are you’ll be using 17 (UDP) or 6 (TCP).

The from value is followed by an IP address with an optional subnet mask in CIDR format, for example from 10.45.0.0/16 would match everything in the 10.45.0.0/16 network.

Following from you can also specify the port you want the rule to apply to, or, a range of ports.

Like the from, the to is encoded in the same way, with either a single IP, or a subnet, and optional ports specified.

And that’s it!

So let’s create a rule that matches all traffic to our website hosted on 1.2.3.4 TCP port 443,

permit out 6 from 1.2.3.4 443 to any 1-65535
permit out 6 from any 1-65535 to 1.2.3.4 443

All this info gets put into the Flow-Information AVPs:

With the above, any traffic going to/from 1.23.4 on port 443, will match this rule (unless there’s another rule with a higher precedence value).

Charging Actions

So with our traffic profiled, the next question is what actions are we going to take, well there’s two, we’re going to provide unmetered access to the profiled traffic, and we’re going to use QCI 4 for the traffic (because you’ll need a guaranteed bit rate bearer to access!).

Charging-Group for Profiled Traffic

To allow for Zero Rating for traffic matching this rule, we’ll need to use a different Rating Group.

Let’s imagine our default rating group for data is 10000, then any normal traffic going to the OCS will use rating group 10000, and the OCS will apply the specific rates and policies based on that.

Rating Groups are defined in the OCS, and dictate what rates get applied to what Rating Groups.

For us, our default rating group will be charged at the normal rates, but we can define a rating group value of 4000, and set the OCS to provide unlimited traffic to any Credit-Control-Requests that come in with Rating Group 4000.

This is how operators provide services like “Unlimited Facebook” for example, a Charging Rule matches the traffic to Facebook based on TFTs, and then the Rating Group is set differently to the default rating group, and the OCS just allows all traffic on that rating group, regardless of how much is consumed.

Inside our Charging-Rule-Definition, we populate the Rating-Group AVP to define what Rating Group we’re going to use.

Setting QoS for Profiled Traffic

The QoS Description AVP defines which QoS parameters (QCI / ARP / Guaranteed & Maximum Bandwidth) should be applied to the traffic that matches the rules we just defined.

As mentioned at the start, we’ll use QCI 4 for this traffic, and allocate MBR/GBR values for this traffic.

Putting it Together – The Charging Rule

So with our TFTs defined to match the traffic, our Rating Group to charge the traffic and our QoS to apply to the traffic, we’re ready to put the whole thing together.

So here it is, our “Free NVN” rule:

I’ve attached a PCAP of the flow to this post.

In our next post we’ll talk about how the PGW handles the installation of this rule.

SMS-over-IP Message Efficiency – K

Recently I read a post from someone talking about efficiency of USSD over IMS, and how crazy it was that such a small amount of data used so much overhead to get transferred across the network.

Having built an SMSc a while ago, my mind immediately jumped to SMS over IMS as being a great example of having so much overhead.

If we’re to consider sending the response “K” to a text message, how much overhead is there?

SMS PDU containing the message “K”

I’m using a common Qualcomm based smartphone, and here’s the numbers I’ve got from Wireshark when I send the message:

Transport Ethernet Header – 14 Bytes
Transport IP Header – 20 Bytes
Transport UDP Header – 8 Bytes
Transport GTP Header – 12 Bytes
User IP Header – 20 Bytes
IPsec ESP Header (For Um interface protection) – 22 Bytes
Encapsulated UDP Header – 8 Bytes
SIP Headers – 707 Bytes
SMS Header – 16 Bytes
SMS Message Body “K” – 1 Byte

Overall SIP, ESP, GTP and Transport PCAP for SIP MESSAGE

That seems pretty bad in terms of efficiency, but let’s look at how that actually works out:

This means our actual message body makes up just 1 byte of 828 bytes, or 0.12% of the size of the overall payload.

Even combined with the SMS header (which contains all the addressing information needed to route an SMS) it’s still just on 2% of the overall message.

So USSD efficiency isn’t great, but it’s not alone!

Kamailio Diameter Routing Agent Support

Recently I’ve been working on open source Diameter Routing Agent implementations (See my posts on FreeDiameter).

With the hurdles to getting a DRA working with open source software covered, the next step was to get all my Diameter traffic routed via the DRAs, however I soon rediscovered a Kamailio limitation regarding support for Diameter Routing Agents.

You see, when Kamailio’s C Diameter Peer module makes a decision as to where to route a request, it looks for the active Diameter peers, and finds a peer with the suitable Vendor and Application IDs in the supported Applications for the Application needed.

Unfortunately, a DRA typically only advertises support for one application – Relay.

This means if you have everything connected via a DRA, Kamailio’s CDP module doesn’t see the Application / Vendor ID for the Diameter application on the DRA, and doesn’t route the traffic to the DRA.

The fix for this was twofold, the first step was to add some logic into Kamailio to determine if the Relay application was advertised in the Capabilities Exchange Request / Answer of the Diameter Peer.

I added the logic to do this and exposed this so you can see if the peer supports Diameter relay when you run “cdp.list_peers”.

With that out of the way, next step was to update the routing logic to not just reject the candidate peer if the Application / Vendor ID for the required application was missing, but to evaluate if the peer supports Diameter Relay, and if it does, keep it in the game.

I added this functionality, and now I’m able to use CDP Peers in Kamailio to allow my P-CSCF, S-CSCF and I-CSCF to route their traffic via a Diameter Routing Agent.

I’ve got a branch with the changes here and will submit a PR to get it hopefully merged into mainline soon.

NB-IoT NIDD Basics

NB-IoT introduces support for NIDD – Non-IP Data Delivery (NIDD) which is one of the cool features of NB-IoT that’s gaining more widespread adoption.

Let’s take a deep dive into NIDD.

The case against IP for IoT

In the over 40 years since IP was standardized, we’ve shoehorned many things onto IP, but IP was never designed or optimized for low power, low throughput applications.

For the battery life of an IoT device to be measured in years, it has to be very selective about what power hungry operations it does. Transmitting data over the air is one of the most power-intensive operations an IoT device can perform, so we need to do everything we can to limit how much data is sent, and how frequently.

Use Case – NB-IoT Tap

Let’s imagine we’re launching an IoT tap that transmits information about water used, as part of our revolutionary new “Water as a Service” model (WaaS) which removes the capex for residents building their own water treatment plant in their homes, and instead allows dynamic scaling of waterloads as they move to our new opex model.

If I turn on the tap and use 12L of water, when I turn off the tap, our IoT tap encodes the usage onto a single byte and sends the usage information to our rain-cloud service provider.

So we’re not constantly changing the batteries in our taps, we need to send this one byte of data as efficiently as possible, so as to maximize the battery life.

If we were to transport our data on TCP, well we’d need a 3 way handshake and several messages just to transmit the data we want to send.

Let’s see how our one byte of data would look if we transported it on TCP.

That sliver of blue in the diagram is our usage component, the rest is overhead used to get it there. Seems wasteful huh?

Sure, TCP isn’t great for this you say, you should use UDP! But even if we moved away from TCP to UDP, we’ve still got the IPv4 header and the UDP header wasting 28 bytes.

For efficiency’s sake (To keep our batteries lasting as long as possible) we want to send as few messages as possible, and where we do have to send messages, keep them very short, so IP is not a great fit here.

Enter NIDD – Non-IP Data Delivery.

Through NIDD we can just send the single hex byte, only be charged for the single hex byte, and only stay transmitting long enough to send this single byte of hex (Plus the NBIoT overheads / headers).

Compared to UDP transport, NIDD provides us a reduction of 28 bytes of overhead for each message, or a 96% reduction in message size, which translates to real power savings for our IoT device.

In summary – the more sending your device has to do, the more battery it consumes.
So in a scenario where you’re trying to maximize power efficiency to keep your batter powered device running as long as possible, needing to transmit 28 bytes of wasted data to transport 1 byte of usable data, is a real waste.

Delivering the Payload

NIDD traffic is transported as raw hex data end to end, this means for our 1 byte of water usage data, the device would just send the hex value to be transferred and it’d pop out the other end.

To support this we introduce a new network element called the SCEFService Capability Exposure Function.

From a developer’s perspective, the SCEF is the gateway to our IoT devices. Through the RESTful API on the SCEF (T8 API), we can send and receive raw hex data to any of our IoT devices.

When one of our Water-as-a-Service Taps sends usage data as a hex byte, it’s the software talking on the T8 API to the SCEF that receives this data.

Data of course needs to be addressed, so we know where it’s coming from / going to, and T8 handles this, as well as message reliability, etc, etc.

This is a telco blog, so we should probably cover the MME connection, the MME talks via Diameter to the SCEF. In our next post we’ll go into these signaling flows in more detail.

If you’re wondering what the status of Open Source SCEF implementations are, then you may have already guessed I’m working on one!

Hopefully by now you’ve got a bit of an idea of how NIDD works in NB-IoT, and in our next posts we’ll dig deeper into the flows and look at some PCAPs together.

Diameter Routing Agents – Part 5 – AVP Transformations

Having a central pair of Diameter routing agents allows us to drastically simplify our network, but what if we want to perform some translations on AVPs?

For starters, what is an AVP transformation? Well it’s simply rewriting the value of an AVP as the Diameter Request/Response passes through the DRA. A request may come into the DRA with IMSI xxxxxx and leave with IMSI yyyyyy if a translation is applied.

So why would we want to do this?

Well, what if we purchased another operator who used Realm X, and we use Realm Y, and we want to link the two networks, then we’d need to rewrite Realm Y to Realm X, and Realm X to Realm Y when they communicate, AVP transformations allow for this.

If we’re an MVNO with hosted IMSIs from an MNO, but want to keep just the one IMSI in our HSS/OCS, we can translate from the MNO hosted IMSI to our internal IMSI, using AVP transformations.

If our OCS supports only one rating group, and we want to rewrite all rating groups to that one value, AVP transformations cover this too.

There are lots of uses for this, and if you’ve worked with a bit of signaling before you’ll know that quite often these sorts of use-cases come up.

So how do we do this with freeDiameter?

To handle this I developed a module for passing each AVP to a Python function, which can then apply any transformation to a text based value, using every tool available to you in Python.

In the next post I’ll introduce rt_pyform and how we can use it with Python to translate Diameter AVPs.

Diameter Routing Agents (Why you need them, and how to build them) – Part 2 – Routing

What I typically refer to as Diameter interfaces / reference points, such as S6a, Sh, Sx, Sy, Gx, Gy, Zh, etc, etc, are also known as Applications.

Diameter Application Support

If you look inside the Capabilities Exchange Request / Answer dialog, what you’ll see is each side advertising the Applications (interfaces) that they support, each one being identified by an Application ID.

CER showing support for the 3GPP Zh Application-ID (Interface)

If two peers share a common Application-Id, then they can communicate using that Application / Interface.

For example, the above screenshot shows a peer with support for the Zh Interface (Spoiler alert, XCAP Gateway / BSF coming soon!). If two Diameter peers both have support for the Zh interface, then they can use that to send requests / responses to each other.

This is the basis of Diameter Routing.

Diameter Routing Tables

Like any router, our DRA needs to have logic to select which peer to route each message to.

For each Diameter connection to our DRA, it will build up a Diameter Routing table, with information on each peer, including the realm and applications it advertises support for.

Then, based on the logic defined in the DRA to select which Diameter peer to route each request to.

In its simplest form, Diameter routing is based on a few things:

  1. Look at the DestinationRealm, and see if we have any peers at that realm
  2. If we do then look at the DestinationHost, if that’s set, and the host is connected, and if it supports the specified Application-Id, then route it to that host
  3. If no DestinationHost is specified, look at the peers we have available and find the one that supports the specified Application-Id, then route it to that host
Simplified Diameter Routing Table used by DRAs

With this in mind, we can go back to looking at how our DRA may route a request from a connected MME towards an HSS.

Let’s look at some examples of this at play.

The request from MME02 is for DestinationRealm mnc001.mcc001.3gppnetwork.org, which our DRA knows it has 4 connected peers in (3 if we exclude the source of the request, as we don’t want to route it back to itself of course).

So we have 3 contenders still for who could get the request, but wait! We have a DestinationHost specified, so the DRA confirms the host is available, and that it supports the requested ApplicationId and routes it to HSS02.

So just because we are going through a DRA does not mean we can’t specific which destination host we need, just like we would if we had a direct link between each Diameter peer.

Conversely, if we sent another S6a request from MME01 but with no DestinationHost set, let’s see how that would look.

Again, the request is from MME02 is for DestinationRealm mnc001.mcc001.3gppnetwork.org, which our DRA knows it has 3 other peers it could route this to. But only two of those peers support the S6a Application, so the request would be split between the two peers evenly.

Clever Routing with DRAs

So with our DRA in place we can simplify the network, we don’t need to build peer links between every Diameter device to every other, but let’s look at some other ways DRAs can help us.

Load Control

We may want to always send requests to HSS01 and only use HSS02 if HSS01 is not available, we can do this with a DRA.

Or we may want to split load 75% on one HSS and 25% on the other.

Both are great use cases for a DRA.

Routing based on Username

We may want to route requests in the DRA based on other factors, such as the IMSI.

Our IMSIs may start with 001010001xxx, but if we introduced an MVNO with IMSIs starting with 001010002xxx, we’d need to know to route all traffic where the IMSI belongs to the home network to the home network HSS, and all the MVNO IMSI traffic to the MVNO’s HSS, and DRAs handle this.

Inter-Realm Routing

One of the main use cases you’ll see for DRAs is in Roaming scenarios.

For example, if we have a roaming agreement with a subscriber who’s IMSIs start with 90170, we can route all the traffic for their subs towards their HSS.

But wait, their Realm will be mnc901.mcc070.3gppnetwork.org, so in that scenario we’ll need to add a rule to route the request to a different realm.

DRAs handle this also.

In our next post we’ll start actually setting up a DRA with a default route table, and then look at some more advanced options for Diameter routing like we’ve just discussed.

One slight caveat, is that mutual support does not always mean what you may expect.
For example an MME and an HSS both support S6a, which is identified by Auth-Application-Id 16777251 (Vendor ID 10415), but one is a client and one is a server.
Keep this in mind!

Diameter Routing Agents (Why you need them, and how to build them) – Part 1

Answer Question 1: Because they make things simpler and more flexible for your Diameter traffic.
Answer Question 2: With free software of course!

All about DRAs

But let’s dive a little deeper. Let’s look at the connection between an MME and an HSS (the S6a interface).

Direct Diameter link between two Diameter Peers

We configure the Diameter peers on MME1 and HSS01 so they know about each other and how to communicate, the link comes up and presto, away we go.

But we’re building networks here! N+1 redundancy and all that, so now we have two HSSes and two MMEs.

Direct Diameter link between 4 Diameter peers

Okay, bit messy, but that’s okay…

But then our network grows to 10 MMEs, and 3 HSSes and you can probably see where this is going, but let’s drive the point home.

Direct Diameter connections for a network with 10x MME and 3x HSS

Now imagine once you’ve set all this up you need to do some maintenance work on HSS03, so need to shut down the Diameter peer on 10 different MMEs in order to isolate it and deisolate it.

The problem here is pretty evident, all those links are messy, cumbersome and they just don’t scale.

If you’re someone with a bit of networking experience (and let’s face it, you’re here after all), then you’re probably thinking “What if we just had a central system to route all the Diameter messages?”

An Agent that could Route Diameter, a Diameter Routing Agent perhaps…

By introducing a DRA we build Diameter peer links between each of our Diameter devices (MME / HSS, etc) and the DRA, rather than directly between each peer.

Then from the DRA we can route Diameter requests and responses between them.

Let’s go back to our 10x MME and 3x HSS network and see how it looks with a DRA instead.

So much cleaner!

Not only does this look better, but it makes our life operating the network a whole lot easier.

Each MME sends their S6a traffic to the DRA, which finds a healthy HSS from the 3 and sends the requests to it, and relays the responses as well.

We can do clever load balancing now as well.

Plus if a peer goes down, the DRA detects the failure and just routes to one of the others.

If we were to introduce a new HSS, we wouldn’t need to configure anything on the MMEs, just add HSS04 to the DRA and it’ll start getting traffic.

Plus from an operations standpoint, now if we want to to take an HSS offline for maintenance, we just shut down the link on the HSS and all HSS traffic will get routed to the other two HSS instances.

In our next post we’ll talk about the Routing part of the DRA, how the decisions are made and all the nuances, and then in the following post we’ll actually build a DRA and start routing some traffic around!

MMS Deep Dive – MM1 – Mobile Terminated MMS

In our last post we talked about sending an Multimedia Message, and in this post, we’re going to cover the process of receiving a Multimedia Message.

Carl Sagan once famously said “If you wish to make an apple pie from scratch, you must first invent the universe”, we don’t need to go that far back, but if you want to deliver an MMS to a subscriber, first you must deliver an SMS.

Wait, but we’re talking about MMS right? So why are we talking SMS?

Modern MMS transport relies on HTTP, which is client-server based, the phone / UE is the client, and the MMSc is the Server.

The problem with this client-server relationship, is the client requests things from the server, but the server can’t request things from the client.

This presents a problem when it comes to delivering the MMS – The phone / UE will need to request the MMSc provide it the message to be received, but needs to know there is a message to request in the first place.

So this is where SMS comes in. When the MMSc has a message destined for a Subscriber, it sends the phone/UE an SMS, informing that there is an MMS waiting, and providing the URL the MMS can be retrieved from.

This is typically done by MAP or SMPP, to link the MMSc to the SMSc to allow it to send these messages.

This SMS contains the URL to retrieve the MMS at, once the UE receives this SMS, it knows where to retrieve the MMS.

It can then send an HTTP GET to the URL to retrieve the MMS, and lastly sends an HTTP POST to confirm to the MMSc it retrieved it all OK.

MMS Mobile Terminated message flow

So that’s the basics, let’s look at each part of the dialog in some more detail, starting with this magic SMS to tell the UE where to retrieve the MMS from.

WAP PUSH from MMSc sent via SMS

So some things to notice, the user data, which would usually carry the body of our SMS instead contains another protocol, “Wireless Session Protocol” (WSP), and this is the method “Push”.

That in turn is followed by MMS Message Encapsulation, again inside the SMS message body, this time with the MMS specific data.

The From: header contains the sender of the MMS, this is how you can see who the MMS is from, while it’s still downloading.

The expiry indicates to the handset, it it doesn’t download the MMS within the specified time period, it shouldn’t bother, as the message will have expired.

And lastly, and perhaps most importantly, we have the X-MMS-Content-Location header, which tells our subscriber where to download the MMS from.

After this, the UE sends an HTTP GET to the URL in the X-MMS-Content-Location header (typically on the “mms” APN), to retrieve the MMS from the MMSc.

HTTP GET from the UE to the MMSc

The HTTP GET is pretty normal, there’s the usual MMS headers we talked about in the last post, and we just GET the path provided by the MMSc in the WAP PUSH.

The response from the MMSc contains the actual MMS itself, which is almost a mirror of the sending process (the Data component is unchanged from when the sender sent it).

Response to HTTP GET for message retrieval

At this stage our subscriber has retrieve the MMS, but may not have retrieved it fully, or may have had an issue retrieving it.

Instead the UE sends an HTTP POST with the MMS-Message-Type m-notifyresp-ind with the transaction ID, to indicate that it has successfully retrieved the MMS, and at this point the MMS can notify the sender if delivery receipts are enabled, and delete the message from the cache.

And finally the MMSc sends back a 200 OK with no body to confirm it got that too.

Some notes on MMS Security

Reading about unauthenticated GET requests, you may be left wondering what security does MMS have, and what stops you from just going through and sending HTTP GET requests to all the possible URL paths to vacuum up all the MMS?

In the standard, nothing!

Typically the MMSc has some layer of security added by the implementer, to ensure the user retrieving the MMS, is the user the MMS is destined for.
Because MMS has no security in the standard, this is typically achieved through Header Enrichment, whereby the P-GW adds a HTTP header with the MSISDN or IMSI of the subscriber, and then the MMSc can evaluate if this subscriber should be able to retrieve that URL.

Another attack vector I played with was sending a SMS based MMS-Notify with a different URL, which if retrieved, would leak the subscriber’s IP, as it would cause the UE to try and get data from that URL.

Kamailio I-CSCF – SRV Lookup Behaviour

Recently I had a strange issue I thought I’d share.

Using Kamailio as an Interrogating-CSCF, Kamailio was getting the S-CSCF details from the User-Authorization-Answer’s “Server-Name” (602) AVP.

The value was set to:

sip:scscf.mnc001.mcc001.3gppnetwork.org:5060

But the I-CSCF was only looking up A-Records for scscf.mnc001.mcc001.3gppnetwork.org, not using DNS-SRV.

The problem? The Server-Name I had configured as a full SIP URI in PyHSS including the port, meant that Kamailio only looks up the A-Record, and did not do a DNS-SRV lookup for the domain.

Dropping the port number saw all those delicious SRV records being queried.

Something to keep in mind if you use S-CSCF pooling with a Kamailio based I-CSCF, if you want to use SRV records for load balancing / traffic sharing, don’t include the port, and if instead you want it to go to the specified host found by an A-record, include the port.