Category Archives: Software

Ubuntu Cloned VMs getting Duplicate IPs (and yes – the MAC Addresses are unique)

So I run a lot of VMs. It’s not unusual when I’m automating something with Ansible or setting up a complex lab to be running 20+ VMs at a time, and often I’ll create a base VM and clone it a dozen times.

Alas, Ubuntu 20.04 has some very irritating default behaviour, where even if the MAC addresses of these cloned VMs differ they get the same IP Address from DHCP.

That’s because by default Netplan doesn’t use the MAC address as the identifier when requesting a DHCP lease. And if you’ve cloned a VM the identifier it does use doesn’t change even if you do change the MAC address…

Irritating, but easily fixed!

Editing the netplan config:

network:
  ethernets:
    eth0:
      dhcp4: true
      dhcp-identifier: mac
  version: 2

Run a netplan-apply and you’re done.

Now you can clone that VM as many times as you like and each will get it’s own unique IP address.

Demystifying SS7 & Sigtran – Part 4 – Routing with Point Codes

This is part of a series of posts looking into SS7 and Sigtran networks. We cover some basic theory and then get into the weeds with GNS3 based labs where we will build real SS7/Sigtran based networks and use them to carry traffic.

Having a direct Linkset from every Point Code to every other Point Code in an SS7 network isn’t practical, we need to rely on routing, so in this post we’ll cover routing between Point Codes on our STPs.

Let’s start in the IP world, imagine a router with a routing table that looks something like this:

Simple IP Routing Table
192.168.0.0/24 out 192.168.0.1 (Directly Attached)
172.16.8.0/22 via 192.168.0.3 - Static Route - (Priority 100)
172.16.0.0/16 via 192.168.0.2 - Static Route - (Priority 50)
10.98.22.1/32 via 192.168.0.3 - Static Route - (Priority 50)

We have an implicit route for the network we’re directly attached to (192.168.0.0/24), and then a series of static routes we configure.
We’ve also got two routes to the 172.16.8.0/22 subnet, one is more specific with a higher priority (172.16.8.0/22 – Priority 100), while the other is less specific with a lower priority (172.16.0.0/16 – Priority 50). The higher priority route will take precedence.

This should look pretty familiar to you, but now we’re going to take a look at routing in SS7, and for that we’re going to be talking Variable Length Subnet Masking in detail you haven’t needed to think about since doing your CCNA years ago…

Why Masking is Important

A route to a single Point Code is called a “/14”, this is akin to a single IPv4 address being called a “/32”.

We could setup all our routing tables with static routes to each point code (/14), but with about 4,000 international point codes, this might be a challenge.

Instead, by using Masks, we can group together ranges of Point Codes and route those ranges through a particular STP.

This opens up the ability to achieve things like “Route all traffic to Point Codes to this Default Gateway STP”, or to say “Route all traffic to this region through this STP”.

Individually routing to a point code works well for small scale networking, but there’s power, flexibility and simplification that comes from grouping together ranges of point codes.

Information Overload about Point Codes

So far we’ve talked about point codes in the X.YYY.Z format, in our lab we setup point codes like 1.2.3.

This is not the only option however…

Variants of SS7 Point Codes

IPv4 addresses look the same regardless of where you are. From Algeria to Zimbabwe, IPv4 addresses look the same and route the same.

Standards
XKCD 927: Standards

In SS7 networks that’s not the case – There are a lot of variants that define how a point code is structured, how long it is, etc. Common variants are ANSI, ITU-T (International & National variants), ETSI, Japan NTT, TTC & China.

The SS7 variant used must match on both ends of a link; this means an SS7 node speaking ETSI flavoured Point Codes can’t exchange messages with an ANSI flavoured Point Code.

Well, you can kinda translate from one variant to another, but requires some rewriting not unlike how NAT does it.

ITU International Variant

For the start of this series, we’ll be working with the ITU International variant / flavour of Point Code.

ITU International point codes are 14 bits long, and format is described as 3-8-3.
The 3-8-3 form of Point code just means the 14 bit long point code is broken up into three sections, the first section is made up of the first 3 bits, the second section is made up of the next 8 bits then the remaining 3 bits in the last section, for a total of 14 bits.

So our 14 bit 3-8-3 Point Code looks like this in binary form:

000-00000000-000 (Binary) == 0-0-0 (Point Code)

So a point code of 1-2-3 would look like:

001-00000010-011 (Binary) == 1-2-3 (Point Code) [001 = 1, 00000010 = 2, 011 = 3]

This gives us the following maximum values for each part:

111-11111111-111 (Binary) == 7-255-7 (Point Code)

This is not the only way to represent point codes, if we were to take our binary string for 1-2-3 and remove the hyphens, we get 00100000010011. If you convert this binary string into an Integer/Decimal value, you’ll get 2067.

If you’re dealing with multiple vendors or products,you’ll see some SS7 Point Codes represented as decimal (2067), some showing as 1-2-3 codes and sometimes just raw binary.
Fun hey?

Handy point code formatting tool

Why we need to know about Binary and Point Codes

So why does the binary part matter? Well the answer is for masks.

To loop back to the start of this post, we talked about IP routing using a network address and netmask, to represent a range of IP addresses. We can do the same for SS7 Point Codes, but that requires a teeny bit of working out.

As an example let’s imagine we need to setup a route to all point codes from 3-4-0 through to 3-6-7, without specifying all the individual point codes between them.

Firstly let’s look at our start and end point codes in binary:

100-00000100-000 = 3-004-0 (Start Point Code)
100-00000110-111 = 3-006-7 (End Point Code)

Looking at the above example let’s look at how many bits are common between the two,

100-00000100-000 = 3-004-0 (Start Point Code)
100-00000110-111 = 3-006-7 (End Point Code)

The first 9 bits are common, it’s only the last 5 bits that change, so we can group all these together by saying we have a /9 mask.

When it comes time to add a route, we can add a route to 3-4-0/9 and that tells our STP to match everything from point code 3-4-0 through to point code 3-6-7.

The STP doing the routing it only needs to match on the first 9 bits in the point code, to match this route.

SS7 Routing Tables

Now we have covered Masking for roues, we can start putting some routes into our network.

In order to get a message from one point code to another point code, where there isn’t a direct linkset between the two, we need to rely on routing, which is performed by our STPs.

This is where all that point code mask stuff we just covered comes in.

Let’s look at a diagram below,

Let’s look at the routing to get a message from Exchange A (SSP) on the bottom left of the picture to Exchange E (SSP) with Point Code 4.5.3 in the bottom right of the picture.

Exchange A (SSP) on the bottom left of the picture has point code 1.2.3 assigned to it and a Linkset to STP-A.
It has the implicit route to STP-A as it’s got that linkset, but it’s also got a route configured on it to reach any other point code via the Linkset to STP-A via the 0.0.0/0 route which is the SS7 equivalent of a default route. This means any traffic to any point code will go to STP-A.

From STP-A we have a linkset to STP-B. In order to route to the point codes behind STP-B, STP-A has a route to match any Point Code starting with 4.5.X, which is 4.5.0/11.
This means that STP-A will route any Point Code between 4.5.1 and 4.5.7 down the Linkset to STP-B.

STP-B has got a direct connection to Exchange B and Exchange E, so has implicit routes to reach each of them.

So with that routing table, Exchange A should be able to route a message to Exchange E.

But…

Return Routing

Just like in IP routing, we need return routing. while Exchange A (SSP) at 1.2.3 has a route to everywhere in the network, the other parts of the network don’t have a route to get to it. This means a request from 1.2.3 can get anywhere in the network, but it can’t get a response back to 1.2.3.

So to get traffic back to Exchange A (SSP) at 1.2.3, our two Exchanges on the right (Exchange B & C with point codes 4.5.6 and 4.5.3) will need routes added to them. We’ll also need to add routes to STP-B, and once we’ve done that, we should be able to get from Exchange A to any point code in this network.

There is a route missing here, see if you can pick up what it is!

So we’ve added a default route via STP-B on Exchange B & Exchange E, and added a route on STP-B to send anything to 1.2.3/14 via STP-A, and with that we should be able to route from any exchange to any other exchange.

One last point on terminology – when we specify a route we don’t talk in terms of the next hop Point Code, but the Linkset to route it down. For example the default route on Exchange A is 0.0.0/0 via STP-A linkset (The linkset from Exchange A to STP-A), we don’t specify the point code of STP-A, but just the name of the Linkset between them.

Back into the Lab

So back to the lab, where we left it was with linksets between each point code, so each Country could talk to it’s neighbor.

Let’s confirm this is the case before we go setting up routes, then together, we’ll get a route from Country A to Country C (and back).

So let’s check the status of the link from Country B to its two neighbors – Country A and Country C. All going well it should look like this, and if it doesn’t, then stop by my last post and check you’ve got everything setup.

CountryB#show cs7 linkset 
lsn=ToCountryA          apc=1.2.3         state=avail     avail/links=1/1
  SLC  Interface                    Service   PeerState         Inhib
  00   10.0.5.1 1024 1024           avail     InService         -----

lsn=ToCountryC          apc=7.7.1         state=avail     avail/links=1/1
  SLC  Interface                    Service   PeerState         Inhib
  00   10.0.6.2 1025 1025           avail     InService         -----


So let’s add some routing so Country A can reach Country C via Country B. On Country A STP we’ll need to add a static route. For this example we’ll add a route to 7.7.1/14 (Just Country C).

That means Country A knows how to get to Country C. But with no return routing, Country C doesn’t know how to get to Country A. So let’s fix that.

We’ll add a static route to Country C to send everything via Country B.

CountryC#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
CountryC(config)#cs7 route-table system
CountryC(config)#update route 0.0.0/0 linkset ToCountryB
*Jan 01 05:37:28.879: %CS7MTP3-5-DESTSTATUS: Destination 0.0.0 is accessible

So now from Country C, let’s see if we can ping Country A (Ok, it’s not a “real” ICMP ping, it’s a link state check message, but the result is essentially the same).

By running:

CountryC# ping cs7 1.2.3
*Jan 01 06:28:53.699: %CS7PING-6-RTT: Test Q.755 1.2.3: MTP Traffic test rtt 48/48/48
*Jan 01 06:28:53.699: %CS7PING-6-STAT: Test Q.755 1.2.3: MTP Traffic test 100% successful packets(1/1)
*Jan 01 06:28:53.699: %CS7PING-6-RATES: Test Q.755 1.2.3: Receive rate(pps:kbps) 1:0 Sent rate(pps:kbps) 1:0
*Jan 01 06:28:53.699: %CS7PING-6-TERM: Test Q.755 1.2.3: MTP Traffic test terminated.

We can confirm now that Country C can reach Country A, we can do the same from Country A to confirm we can reach Country B.

But what about Country D? The route we added on Country A won’t cover Country D, and to get to Country D, again we go through Country B.

This means we could group Country C and Country D into one route entry on Country A that matches anything starting with 7-X-X,

For this we’d add a route on Country A, and then remove the original route;

CountryA(config)# cs7 route-table system
CountryA(config-cs7-rt)#update route 7.0.0/3 linkset ToCountryB
CountryA(config-cs7-rt)#no update route 7.7.1/14 linkset ToCountryB

Of course, you may have already picked up, we’ll need to add a return route to Country D, so that it has a default route pointing all traffic to STP-B. Once we’ve done that from Country A we should be able to reach all the other countries:

CountryA#show cs7 route 
Dynamic Routes 0 of 1000

Routing table = system Destinations = 3 Routes = 3

Destination            Prio Linkset Name        Route
---------------------- ---- ------------------- -------        
4.5.6/14         acces   1  ToCountryB          avail          
7.0.0/3          acces   5  ToCountryB          avail          


CountryA#ping cs7 7.8.1
*Jan 01 07:28:19.503: %CS7PING-6-RTT: Test Q.755 7.8.1: MTP Traffic test rtt 84/84/84
*Jan 01 07:28:19.503: %CS7PING-6-STAT: Test Q.755 7.8.1: MTP Traffic test 100% successful packets(1/1)
*Jan 01 07:28:19.503: %CS7PING-6-RATES: Test Q.755 7.8.1: Receive rate(pps:kbps) 1:0  Sent rate(pps:kbps) 1:0 
*Jan 01 07:28:19.507: %CS7PING-6-TERM: Test Q.755 7.8.1: MTP Traffic test terminated.
CountryA#ping cs7 7.7.1
*Jan 01 07:28:26.839: %CS7PING-6-RTT: Test Q.755 7.7.1: MTP Traffic test rtt 60/60/60
*Jan 01 07:28:26.839: %CS7PING-6-STAT: Test Q.755 7.7.1: MTP Traffic test 100% successful packets(1/1)
*Jan 01 07:28:26.839: %CS7PING-6-RATES: Test Q.755 7.7.1: Receive rate(pps:kbps) 1:0  Sent rate(pps:kbps) 1:0 
*Jan 01 07:28:26.843: %CS7PING-6-TERM: Test Q.755 7.7.1: MTP Traffic test terminated.

So where to from here?

Well, we now have a a functional SS7 network made up of STPs, with routing between them, but if we go back to our SS7 network overview diagram from before, you’ll notice there’s something missing from our lab network…

So far our network is made up only of STPs, that’s like building a network only out of routers!

In our next lab, we’ll start adding some SSPs to actually generate some SS7 traffic on the network, rather than just OAM traffic.

Telephony binary-coded decimal (TBCD) in Python with Examples

Chances are if you’re reading this, you’re trying to work out what Telephony Binary-Coded Decimal encoding is. I got you.

Again I found myself staring at encoding trying to guess how it worked, reading references that looped into other references, in this case I was encoding MSISDN AVPs in Diameter.

How to Encode a number using Telephony Binary-Coded Decimal encoding?

First, Group all the numbers into pairs, and reverse each pair.

So a phone number of 123456, becomes:

214365

Because 1 & 2 are swapped to become 21, 3 & 4 are swapped to become 34, 5 & 6 become 65, that’s how we get that result.

TBCD Encoding of numbers with an Odd Length?

If we’ve got an odd-number of digits, we add an F on the end and still flip the digits,

For example 789, we add the F to the end to pad it to an even length, and then flip each pair of digits, so it becomes:

87F9

That’s the abbreviated version of it. If you’re only encoding numbers that’s all you’ll need to know.

Detail Overload

Because the numbers 0-9 can be encoded using only 4 bits, the need for a whole 8 bit byte to store this information is considered excessive.

For example 1 represented as a binary 8-bit byte would be 00000001, while 9 would be 00001001, so even with our largest number, the first 4 bits would always going to be 0000 – we’d only use half the available space.

So TBCD encoding stores two numbers in each Byte (1 number in the first 4 bits, one number in the second 4 bits).

To go back to our previous example, 1 represented as a binary 4-bit word would be 0001, while 9 would be 1001. These are then swapped and concatenated, so the number 19 becomes 1001 0001 which is hex 0x91.

Let’s do another example, 82, so 8 represented as a 4-bit word is 1000 and 2 as a 4-bit word is 0010. We then swap the order and concatenate to get 00101000 which is hex 0x28 from our inputted 82.

Final example will be a 3 digit number, 123. As we saw earlier we’ll add an F to the end for padding, and then encode as we would any other number,

F is encoded as 1111.

1 becomes 0001, 2 becomes 0010, 3 becomes 0011 and F becomes 1111. Reverse each pair and concatenate 00100001 11110011 or hex 0x21 0xF3.

Special Symbols (#, * and friends)

Because TBCD Encoding was designed for use in Telephony networks, the # and * symbols are also present, as they are on a telephone keypad.

Astute readers may have noticed that so far we’ve covered 0-9 and F, which still doesn’t use all the available space in the 4 bit area.

The extended DTMF keys of A, B & C are also valid in TBCD (The D key was sacrificed to get the F in).

Symbol4 Bit Word
*1 0 1 0
#1 0 1 1
a1 1 0 0
b1 1 0 1
c1 1 1 0

So let’s run through some more examples,

*21 is an odd length, so we’ll slap an F on the end (*21F), and then encoded each pair of values into bytes, so * becomes 1010, 2 becomes 0010. Swap them and concatenate for our first byte of 00101010 (Hex 0x2A). F our second byte 1F, 1 becomes 0001 and F becomes 1111. Swap and concatenate to get 11110001 (Hex 0xF1). So *21 becomes 0x2A 0xF1.

And as promised, some Python code from PyHSS that does it for you:

    def TBCD_special_chars(self, input):
        if input == "*":
            return "1010"
        elif input == "#":
            return "1011"
        elif input == "a":
            return "1100"
        elif input == "b":
            return "1101"
        elif input == "c":
            return "1100"      
        else:
            print("input " + str(input) + " is not a special char, converting to bin ")
            return ("{:04b}".format(int(input)))


    def TBCD_encode(self, input):
        print("TBCD_encode input value is " + str(input))
        offset = 0
        output = ''
        matches = ['*', '#', 'a', 'b', 'c']
        while offset < len(input):
            if len(input[offset:offset+2]) == 2:
                bit = input[offset:offset+2]    #Get two digits at a time
                bit = bit[::-1]                 #Reverse them
                #Check if *, #, a, b or c
                if any(x in bit for x in matches):
                    new_bit = ''
                    new_bit = new_bit + str(TBCD_special_chars(bit[0]))
                    new_bit = new_bit + str(TBCD_special_chars(bit[1]))    
                    bit = str(int(new_bit, 2))
                output = output + bit
                offset = offset + 2
            else:
                bit = "f" + str(input[offset:offset+2])
                output = output + bit
                print("TBCD_encode output value is " + str(output))
                return output
    

    def TBCD_decode(self, input):
        print("TBCD_decode Input value is " + str(input))
        offset = 0
        output = ''
        while offset < len(input):
            if "f" not in input[offset:offset+2]:
                bit = input[offset:offset+2]    #Get two digits at a time
                bit = bit[::-1]                 #Reverse them
                output = output + bit
                offset = offset + 2
            else:   #If f in bit strip it
                bit = input[offset:offset+2]
                output = output + bit[1]
                print("TBCD_decode output value is " + str(output))
                return output

Adding Vlans to VMware Workstation

Just discovered you can add VLANs to Realtek NICs on Windows PCs,

I have a fairly grunty desktop I use for running anything that needs Windows, running VMware Workstation and occasional gaming,

I do have a big Dell machine running ESXi which supports VLAN tagging and trunking, but I try and avoid using it as it’s deafeningly loud and very power hungry.

Recently as the lab network I use grows and grows I’ve been struggling to run all the VMs running in Workstation as I’ve been running out of IP space and wanting some more separation between networks.

Now I can add VLANs onto the existing NIC using the Realtek Ethernet Diagnostic Utility, and then bridge each of these NICs to the respective VM in Workstation, and the port to the Mikrotik CRS is now a trunk with all the VLANs on it.

Perfect!

Open5Gs Logo

Open5Gs Database Schema Change

As Open5Gs has introduced network slicing, which led to a change in the database used,

Alas many users had subscribers provisioned in the old DB schema and no way to migrate the SDM data between the old and new schema,

If you’ve created subscribers on the old schema, and now after the updates your Subscriber Authentication is failing, check out this tool I put together, to migrate your data over.

The Open5Gs Python library I wrote has also been updated to support the new schema.

A very unstable Diameter Routing Agent (DRA) with Kamailio

I’d been trying for some time to get Kamailio acting as a Diameter Routing Agent with mixed success, and eventually got it working, after a few changes to the codebase of the ims_diameter_server module.

It is rather unstable, in that if it fails to dispatch to a Diameter peer, the whole thing comes crumbling down, but incoming Diameter traffic is proxied off to another Diameter peer, and Kamailio even adds an extra AVP.

Having used Kamailio for so long I was really hoping I could work with Kamailio as a DRA as easily as I do for SIP traffic, but it seems the Diameter module still needs a lot more love before it’ll be stable enough and simple enough for everyone to use.

I created a branch containing the fixes I made to make it work, and with an example config for use, but use with caution. It’s a long way from being production-ready, but hopefully in time will evolve.

https://github.com/nickvsnetworking/kamailio/tree/Diameter_Fix

Diff + Wireshark

Supposedly Archimedes once said:

Give me a lever long enough and a fulcrum on which to place it, and I shall move the world.

For me, the equivalent would be:

Give me a packet capture of the problem occurring and a standards document against which to compare it, and I shall debug the networking world.

And if you’re like me, there’s a good chance when things are really not going your way, you roll up your sleeves, break out Wireshark and the standards docs and begin the painstaking process of trying to work out what’s not right.

Today’s problem involved a side by side comparison between a pcap of a known good call, and one which is failing, so I just had to compare the two, which is slow and fairly error-prone,

So I started looking for something to diff PCAPs easily. The data I was working with was ASN.1 encoded so I couldn’t export as text like you can with HTTP or SIP based protocols and compare it that way.

In the end I stumbled across something even better to compare frames from packet captures side by side, with the decoding intact!

Turns out yo ucan copy the values including decoding from within Wireshark, which means you can then just paste the contents into a diff tool (I’m using the fabulous Meld on Linux, but any diff tool will do including diff itself) and off you go, side-by-side comparison.

Select the first packet/frame you’re interested in (or even just the section), expand the subkeys, right click, copy “All Visible items”. This copy contains all the decoded data, not just the raw bytes, which is what makes it so great.

Next paste it into your diff tool of choice, repeat with the one to compare against, scroll past the data you know is going to be different (session IDs, IPs, etc) and ta-da, there’s the differences.

Video of the whole process below:

PyHSS Update – YAML Config Files

One feature I’m pretty excited to share is the addition of a single config file for defining how PyHSS functions,

In the past you’d set variables in the code or comment out sections to change behaviour, which, let’s face it – isn’t great.

Instead the config.yaml file defines the PLMN, transport time (TCP or SCTP), the origin host and realm.

We can also set the logging parameters, SNMP info and the database backend to be used,

HSS Parameters
 hss:
   transport: "SCTP"
   #IP Addresses to bind on (List) - For TCP only the first IP is used, for SCTP all used for Transport (Multihomed).
   bind_ip: ["10.0.1.252"]
 #Port to listen on (Same for TCP & SCTP)
   bind_port: 3868
 #Value to populate as the OriginHost in Diameter responses
   OriginHost: "hss.localdomain"
 #Value to populate as the OriginRealm in Diameter responses
   OriginRealm: "localdomain"
 #Value to populate as the Product name in Diameter responses
   ProductName: "pyHSS"
 #Your Home Mobile Country Code (Used for PLMN calcluation)
   MCC: "999"
   #Your Home Mobile Network Code (Used for PLMN calcluation)
   MNC: "99"
 #Enable GMLC / SLh Interface
   SLh_enabled: True


 logging:
   level: DEBUG
   logfiles:
     hss_logging_file: log/hss.log
     diameter_logging_file: log/diameter.log
     database_logging_file: log/db.log
   log_to_terminal: true

 database:
   mongodb:
     mongodb_server: 127.0.0.1
     mongodb_username: root
     mongodb_password: password
     mongodb_port: 27017

 Stats Parameters
 redis:
   enabled: True
   clear_stats_on_boot: False
   host: localhost
   port: 6379
 snmp:
   port: 1161
   listen_address: 127.0.0.1

Being mean to Mikrotiks – Pushing SMB File Share

I’d tried in the past to use the USB port on the Mikrotik, an external HDD and the SMB server in RouterOS, to act as a simple NAS for sharing files on the home network. And the performance was terrible.

This is because the device is a Router. Not a NAS (duh). And everything I later read online confirmed that yes, this is a router, not a NAS so stop trying.

But I recently got a new Mikrotik CRS109, so now I have a new Mikrotik, how bad is the SMB file share performance?

To test this I’ve got a USB drive with some files on it, an old Mikrotik RB915G and the new Mikrotik CRS109-8G-1S-2HnD-IN, and compared the time it takes to download a file between the two.

Mikrotik Routerboard RB951G

While pulling a 2Gb file of random data from a FAT formatted flash drive, I achieved a consistent 1.9MB/s (15.2 Mb/s)

nick@oldfaithful:~$ smbget smb://10.0.1.1/share1/2Gb_file.bin
 Password for [nick] connecting to //share1/10.0.1.1: 
 Using workgroup WORKGROUP, user nick
 smb://10.0.1.1/share1/2Gb_file.bin                                                                                                                                              
 Downloaded 2.07GB in 1123 seconds

Mikrotik CRS109

In terms of transfer speed, a consistent 2.8MB/s (22.4 Mb/s)

nick@oldfaithful:~$ smbget smb://10.0.1.1/share1/2Gb_file.bin
 Password for [nick] connecting to //share1/10.0.1.1: 
 Using workgroup WORKGROUP, user nick
 smb://10.0.1.1/share1/2Gb_file.bin                                              
 Downloaded 2.07GB in 736 seconds

Profiler shows 25% CPU usage on “other”, which drops down as soon as the file transfers stop,

So better, but still not a NAS (duh).

The Verdict

Still not a NAS. So stop trying to use it as a NAS.

As my download speed is faster than 22Mbps I’d just be better to use cloud storage.

PyHSS New Features

Thanks to some recent developments, PyHSS has had a major overhaul recently, and is getting better than ever,

Some features that are almost ready for public release are:

Config File

Instead of having everything defined all over the place a single YAML config file is used to define how the HSS should function.

SCTP Support

No longer just limited to TCP, PyHSS now supports SCTP as well for transport,

SLh Interface for Location Services

So the GMLC can query the HSS as to the serving MME of a subscriber.

Additional Database Backends (MSSQL & MySQL)

No longer limited to just MongoDB, simple functions to add additional backends too and flexible enough to meet your existing database schema.

All these features will be merged into the mainline soon, and documented even sooner. I’ll share some posts on each of these features as I go.

GIF showing using Redis-CLI to get a value

Adding SNMP to anything with Redis and Python

I’ve been adding SNMP support to an open source project I’ve been working on (PyHSS) to generate metrics / performance statistics from it, and this meant staring down SNMP again, but this time I’ve come up with a novel way to handle SNMP, that made it much less painful that normal.

The requirement was simple enough, I already had a piece of software I’d written in Python, but I had a need to add an SNMP server to get information about that bit of software.

For a little more detail – PyHSS handles Device Watchdog Requests already, but I needed a count of how many it had handled, made accessible via SNMP. So inside the logic that does this I just increment a counter in Redis;

#Device Watchdog Answer
    def Answer_280(self, packet_vars, avps):                                                      
        self.redis_store.incr('Answer_280_attempt_count')

In the code example above I just add 1 (increment) the Redis key ‘Answer_280_attempt_count’.

The beauty is that that this required minimal changes to the rest of my code – I just sprinkled in these statements to increment Redis keys throughout my code.

Now when that existing function is run, the Redis key “Answer_280_attempt_count” is incremented.

So I ran my software and the function I just added the increment to was called a few times, so I jumped into redis-cli to check on the values;

GIF showing using Redis-CLI to get a value

And just like that we’ve done all the heavy lifting to add SNMP to our software.

For anything else we want counters on, add the increment to your code to store a counter in Redis with that information.

So next up we need to expose our Redis keys via SNMP,

For this, I took a simple SNMP server example from Stackoverflow, to set the output of a MIB tree, and simply bolted in getting a bit of data from, code below:

#Pulled from https://stackoverflow.com/questions/58909285/how-to-add-variable-in-the-mib-tree

from pysnmp.entity import engine, config
from pysnmp.entity.rfc3413 import cmdrsp, context
from pysnmp.carrier.asyncore.dgram import udp
from pysnmp.smi import instrum, builder
from pysnmp.proto.api import v2c
import datetime
import redis


import redis
redis_store = redis.Redis(host='localhost', port=6379, db=0)
# Create SNMP engine
snmpEngine = engine.SnmpEngine()

# Transport setup

# UDP over IPv4
config.addTransport(
    snmpEngine,
    udp.domainName,
    udp.UdpTransport().openServerMode(('127.0.0.1', 1161))
)

# SNMPv3/USM setup

# user: usr-md5-none, auth: MD5, priv NONE
config.addV3User(
    snmpEngine, 'usr-md5-none',
    config.usmHMACMD5AuthProtocol, 'authkey1'
)
# Allow full MIB access for each user at VACM
config.addVacmUser(snmpEngine, 3, 'usr-md5-none', 'authNoPriv', (1, 3, 6, 1, 2, 1), (1, 3, 6, 1, 2, 1))


# SNMPv2c setup

# SecurityName <-> CommunityName mapping.
config.addV1System(snmpEngine, 'my-area', 'public')

# Allow full MIB access for this user / securityModels at VACM
config.addVacmUser(snmpEngine, 2, 'my-area', 'noAuthNoPriv', (1, 3, 6, 1, 2, 1), (1, 3, 6, 1, 2, 1))

# Get default SNMP context this SNMP engine serves
snmpContext = context.SnmpContext(snmpEngine)


# Create an SNMP context with default ContextEngineId (same as SNMP engine ID)
snmpContext = context.SnmpContext(snmpEngine)

# Create multiple independent trees of MIB managed objects (empty so far)
mibTreeA = instrum.MibInstrumController(builder.MibBuilder())
mibTreeB = instrum.MibInstrumController(builder.MibBuilder())

# Register MIB trees at distinct SNMP Context names
snmpContext.registerContextName(v2c.OctetString('context-a'), mibTreeA)
snmpContext.registerContextName(v2c.OctetString('context-b'), mibTreeB)

mibBuilder = snmpContext.getMibInstrum().getMibBuilder()

MibScalar, MibScalarInstance = mibBuilder.importSymbols(
    'SNMPv2-SMI', 'MibScalar', 'MibScalarInstance'
)
class MyStaticMibScalarInstance(MibScalarInstance):
    def getValue(self, name, idx):
        currentDT = datetime.datetime.now()
        return self.getSyntax().clone(
            'Hello World!! It\'s currently: ' + str(currentDT)
        )

class AnotherStaticMibScalarInstance(MibScalarInstance):
    def getValue(self, name, idx):
        return self.getSyntax().clone('Ahoy hoy?')

class Answer_280_attempt_count(MibScalarInstance):
    def getValue(self, name, idx):
        return self.getSyntax().clone(redis_store.get('Answer_280_attempt_count'))


mibBuilder.exportSymbols(
    '__MY_MIB', MibScalar((1, 3, 6, 1, 2, 1, 1, 1), v2c.OctetString()),
    MyStaticMibScalarInstance((1, 3, 6, 1, 2, 1, 1, 1), (0,), v2c.OctetString()),
    AnotherStaticMibScalarInstance((1, 3, 6, 1, 2, 1, 1, 1), (0,1), v2c.OctetString()),
    Answer_280_attempt_count((1, 3, 6, 1, 2, 1, 1, 1), (0,2), v2c.Integer32())
)

# Register SNMP Applications at the SNMP engine for particular SNMP context
cmdrsp.GetCommandResponder(snmpEngine, snmpContext)
cmdrsp.SetCommandResponder(snmpEngine, snmpContext)
cmdrsp.NextCommandResponder(snmpEngine, snmpContext)
cmdrsp.BulkCommandResponder(snmpEngine, snmpContext)

# Register an imaginary never-ending job to keep I/O dispatcher running forever
snmpEngine.transportDispatcher.jobStarted(1)

# Run I/O dispatcher which would receive queries and send responses
try:
    snmpEngine.transportDispatcher.runDispatcher()

except:
    snmpEngine.transportDispatcher.closeDispatcher()
    raise

While PySNMP can be a bit much to wrap your head around, all you need to know:

V2 community string set in:

config.addV1System(snmpEngine, 'my-area', 'public')

Create an additional class from the template below for each of your Redis keys you wish to expose;

class something_else_from_Redis(MibScalarInstance):
    def getValue(self, name, idx):
        return self.getSyntax().clone(redis_store.get('something_else_from_Redis'))

Renaming the class and replacing the redis_store.get() value with the Redis key you want to pull,

And finally inside mibBuilder.exportSymbols() add each of the new classes you added and the OID for each;

    Answer_280_attempt_count((1, 3, 6, 1, 2, 1, 1, 1), (0,2), v2c.Integer32())
    something_else_from_Redis((1, 3, 6, 1, 2, 1, 1, 1), (0,3), v2c.Integer32())

Then when you run it, presto, you’re exposing that data via SNMP.

You can verify it through SNMP walk or start integrating it into your NMS, in the above example OID 1.3.6.1.2.1.1.1.0.2, contains the value of Answer_280_attempt_count from Redis, and with that, you’re exposing info via SNMP, all while not really having to think about SNMP.

*Ok, you still have to sort which OIDs you assign for what, but you get the idea.

VoIP is an only child – ‘Gotchas’ on running VoIP applications inside Containers

It’s 2021, and everyone loves Containers; Docker & Kubernetes are changing how software is developed, deployed and scaled.

And yet so much of the Telco world still uses bare metal servers and dedicated hardware for processing.

So why not use Containers or VMs more for VoIP applications?

Disclaimer – When I’m talking VoIP about VoIP I mean the actual Voice over IP, that’s the Media Stream, RTP, the Audio, etc, not the Signaling (SIP). SIP is fine with Containers, it’s the media that has a bad time and that this post focuses on,

Virtualization Fundamentals

Once upon a time in Development land every application ran on it’s own server running in a DC / Central Office.

This was expensive to deploy (buying servers), operate (lots of power used) and maintain (lots of hardware to keep online).

Each server was actually sitting idle for a large part of the time, with the application running on it only using a some of the available resources some of the time.

One day Virtualization came and suddenly 10 physical servers could be virtualized into 10 VMs.

These VMs still need to run on servers but as each VM isn’t using 100% of it’s allocated resources all the time, instead of needing 10 servers to run it on you could run it on say 3 servers, and even do clever things like migrate VMs between servers if one were to fail.

VMs share the resources of the server it’s running on.

A server running VMs (Hypervisor) is able to run multiple VMs by splitting the resources between VMs.

If a VM A wants to run an operation at the same time a VM B & VM C, the operations can’t be run on each VM at the same time* so the hypervisor will queue up the requests and schedule them in, typically based on first-in-first out or based on a resource priority policy on the Hypervisor.

This is fine for a if VM A, B & C were all Web Servers.
A request coming into each of them at the same time would see the VM the Hypervisor schedules the resources to respond to the request slightly faster, with the other VMs responding to the request when the hypervisor has scheduled the resources to the respective VM.

VoIP is an only child

VoIP has grown up on dedicated hardware. It’s an only child that does not know how to share, because it’s never had to.

Having to wait for resources to be scheduled by the Hypervisor to to VM in order for it to execute an operation is fine and almost unnoticeable for web servers, it can have some pretty big impacts on call quality.

If we’re running RTPproxy or RTPengine in order to relay media, scheduling delays can mean that the media stream ends up “bursty”.

RTP packets needing relaying are queued in the buffer on the VM and only relayed when the hypervisor is able to schedule resources, this means there can be a lot of packet-delay-variation (PDV) and increased latency for services running on VMs.

VMs and Containers both have this same fate, DPDK and SR-IOV assist in throughput, but they don’t stop interrupt headaches.

VMs that deprive other VMs on the same host of resources are known as “Noisy neighbors”.

The simple fix for all these problems? There isn’t one.

Each of these issues can be overcome, dedicating resources, to a specific VM or container, cleverly distributing load, but it is costly in terms of resources and time to tweak and implement, and some of these options undermine the value of virtualization or containerization.

As technology marches forward we have scenarios where Kubernetes can expose FPGA resources to pass them through to Pods, but right now, if you need to transcode more than ~100 calls efficiently, you’re going to need a hardware device.

And while it can be done by throwing more x86 / ARM compute resources at the problem, hardware still wins out as cheaper in most instances.

Sorry, no easy answers here…

Dr StrangeEncoding or: How I learned to stop worrying and love ASN.1

Australia is a strange country; As a kid I was scared of dogs, and in response, our family got a dog.

This year started off with adventures working with ASN.1 encoded data, and after a week of banging my head against the table, I was scared of ASN.1 encoding.

But now I love dogs, and slowly, I’m learning to embrace ASN.1 encoding.

What is ASN.1?

ASN.1 is an encoding scheme.

The best analogy I can give is to image a sheet of paper with a form on it, the form has fields for all the different bits of data it needs,

Each of the fields on the form has a data type, and the box is sized to restrict input, and some fields are mandatory.

Now imagine you take this form and cut a hole where each of the text boxes would be.

We’ve made a key that can be laid on top of a blank sheet of paper, then we can fill the details through the key onto the blank paper and reuse the key over and over again to fill the data out many times.

When we remove the key off the top of our paper, and what we have left on the paper below is the data from the form. Without the key on top this data doesn’t make much sense, but we can always add the key back and presto it’s back to making sense.

While this may seem kind of pointless let’s look at the advantages of this method;

The data is validated by the key – People can’t put a name wherever, and country code anywhere, it’s got to be structured as per our requirements. And if we tried to enter a birthday through the key form onto the paper below, we couldn’t.

The data is as small as can be – Without all the metadata on the key above, such as the name of the field, the paper below contains only the pertinent information, and if a field is left blank it doesn’t take up any space at all on the paper.

It’s these two things, rigidly defined data structures (no room for errors or misinterpretation) and the minimal size on the wire (saves bandwidth), that led to 3GPP selecting ASN.1 encoding for many of it’s protocols, such as S1, NAS, SBc, X2, etc.

It’s also these two things that make ASN.1 kind of a jerk; If the data structure you’re feeding into your ASN.1 compiler does not match it will flat-out refuse to compile, and there’s no way to make sense of the data in its raw form.

I wrote a post covering the very basics of working with ASN.1 in Python here.

But working with a super simple ASN.1 definition you’ve created is one thing, using the 3GPP defined ASN.1 definitions is another,

With the aid of the fantastic PyCrate library, which is where the real magic happens, and this was the nut I cracked this week, compiling a 3GPP ASN.1 definition and communicating a standards-based protocol with it.

Watch this space for more fun with ASN.1!

MSSQL in Docker

I recently had to add MSSQL (Microsoft SQL) support to PyHSS, and to be honest I wasn’t looking forward to it.

I was expecting I’d have to setup a VM running Server 2016, then load those roles etc.

But instead:

docker run -e 'ACCEPT_EULA=Y' -e 'SA_PASSWORD=thisisthepasswordforMSSQL99#!' -p 1433:1433 -d mcr.microsoft.com/mssql/server:2017-latest

And it was up and running.

Way more painless than I expected!

Getting the GTP-U Packets flowing Fast – DPDK & SR-IOV

So dedicated appliances are dead and all our network functions are VMs or Containers, but there’s a performance hit when going virtual as the L2 processing has to be handled by the Hypervisor before being passed onto the relevant VM / Container.

If we have a 10Gb NIC in our server, we want to achieve a 10Gbps “Line Speed” on the Network Element / VNF we’re running on.

When we talked about appliances if you purchased an P-GW with 10Gbps NIC, it was a given you could get 10Gbps through it (without DPI, etc), but when we talk about virtualized network functions / network elements there’s a very real chance you won’t achieve the “line speed” of your interfaces without some help.

When you’ve got a Network Element like a S-GW, P-GW or UPF, you want to forward packets as quickly as possible – bottlenecks here would impact the user’s achievable speeds on the network.

To speed things up there are two technologies, that if supported by your software stack and hardware, allows you to significantly increase throughput on network interfaces, DPDK & SR-IOV.

DPDK – Data Plane Development Kit

Usually *Nix OSs handle packet processing on the Kernel level. As I type this the packets being sent to this WordPress server by Firefox are being handled by the Linux 5.8.0-36-generic kernel running on my machine.

The problem is the kernel has other things to do (interrupts), meaning increased delay in processing (due to waiting for processing capability) and decreased capacity.

DPDK shunts this processing to the “user space” meaning your application (the actual magic of the VNF / Network Element) controls it.

To go back to me writing this – If Firefox and my laptop supported DPDK, then the packets wouldn’t traverse the Linux kernel at all, and Firefox would be talking directly to my NIC. (Obviously this isn’t the case…)

So DPDK increases network performance by shifting the processing of packets to the application, bypassing the kernel altogether. You are still limited by the CPU and Memory available, but with enough of each you should reach very near to line speed.

SR-IOV – Single Root Input Output Virtualization

Going back to the me writing this analogy I’m running Linux on my laptop, but let’s imagine I’m running a VM running Firefox under Linux to write this.

If that’s the case then we have an even more convolted packet processing chain!

I type the post into Firefox which sends the packets to the Linux kernel, which waits to be scheduled resources by the hypervisor, which then process the packets in the hypervisor kernel before finally making it onto the NIC.

We could add DPDK which skips some of these steps, but we’d still have the bottleneck of the hypervisor.

With PCIe passthrough we could pass the NIC directly to the VM running the Firefox browser window I’m typing this, but then we have a problem, no other VMs can access these resources.

SR-IOV provides an interface to passthrough PCIe to VMs by slicing the PCIe interface up and then passing it through.

My VM would be able to access the PCIe side of the NIC, but so would other VMs.

So that’s the short of it, SR-IOR and DPDK enable better packet forwarding speeds on VNFs.

Docker Cheatsheet

I kept forgetting the basic Docker commands, so I made a cheat sheet for myeslf and thought I’d share:

List running Containers:
docker ps
List all Containers (Including stopped)
docker ps -a
Start a Container
docker run sdfjkdskj/sdfdsafa
List Images
docker image list
Build an Image
docker build -t myapp:v1 .
Connect to Shell of running Container
docker exec -it intelligent_chebyshev /bin/bash

List of Open Source Evolved Packet Core (EPC) Implementations

Open5Gs

Formerly NextEPC.

OpenAI Core Network

Related to / branched from OMEC.

Magma

Based on OMEC, with a focus on Fixed Wireless more than mobile.

Not fair to consider it just an EPC, Magma is highly scaleable and designed with a focus on Fixed Wireless offerings.

Supported by the Facebook Telecom Infra Project.

OMEC – Open Evolved Mobile Core

Supported by Open Networking Foundation, Sprint and several other large players.

OMEC has each Network Element in it’s own repo in GitHub and each is managed by a different team.

OpenMME – MME

In use by at least one commercial operator (in some capacity).

Next Generation Infrastructure Core (S-GW & P-GW)

Seems to only compile on 16.04 and not really

c3po – HSS / CDR / CTF

OpenCORD

srsEPC

(from the guys who produced srsLTE / srsENB / srsUE)

Android Carrier Privileges

So a problem had arisen, carriers wanted to change certain carrier related settings on devices (Specifically the Carrier Config Manager) in the Android ecosystem. The Android maintainers didn’t want to open the permissions to change these settings to everyone, only the carrier providing service to that device.

And if you purchased a phone from Carrier A, and moved to Carrier B, how do you manage the permissions for Carrier B’s app and then restrict Carrier A’s app?

Enter the Android UICC Carrier Privileges.

The carrier loads a certificate onto the SIM Cards, and signing Android Apps with this certificate, allowing the Android OS to verify the certificate on the card and the App are known to each other, and thus the carrier issuing the SIM card also issued the app, and presto, the permissions are granted to the app.

Carriers have full control of the UICC, so this mechanism provides a secure and flexible way to manage apps from the mobile network operator (MNO) hosted on generic app distribution channels (such as Google Play) while retaining special privileges on devices and without the need to sign apps with the per-device platform certificate or preinstall as a system app.

UICC Carrier Privileges doc

Once these permissions are granted your app is able to make API calls related to:

  • APN Settings
  • Roaming/nonroaming networks
  • Visual voicemail
  • SMS/MMS network settings
  • VoLTE/IMS configurations
  • OTA Updating SIM Cards
  • Sending PDUs to the card

Ansible – Timeout on Become

I’ve written a playbook that provisions some server infrastructure, however one of the steps is to change the hostname.

A common headache when changing the hostname on a Linux machine is that if the hostname you set for the machine, isn’t in the machine’s /etc/hosts file, then when you run sudo su or su, it takes a really long time before it shows you the prompt as the machine struggles to do a DNS lookup for it’s own hostname and fails,

This becomes an even bigger problem when you’re using Ansible to setup these machines, Ansible times out when changing the hostname;

Simple fix, edit the /etc/ansible/ansible.cfg file and include

# SSH timeout
timeout = 300

And that’s it.