So I run a lot of VMs. It’s not unusual when I’m automating something with Ansible or setting up a complex lab to be running 20+ VMs at a time, and often I’ll create a base VM and clone it a dozen times.
Alas, Ubuntu 20.04 has some very irritating default behaviour, where even if the MAC addresses of these cloned VMs differ they get the same IP Address from DHCP.
That’s because by default Netplan doesn’t use the MAC address as the identifier when requesting a DHCP lease. And if you’ve cloned a VM the identifier it does use doesn’t change even if you do change the MAC address…
Irritating, but easily fixed!
Editing the netplan config:
network:
ethernets:
eth0:
dhcp4: true
dhcp-identifier: mac
version: 2
Run a netplan-apply and you’re done.
Now you can clone that VM as many times as you like and each will get it’s own unique IP address.
This is part of a series of posts looking into SS7 and Sigtran networks. We cover some basic theory and then get into the weeds with GNS3 based labs where we will build real SS7/Sigtran based networks and use them to carry traffic.
Having a direct Linkset from every Point Code to every other Point Code in an SS7 network isn’t practical, we need to rely on routing, so in this post we’ll cover routing between Point Codes on our STPs.
Let’s start in the IP world, imagine a router with a routing table that looks something like this:
Simple IP Routing Table
192.168.0.0/24 out 192.168.0.1 (Directly Attached)
172.16.8.0/22 via 192.168.0.3 - Static Route - (Priority 100)
172.16.0.0/16 via 192.168.0.2 - Static Route - (Priority 50)
10.98.22.1/32 via 192.168.0.3 - Static Route - (Priority 50)
We have an implicit route for the network we’re directly attached to (192.168.0.0/24), and then a series of static routes we configure. We’ve also got two routes to the 172.16.8.0/22 subnet, one is more specific with a higher priority (172.16.8.0/22 – Priority 100), while the other is less specific with a lower priority (172.16.0.0/16 – Priority 50). The higher priority route will take precedence.
This should look pretty familiar to you, but now we’re going to take a look at routing in SS7, and for that we’re going to be talking Variable Length Subnet Masking in detail you haven’t needed to think about since doing your CCNA years ago…
Why Masking is Important
A route to a single Point Code is called a “/14”, this is akin to a single IPv4 address being called a “/32”.
We could setup all our routing tables with static routes to each point code (/14), but with about 4,000 international point codes, this might be a challenge.
Instead, by using Masks, we can group together ranges of Point Codes and route those ranges through a particular STP.
This opens up the ability to achieve things like “Route all traffic to Point Codes to this Default Gateway STP”, or to say “Route all traffic to this region through this STP”.
Individually routing to a point code works well for small scale networking, but there’s power, flexibility and simplification that comes from grouping together ranges of point codes.
Information Overload about Point Codes
So far we’ve talked about point codes in the X.YYY.Z format, in our lab we setup point codes like 1.2.3.
This is not the only option however…
Variants of SS7 Point Codes
IPv4 addresses look the same regardless of where you are. From Algeria to Zimbabwe, IPv4 addresses look the same and route the same.
In SS7 networks that’s not the case – There are a lot of variants that define how a point code is structured, how long it is, etc. Common variants are ANSI, ITU-T (International & National variants), ETSI, Japan NTT, TTC & China.
The SS7 variant used must match on both ends of a link; this means an SS7 node speaking ETSI flavoured Point Codes can’t exchange messages with an ANSI flavoured Point Code.
Well, you can kinda translate from one variant to another, but requires some rewriting not unlike how NAT does it.
ITU International Variant
For the start of this series, we’ll be working with the ITU International variant / flavour of Point Code.
ITU International point codes are 14 bits long, and format is described as 3-8-3. The 3-8-3 form of Point code just means the 14 bit long point code is broken up into three sections, the first section is made up of the first 3 bits, the second section is made up of the next 8 bits then the remaining 3 bits in the last section, for a total of 14 bits.
So our 14 bit 3-8-3 Point Code looks like this in binary form:
If you’re dealing with multiple vendors or products,you’ll see some SS7 Point Codes represented as decimal (2067), some showing as 1-2-3 codes and sometimes just raw binary. Fun hey?
So why does the binary part matter? Well the answer is for masks.
To loop back to the start of this post, we talked about IP routing using a network address and netmask, to represent a range of IP addresses. We can do the same for SS7 Point Codes, but that requires a teeny bit of working out.
As an example let’s imagine we need to setup a route to all point codes from 3-4-0 through to 3-6-7, without specifying all the individual point codes between them.
Firstly let’s look at our start and end point codes in binary:
100-00000100-000 = 3-004-0 (Start Point Code)
100-00000110-111 = 3-006-7 (End Point Code)
Looking at the above example let’s look at how many bits are common between the two,
100-00000100-000 = 3-004-0 (Start Point Code)
100-00000110-111 = 3-006-7 (End Point Code)
The first 9 bits are common, it’s only the last 5 bits that change, so we can group all these together by saying we have a /9 mask.
When it comes time to add a route, we can add a route to 3-4-0/9 and that tells our STP to match everything from point code 3-4-0 through to point code 3-6-7.
The STP doing the routing it only needs to match on the first 9 bits in the point code, to match this route.
SS7 Routing Tables
Now we have covered Masking for roues, we can start putting some routes into our network.
In order to get a message from one point code to another point code, where there isn’t a direct linkset between the two, we need to rely on routing, which is performed by our STPs.
This is where all that point code mask stuff we just covered comes in.
Let’s look at a diagram below,
Let’s look at the routing to get a message from Exchange A (SSP) on the bottom left of the picture to Exchange E (SSP) with Point Code 4.5.3 in the bottom right of the picture.
Exchange A (SSP) on the bottom left of the picture has point code 1.2.3 assigned to it and a Linkset to STP-A. It has the implicit route to STP-A as it’s got that linkset, but it’s also got a route configured on it to reach any other point code via the Linkset to STP-A via the 0.0.0/0 route which is the SS7 equivalent of a default route. This means any traffic to any point code will go to STP-A.
From STP-A we have a linkset to STP-B. In order to route to the point codes behind STP-B, STP-A has a route to match any Point Code starting with 4.5.X, which is 4.5.0/11. This means that STP-A will route any Point Code between 4.5.1 and 4.5.7 down the Linkset to STP-B.
STP-B has got a direct connection to Exchange B and Exchange E, so has implicit routes to reach each of them.
So with that routing table, Exchange A should be able to route a message to Exchange E.
But…
Return Routing
Just like in IP routing, we need return routing. while Exchange A (SSP) at 1.2.3 has a route to everywhere in the network, the other parts of the network don’t have a route to get to it. This means a request from 1.2.3 can get anywhere in the network, but it can’t get a response back to 1.2.3.
So to get traffic back to Exchange A (SSP) at 1.2.3, our two Exchanges on the right (Exchange B & C with point codes 4.5.6 and 4.5.3) will need routes added to them. We’ll also need to add routes to STP-B, and once we’ve done that, we should be able to get from Exchange A to any point code in this network.
There is a route missing here, see if you can pick up what it is!
So we’ve added a default route via STP-B on Exchange B & Exchange E, and added a route on STP-B to send anything to 1.2.3/14 via STP-A, and with that we should be able to route from any exchange to any other exchange.
One last point on terminology – when we specify a route we don’t talk in terms of the next hop Point Code, but the Linkset to route it down. For example the default route on Exchange A is 0.0.0/0 via STP-A linkset (The linkset from Exchange A to STP-A), we don’t specify the point code of STP-A, but just the name of the Linkset between them.
Back into the Lab
So back to the lab, where we left it was with linksets between each point code, so each Country could talk to it’s neighbor.
Let’s confirm this is the case before we go setting up routes, then together, we’ll get a route from Country A to Country C (and back).
So let’s check the status of the link from Country B to its two neighbors – Country A and Country C. All going well it should look like this, and if it doesn’t, then stop by my last post and check you’ve got everything setup.
So let’s add some routing so Country A can reach Country C via Country B. On Country A STP we’ll need to add a static route. For this example we’ll add a route to 7.7.1/14 (Just Country C).
That means Country A knows how to get to Country C. But with no return routing, Country C doesn’t know how to get to Country A. So let’s fix that.
We’ll add a static route to Country C to send everything via Country B.
CountryC#conf t
Enter configuration commands, one per line. End with CNTL/Z.
CountryC(config)#cs7 route-table system
CountryC(config)#update route 0.0.0/0 linkset ToCountryB
*Jan 01 05:37:28.879: %CS7MTP3-5-DESTSTATUS: Destination 0.0.0 is accessible
So now from Country C, let’s see if we can ping Country A (Ok, it’s not a “real” ICMP ping, it’s a link state check message, but the result is essentially the same).
By running:
CountryC# ping cs7 1.2.3
*Jan 01 06:28:53.699: %CS7PING-6-RTT: Test Q.755 1.2.3: MTP Traffic test rtt 48/48/48
*Jan 01 06:28:53.699: %CS7PING-6-STAT: Test Q.755 1.2.3: MTP Traffic test 100% successful packets(1/1)
*Jan 01 06:28:53.699: %CS7PING-6-RATES: Test Q.755 1.2.3: Receive rate(pps:kbps) 1:0 Sent rate(pps:kbps) 1:0
*Jan 01 06:28:53.699: %CS7PING-6-TERM: Test Q.755 1.2.3: MTP Traffic test terminated.
We can confirm now that Country C can reach Country A, we can do the same from Country A to confirm we can reach Country B.
But what about Country D? The route we added on Country A won’t cover Country D, and to get to Country D, again we go through Country B.
This means we could group Country C and Country D into one route entry on Country A that matches anything starting with 7-X-X,
For this we’d add a route on Country A, and then remove the original route;
Of course, you may have already picked up, we’ll need to add a return route to Country D, so that it has a default route pointing all traffic to STP-B. Once we’ve done that from Country A we should be able to reach all the other countries:
CountryA#show cs7 route
Dynamic Routes 0 of 1000
Routing table = system Destinations = 3 Routes = 3
Destination Prio Linkset Name Route
---------------------- ---- ------------------- -------
4.5.6/14 acces 1 ToCountryB avail
7.0.0/3 acces 5 ToCountryB avail
CountryA#ping cs7 7.8.1
*Jan 01 07:28:19.503: %CS7PING-6-RTT: Test Q.755 7.8.1: MTP Traffic test rtt 84/84/84
*Jan 01 07:28:19.503: %CS7PING-6-STAT: Test Q.755 7.8.1: MTP Traffic test 100% successful packets(1/1)
*Jan 01 07:28:19.503: %CS7PING-6-RATES: Test Q.755 7.8.1: Receive rate(pps:kbps) 1:0 Sent rate(pps:kbps) 1:0
*Jan 01 07:28:19.507: %CS7PING-6-TERM: Test Q.755 7.8.1: MTP Traffic test terminated.
CountryA#ping cs7 7.7.1
*Jan 01 07:28:26.839: %CS7PING-6-RTT: Test Q.755 7.7.1: MTP Traffic test rtt 60/60/60
*Jan 01 07:28:26.839: %CS7PING-6-STAT: Test Q.755 7.7.1: MTP Traffic test 100% successful packets(1/1)
*Jan 01 07:28:26.839: %CS7PING-6-RATES: Test Q.755 7.7.1: Receive rate(pps:kbps) 1:0 Sent rate(pps:kbps) 1:0
*Jan 01 07:28:26.843: %CS7PING-6-TERM: Test Q.755 7.7.1: MTP Traffic test terminated.
So where to from here?
Well, we now have a a functional SS7 network made up of STPs, with routing between them, but if we go back to our SS7 network overview diagram from before, you’ll notice there’s something missing from our lab network…
So far our network is made up only of STPs, that’s like building a network only out of routers!
In our next lab, we’ll start adding some SSPs to actually generate some SS7 traffic on the network, rather than just OAM traffic.
Chances are if you’re reading this, you’re trying to work out what Telephony Binary-Coded Decimal encoding is. I got you.
Again I found myself staring at encoding trying to guess how it worked, reading references that looped into other references, in this case I was encoding MSISDN AVPs in Diameter.
How to Encode a number using Telephony Binary-Coded Decimal encoding?
First, Group all the numbers into pairs, and reverse each pair.
So a phone number of 123456, becomes:
214365
Because 1 & 2 are swapped to become 21, 3 & 4 are swapped to become 34, 5 & 6 become 65, that’s how we get that result.
TBCD Encoding of numbers with an Odd Length?
If we’ve got an odd-number of digits, we add an F on the end and still flip the digits,
For example 789, we add the F to the end to pad it to an even length, and then flip each pair of digits, so it becomes:
87F9
That’s the abbreviated version of it. If you’re only encoding numbers that’s all you’ll need to know.
Detail Overload
Because the numbers 0-9 can be encoded using only 4 bits, the need for a whole 8 bit byte to store this information is considered excessive.
For example 1 represented as a binary 8-bit byte would be 00000001, while 9 would be 00001001, so even with our largest number, the first 4 bits would always going to be 0000 – we’d only use half the available space.
So TBCD encoding stores two numbers in each Byte (1 number in the first 4 bits, one number in the second 4 bits).
To go back to our previous example, 1 represented as a binary 4-bit word would be 0001, while 9 would be 1001. These are then swapped and concatenated, so the number 19 becomes 1001 0001 which is hex 0x91.
Let’s do another example, 82, so 8 represented as a 4-bit word is 1000 and 2 as a 4-bit word is 0010. We then swap the order and concatenate to get 00101000 which is hex 0x28 from our inputted 82.
Final example will be a 3 digit number, 123. As we saw earlier we’ll add an F to the end for padding, and then encode as we would any other number,
F is encoded as 1111.
1 becomes 0001, 2 becomes 0010, 3 becomes 0011 and F becomes 1111. Reverse each pair and concatenate 00100001 11110011 or hex 0x21 0xF3.
Special Symbols (#, * and friends)
Because TBCD Encoding was designed for use in Telephony networks, the # and * symbols are also present, as they are on a telephone keypad.
Astute readers may have noticed that so far we’ve covered 0-9 and F, which still doesn’t use all the available space in the 4 bit area.
The extended DTMF keys of A, B & C are also valid in TBCD (The D key was sacrificed to get the F in).
Symbol
4 Bit Word
*
1 0 1 0
#
1 0 1 1
a
1 1 0 0
b
1 1 0 1
c
1 1 1 0
So let’s run through some more examples,
*21 is an odd length, so we’ll slap an F on the end (*21F), and then encoded each pair of values into bytes, so * becomes 1010, 2 becomes 0010. Swap them and concatenate for our first byte of 00101010 (Hex 0x2A). F our second byte 1F, 1 becomes 0001 and F becomes 1111. Swap and concatenate to get 11110001 (Hex 0xF1). So *21 becomes 0x2A 0xF1.
And as promised, some Python code from PyHSS that does it for you:
def TBCD_special_chars(self, input):
if input == "*":
return "1010"
elif input == "#":
return "1011"
elif input == "a":
return "1100"
elif input == "b":
return "1101"
elif input == "c":
return "1100"
else:
print("input " + str(input) + " is not a special char, converting to bin ")
return ("{:04b}".format(int(input)))
def TBCD_encode(self, input):
print("TBCD_encode input value is " + str(input))
offset = 0
output = ''
matches = ['*', '#', 'a', 'b', 'c']
while offset < len(input):
if len(input[offset:offset+2]) == 2:
bit = input[offset:offset+2] #Get two digits at a time
bit = bit[::-1] #Reverse them
#Check if *, #, a, b or c
if any(x in bit for x in matches):
new_bit = ''
new_bit = new_bit + str(TBCD_special_chars(bit[0]))
new_bit = new_bit + str(TBCD_special_chars(bit[1]))
bit = str(int(new_bit, 2))
output = output + bit
offset = offset + 2
else:
bit = "f" + str(input[offset:offset+2])
output = output + bit
print("TBCD_encode output value is " + str(output))
return output
def TBCD_decode(self, input):
print("TBCD_decode Input value is " + str(input))
offset = 0
output = ''
while offset < len(input):
if "f" not in input[offset:offset+2]:
bit = input[offset:offset+2] #Get two digits at a time
bit = bit[::-1] #Reverse them
output = output + bit
offset = offset + 2
else: #If f in bit strip it
bit = input[offset:offset+2]
output = output + bit[1]
print("TBCD_decode output value is " + str(output))
return output
Just discovered you can add VLANs to Realtek NICs on Windows PCs,
I have a fairly grunty desktop I use for running anything that needs Windows, running VMware Workstation and occasional gaming,
I do have a big Dell machine running ESXi which supports VLAN tagging and trunking, but I try and avoid using it as it’s deafeningly loud and very power hungry.
Recently as the lab network I use grows and grows I’ve been struggling to run all the VMs running in Workstation as I’ve been running out of IP space and wanting some more separation between networks.
Now I can add VLANs onto the existing NIC using the Realtek Ethernet Diagnostic Utility, and then bridge each of these NICs to the respective VM in Workstation, and the port to the Mikrotik CRS is now a trunk with all the VLANs on it.
As Open5Gs has introduced network slicing, which led to a change in the database used,
Alas many users had subscribers provisioned in the old DB schema and no way to migrate the SDM data between the old and new schema,
If you’ve created subscribers on the old schema, and now after the updates your Subscriber Authentication is failing, check out this tool I put together, to migrate your data over.
I’d been trying for some time to get Kamailio acting as a Diameter Routing Agent with mixed success, and eventually got it working, after a few changes to the codebase of the ims_diameter_server module.
It is rather unstable, in that if it fails to dispatch to a Diameter peer, the whole thing comes crumbling down, but incoming Diameter traffic is proxied off to another Diameter peer, and Kamailio even adds an extra AVP.
Having used Kamailio for so long I was really hoping I could work with Kamailio as a DRA as easily as I do for SIP traffic, but it seems the Diameter module still needs a lot more love before it’ll be stable enough and simple enough for everyone to use.
I created a branch containing the fixes I made to make it work, and with an example config for use, but use with caution. It’s a long way from being production-ready, but hopefully in time will evolve.
Give me a lever long enough and a fulcrum on which to place it, and I shall move the world.
For me, the equivalent would be:
Give me a packet capture of the problem occurring and a standards document against which to compare it, and I shall debug the networking world.
And if you’re like me, there’s a good chance when things are really not going your way, you roll up your sleeves, break out Wireshark and the standards docs and begin the painstaking process of trying to work out what’s not right.
Today’s problem involved a side by side comparison between a pcap of a known good call, and one which is failing, so I just had to compare the two, which is slow and fairly error-prone,
So I started looking for something to diff PCAPs easily. The data I was working with was ASN.1 encoded so I couldn’t export as text like you can with HTTP or SIP based protocols and compare it that way.
In the end I stumbled across something even better to compare frames from packet captures side by side, with the decoding intact!
Turns out yo ucan copy the values including decoding from within Wireshark, which means you can then just paste the contents into a diff tool (I’m using the fabulous Meld on Linux, but any diff tool will do including diff itself) and off you go, side-by-side comparison.
Select the first packet/frame you’re interested in (or even just the section), expand the subkeys, right click, copy “All Visible items”. This copy contains all the decoded data, not just the raw bytes, which is what makes it so great.
Next paste it into your diff tool of choice, repeat with the one to compare against, scroll past the data you know is going to be different (session IDs, IPs, etc) and ta-da, there’s the differences.
One feature I’m pretty excited to share is the addition of a single config file for defining how PyHSS functions,
In the past you’d set variables in the code or comment out sections to change behaviour, which, let’s face it – isn’t great.
Instead the config.yaml file defines the PLMN, transport time (TCP or SCTP), the origin host and realm.
We can also set the logging parameters, SNMP info and the database backend to be used,
HSS Parameters
hss:
transport: "SCTP"
#IP Addresses to bind on (List) - For TCP only the first IP is used, for SCTP all used for Transport (Multihomed).
bind_ip: ["10.0.1.252"]
#Port to listen on (Same for TCP & SCTP)
bind_port: 3868
#Value to populate as the OriginHost in Diameter responses
OriginHost: "hss.localdomain"
#Value to populate as the OriginRealm in Diameter responses
OriginRealm: "localdomain"
#Value to populate as the Product name in Diameter responses
ProductName: "pyHSS"
#Your Home Mobile Country Code (Used for PLMN calcluation)
MCC: "999"
#Your Home Mobile Network Code (Used for PLMN calcluation)
MNC: "99"
#Enable GMLC / SLh Interface
SLh_enabled: True
logging:
level: DEBUG
logfiles:
hss_logging_file: log/hss.log
diameter_logging_file: log/diameter.log
database_logging_file: log/db.log
log_to_terminal: true
database:
mongodb:
mongodb_server: 127.0.0.1
mongodb_username: root
mongodb_password: password
mongodb_port: 27017
Stats Parameters
redis:
enabled: True
clear_stats_on_boot: False
host: localhost
port: 6379
snmp:
port: 1161
listen_address: 127.0.0.1
I’d tried in the past to use the USB port on the Mikrotik, an external HDD and the SMB server in RouterOS, to act as a simple NAS for sharing files on the home network. And the performance was terrible.
This is because the device is a Router. Not a NAS (duh). And everything I later read online confirmed that yes, this is a router, not a NAS so stop trying.
But I recently got a new Mikrotik CRS109, so now I have a new Mikrotik, how bad is the SMB file share performance?
To test this I’ve got a USB drive with some files on it, an old Mikrotik RB915G and the new Mikrotik CRS109-8G-1S-2HnD-IN, and compared the time it takes to download a file between the two.
Mikrotik Routerboard RB951G
While pulling a 2Gb file of random data from a FAT formatted flash drive, I achieved a consistent 1.9MB/s (15.2 Mb/s)
nick@oldfaithful:~$ smbget smb://10.0.1.1/share1/2Gb_file.bin
Password for [nick] connecting to //share1/10.0.1.1:
Using workgroup WORKGROUP, user nick
smb://10.0.1.1/share1/2Gb_file.bin
Downloaded 2.07GB in 1123 seconds
Mikrotik CRS109
In terms of transfer speed, a consistent 2.8MB/s (22.4 Mb/s)
nick@oldfaithful:~$ smbget smb://10.0.1.1/share1/2Gb_file.bin
Password for [nick] connecting to //share1/10.0.1.1:
Using workgroup WORKGROUP, user nick
smb://10.0.1.1/share1/2Gb_file.bin
Downloaded 2.07GB in 736 seconds
Profiler shows 25% CPU usage on “other”, which drops down as soon as the file transfers stop,
So better, but still not a NAS (duh).
The Verdict
Still not a NAS. So stop trying to use it as a NAS.
As my download speed is faster than 22Mbps I’d just be better to use cloud storage.
I’ve been adding SNMP support to an open source project I’ve been working on (PyHSS) to generate metrics / performance statistics from it, and this meant staring down SNMP again, but this time I’ve come up with a novel way to handle SNMP, that made it much less painful that normal.
The requirement was simple enough, I already had a piece of software I’d written in Python, but I had a need to add an SNMP server to get information about that bit of software.
For a little more detail – PyHSS handles Device Watchdog Requests already, but I needed a count of how many it had handled, made accessible via SNMP. So inside the logic that does this I just increment a counter in Redis;
In the code example above I just add 1 (increment) the Redis key ‘Answer_280_attempt_count’.
The beauty is that that this required minimal changes to the rest of my code – I just sprinkled in these statements to increment Redis keys throughout my code.
Now when that existing function is run, the Redis key “Answer_280_attempt_count” is incremented.
So I ran my software and the function I just added the increment to was called a few times, so I jumped into redis-cli to check on the values;
And just like that we’ve done all the heavy lifting to add SNMP to our software.
For anything else we want counters on, add the increment to your code to store a counter in Redis with that information.
So next up we need to expose our Redis keys via SNMP,
Then when you run it, presto, you’re exposing that data via SNMP.
You can verify it through SNMP walk or start integrating it into your NMS, in the above example OID 1.3.6.1.2.1.1.1.0.2, contains the value of Answer_280_attempt_count from Redis, and with that, you’re exposing info via SNMP, all while not really having to think about SNMP.
*Ok, you still have to sort which OIDs you assign for what, but you get the idea.
It’s 2021, and everyone loves Containers; Docker & Kubernetes are changing how software is developed, deployed and scaled.
And yet so much of the Telco world still uses bare metal servers and dedicated hardware for processing.
So why not use Containers or VMs more for VoIP applications?
Disclaimer – When I’m talking VoIP about VoIP I mean the actual Voice over IP, that’s the Media Stream, RTP, the Audio, etc, not the Signaling (SIP). SIP is fine with Containers, it’s the media that has a bad time and that this post focuses on,
Virtualization Fundamentals
Once upon a time in Development land every application ran on it’s own server running in a DC / Central Office.
This was expensive to deploy (buying servers), operate (lots of power used) and maintain (lots of hardware to keep online).
Each server was actually sitting idle for a large part of the time, with the application running on it only using a some of the available resources some of the time.
One day Virtualization came and suddenly 10 physical servers could be virtualized into 10 VMs.
These VMs still need to run on servers but as each VM isn’t using 100% of it’s allocated resources all the time, instead of needing 10 servers to run it on you could run it on say 3 servers, and even do clever things like migrate VMs between servers if one were to fail.
VMs share the resources of the server it’s running on.
A server running VMs (Hypervisor) is able to run multiple VMs by splitting the resources between VMs.
If a VM A wants to run an operation at the same time a VM B & VM C, the operations can’t be run on each VM at the same time* so the hypervisor will queue up the requests and schedule them in, typically based on first-in-first out or based on a resource priority policy on the Hypervisor.
This is fine for a if VM A, B & C were all Web Servers. A request coming into each of them at the same time would see the VM the Hypervisor schedules the resources to respond to the request slightly faster, with the other VMs responding to the request when the hypervisor has scheduled the resources to the respective VM.
VoIP is an only child
VoIP has grown up on dedicated hardware. It’s an only child that does not know how to share, because it’s never had to.
Having to wait for resources to be scheduled by the Hypervisor to to VM in order for it to execute an operation is fine and almost unnoticeable for web servers, it can have some pretty big impacts on call quality.
If we’re running RTPproxy or RTPengine in order to relay media, scheduling delays can mean that the media stream ends up “bursty”.
RTP packets needing relaying are queued in the buffer on the VM and only relayed when the hypervisor is able to schedule resources, this means there can be a lot of packet-delay-variation (PDV) and increased latency for services running on VMs.
VMs and Containers both have this same fate, DPDK and SR-IOV assist in throughput, but they don’t stop interrupt headaches.
VMs that deprive other VMs on the same host of resources are known as “Noisy neighbors”.
The simple fix for all these problems? There isn’t one.
Each of these issues can be overcome, dedicating resources, to a specific VM or container, cleverly distributing load, but it is costly in terms of resources and time to tweak and implement, and some of these options undermine the value of virtualization or containerization.
As technology marches forward we have scenarios where Kubernetes can expose FPGA resources to pass them through to Pods, but right now, if you need to transcode more than ~100 calls efficiently, you’re going to need a hardware device.
And while it can be done by throwing more x86 / ARM compute resources at the problem, hardware still wins out as cheaper in most instances.
Australia is a strange country; As a kid I was scared of dogs, and in response, our family got a dog.
This year started off with adventures working with ASN.1 encoded data, and after a week of banging my head against the table, I was scared of ASN.1 encoding.
But now I love dogs, and slowly, I’m learning to embrace ASN.1 encoding.
What is ASN.1?
ASN.1 is an encoding scheme.
The best analogy I can give is to image a sheet of paper with a form on it, the form has fields for all the different bits of data it needs,
Each of the fields on the form has a data type, and the box is sized to restrict input, and some fields are mandatory.
Now imagine you take this form and cut a hole where each of the text boxes would be.
We’ve made a key that can be laid on top of a blank sheet of paper, then we can fill the details through the key onto the blank paper and reuse the key over and over again to fill the data out many times.
When we remove the key off the top of our paper, and what we have left on the paper below is the data from the form. Without the key on top this data doesn’t make much sense, but we can always add the key back and presto it’s back to making sense.
While this may seem kind of pointless let’s look at the advantages of this method;
The data is validated by the key – People can’t put a name wherever, and country code anywhere, it’s got to be structured as per our requirements. And if we tried to enter a birthday through the key form onto the paper below, we couldn’t.
The data is as small as can be – Without all the metadata on the key above, such as the name of the field, the paper below contains only the pertinent information, and if a field is left blank it doesn’t take up any space at all on the paper.
It’s these two things, rigidly defined data structures (no room for errors or misinterpretation) and the minimal size on the wire (saves bandwidth), that led to 3GPP selecting ASN.1 encoding for many of it’s protocols, such as S1, NAS, SBc, X2, etc.
It’s also these two things that make ASN.1 kind of a jerk; If the data structure you’re feeding into your ASN.1 compiler does not match it will flat-out refuse to compile, and there’s no way to make sense of the data in its raw form.
But working with a super simple ASN.1 definition you’ve created is one thing, using the 3GPP defined ASN.1 definitions is another,
With the aid of the fantastic PyCrate library, which is where the real magic happens, and this was the nut I cracked this week, compiling a 3GPP ASN.1 definition and communicating a standards-based protocol with it.
So dedicated appliances are dead and all our network functions are VMs or Containers, but there’s a performance hit when going virtual as the L2 processing has to be handled by the Hypervisor before being passed onto the relevant VM / Container.
If we have a 10Gb NIC in our server, we want to achieve a 10Gbps “Line Speed” on the Network Element / VNF we’re running on.
When we talked about appliances if you purchased an P-GW with 10Gbps NIC, it was a given you could get 10Gbps through it (without DPI, etc), but when we talk about virtualized network functions / network elements there’s a very real chance you won’t achieve the “line speed” of your interfaces without some help.
When you’ve got a Network Element like a S-GW, P-GW or UPF, you want to forward packets as quickly as possible – bottlenecks here would impact the user’s achievable speeds on the network.
To speed things up there are two technologies, that if supported by your software stack and hardware, allows you to significantly increase throughput on network interfaces, DPDK & SR-IOV.
DPDK – Data Plane Development Kit
Usually *Nix OSs handle packet processing on the Kernel level. As I type this the packets being sent to this WordPress server by Firefox are being handled by the Linux 5.8.0-36-generic kernel running on my machine.
The problem is the kernel has other things to do (interrupts), meaning increased delay in processing (due to waiting for processing capability) and decreased capacity.
DPDK shunts this processing to the “user space” meaning your application (the actual magic of the VNF / Network Element) controls it.
To go back to me writing this – If Firefox and my laptop supported DPDK, then the packets wouldn’t traverse the Linux kernel at all, and Firefox would be talking directly to my NIC. (Obviously this isn’t the case…)
So DPDK increases network performance by shifting the processing of packets to the application, bypassing the kernel altogether. You are still limited by the CPU and Memory available, but with enough of each you should reach very near to line speed.
SR-IOV – Single Root Input Output Virtualization
Going back to the me writing this analogy I’m running Linux on my laptop, but let’s imagine I’m running a VM running Firefox under Linux to write this.
If that’s the case then we have an even more convolted packet processing chain!
I type the post into Firefox which sends the packets to the Linux kernel, which waits to be scheduled resources by the hypervisor, which then process the packets in the hypervisor kernel before finally making it onto the NIC.
We could add DPDK which skips some of these steps, but we’d still have the bottleneck of the hypervisor.
With PCIe passthrough we could pass the NIC directly to the VM running the Firefox browser window I’m typing this, but then we have a problem, no other VMs can access these resources.
SR-IOV provides an interface to passthrough PCIe to VMs by slicing the PCIe interface up and then passing it through.
My VM would be able to access the PCIe side of the NIC, but so would other VMs.
So that’s the short of it, SR-IOR and DPDK enable better packet forwarding speeds on VNFs.
So a problem had arisen, carriers wanted to change certain carrier related settings on devices (Specifically the Carrier Config Manager) in the Android ecosystem. The Android maintainers didn’t want to open the permissions to change these settings to everyone, only the carrier providing service to that device.
And if you purchased a phone from Carrier A, and moved to Carrier B, how do you manage the permissions for Carrier B’s app and then restrict Carrier A’s app?
The carrier loads a certificate onto the SIM Cards, and signing Android Apps with this certificate, allowing the Android OS to verify the certificate on the card and the App are known to each other, and thus the carrier issuing the SIM card also issued the app, and presto, the permissions are granted to the app.
Carriers have full control of the UICC, so this mechanism provides a secure and flexible way to manage apps from the mobile network operator (MNO) hosted on generic app distribution channels (such as Google Play) while retaining special privileges on devices and without the need to sign apps with the per-device platform certificate or preinstall as a system app.
I’ve written a playbook that provisions some server infrastructure, however one of the steps is to change the hostname.
A common headache when changing the hostname on a Linux machine is that if the hostname you set for the machine, isn’t in the machine’s /etc/hosts file, then when you run sudo su or su, it takes a really long time before it shows you the prompt as the machine struggles to do a DNS lookup for it’s own hostname and fails,
This becomes an even bigger problem when you’re using Ansible to setup these machines, Ansible times out when changing the hostname;
Simple fix, edit the /etc/ansible/ansible.cfg file and include
# SSH timeout
timeout = 300
And that’s it.
Want more telecom goodness?
I have a good old fashioned RSS feed you can subscribe to.