Just discovered you can add VLANs to Realtek NICs on Windows PCs,
I have a fairly grunty desktop I use for running anything that needs Windows, running VMware Workstation and occasional gaming,
I do have a big Dell machine running ESXi which supports VLAN tagging and trunking, but I try and avoid using it as it’s deafeningly loud and very power hungry.
Recently as the lab network I use grows and grows I’ve been struggling to run all the VMs running in Workstation as I’ve been running out of IP space and wanting some more separation between networks.
Now I can add VLANs onto the existing NIC using the Realtek Ethernet Diagnostic Utility, and then bridge each of these NICs to the respective VM in Workstation, and the port to the Mikrotik CRS is now a trunk with all the VLANs on it.
As Open5Gs has introduced network slicing, which led to a change in the database used,
Alas many users had subscribers provisioned in the old DB schema and no way to migrate the SDM data between the old and new schema,
If you’ve created subscribers on the old schema, and now after the updates your Subscriber Authentication is failing, check out this tool I put together, to migrate your data over.
I’d been trying for some time to get Kamailio acting as a Diameter Routing Agent with mixed success, and eventually got it working, after a few changes to the codebase of the ims_diameter_server module.
It is rather unstable, in that if it fails to dispatch to a Diameter peer, the whole thing comes crumbling down, but incoming Diameter traffic is proxied off to another Diameter peer, and Kamailio even adds an extra AVP.
Having used Kamailio for so long I was really hoping I could work with Kamailio as a DRA as easily as I do for SIP traffic, but it seems the Diameter module still needs a lot more love before it’ll be stable enough and simple enough for everyone to use.
I created a branch containing the fixes I made to make it work, and with an example config for use, but use with caution. It’s a long way from being production-ready, but hopefully in time will evolve.
Give me a lever long enough and a fulcrum on which to place it, and I shall move the world.
For me, the equivalent would be:
Give me a packet capture of the problem occurring and a standards document against which to compare it, and I shall debug the networking world.
And if you’re like me, there’s a good chance when things are really not going your way, you roll up your sleeves, break out Wireshark and the standards docs and begin the painstaking process of trying to work out what’s not right.
Today’s problem involved a side by side comparison between a pcap of a known good call, and one which is failing, so I just had to compare the two, which is slow and fairly error-prone,
So I started looking for something to diff PCAPs easily. The data I was working with was ASN.1 encoded so I couldn’t export as text like you can with HTTP or SIP based protocols and compare it that way.
In the end I stumbled across something even better to compare frames from packet captures side by side, with the decoding intact!
Turns out yo ucan copy the values including decoding from within Wireshark, which means you can then just paste the contents into a diff tool (I’m using the fabulous Meld on Linux, but any diff tool will do including diff itself) and off you go, side-by-side comparison.
Select the first packet/frame you’re interested in (or even just the section), expand the subkeys, right click, copy “All Visible items”. This copy contains all the decoded data, not just the raw bytes, which is what makes it so great.
Next paste it into your diff tool of choice, repeat with the one to compare against, scroll past the data you know is going to be different (session IDs, IPs, etc) and ta-da, there’s the differences.
One feature I’m pretty excited to share is the addition of a single config file for defining how PyHSS functions,
In the past you’d set variables in the code or comment out sections to change behaviour, which, let’s face it – isn’t great.
Instead the config.yaml file defines the PLMN, transport time (TCP or SCTP), the origin host and realm.
We can also set the logging parameters, SNMP info and the database backend to be used,
HSS Parameters
hss:
transport: "SCTP"
#IP Addresses to bind on (List) - For TCP only the first IP is used, for SCTP all used for Transport (Multihomed).
bind_ip: ["10.0.1.252"]
#Port to listen on (Same for TCP & SCTP)
bind_port: 3868
#Value to populate as the OriginHost in Diameter responses
OriginHost: "hss.localdomain"
#Value to populate as the OriginRealm in Diameter responses
OriginRealm: "localdomain"
#Value to populate as the Product name in Diameter responses
ProductName: "pyHSS"
#Your Home Mobile Country Code (Used for PLMN calcluation)
MCC: "999"
#Your Home Mobile Network Code (Used for PLMN calcluation)
MNC: "99"
#Enable GMLC / SLh Interface
SLh_enabled: True
logging:
level: DEBUG
logfiles:
hss_logging_file: log/hss.log
diameter_logging_file: log/diameter.log
database_logging_file: log/db.log
log_to_terminal: true
database:
mongodb:
mongodb_server: 127.0.0.1
mongodb_username: root
mongodb_password: password
mongodb_port: 27017
Stats Parameters
redis:
enabled: True
clear_stats_on_boot: False
host: localhost
port: 6379
snmp:
port: 1161
listen_address: 127.0.0.1
I’d tried in the past to use the USB port on the Mikrotik, an external HDD and the SMB server in RouterOS, to act as a simple NAS for sharing files on the home network. And the performance was terrible.
This is because the device is a Router. Not a NAS (duh). And everything I later read online confirmed that yes, this is a router, not a NAS so stop trying.
But I recently got a new Mikrotik CRS109, so now I have a new Mikrotik, how bad is the SMB file share performance?
To test this I’ve got a USB drive with some files on it, an old Mikrotik RB915G and the new Mikrotik CRS109-8G-1S-2HnD-IN, and compared the time it takes to download a file between the two.
Mikrotik Routerboard RB951G
While pulling a 2Gb file of random data from a FAT formatted flash drive, I achieved a consistent 1.9MB/s (15.2 Mb/s)
nick@oldfaithful:~$ smbget smb://10.0.1.1/share1/2Gb_file.bin
Password for [nick] connecting to //share1/10.0.1.1:
Using workgroup WORKGROUP, user nick
smb://10.0.1.1/share1/2Gb_file.bin
Downloaded 2.07GB in 1123 seconds
Mikrotik CRS109
In terms of transfer speed, a consistent 2.8MB/s (22.4 Mb/s)
nick@oldfaithful:~$ smbget smb://10.0.1.1/share1/2Gb_file.bin
Password for [nick] connecting to //share1/10.0.1.1:
Using workgroup WORKGROUP, user nick
smb://10.0.1.1/share1/2Gb_file.bin
Downloaded 2.07GB in 736 seconds
Profiler shows 25% CPU usage on “other”, which drops down as soon as the file transfers stop,
So better, but still not a NAS (duh).
The Verdict
Still not a NAS. So stop trying to use it as a NAS.
As my download speed is faster than 22Mbps I’d just be better to use cloud storage.
I’ve been adding SNMP support to an open source project I’ve been working on (PyHSS) to generate metrics / performance statistics from it, and this meant staring down SNMP again, but this time I’ve come up with a novel way to handle SNMP, that made it much less painful that normal.
The requirement was simple enough, I already had a piece of software I’d written in Python, but I had a need to add an SNMP server to get information about that bit of software.
For a little more detail – PyHSS handles Device Watchdog Requests already, but I needed a count of how many it had handled, made accessible via SNMP. So inside the logic that does this I just increment a counter in Redis;
In the code example above I just add 1 (increment) the Redis key ‘Answer_280_attempt_count’.
The beauty is that that this required minimal changes to the rest of my code – I just sprinkled in these statements to increment Redis keys throughout my code.
Now when that existing function is run, the Redis key “Answer_280_attempt_count” is incremented.
So I ran my software and the function I just added the increment to was called a few times, so I jumped into redis-cli to check on the values;
And just like that we’ve done all the heavy lifting to add SNMP to our software.
For anything else we want counters on, add the increment to your code to store a counter in Redis with that information.
So next up we need to expose our Redis keys via SNMP,
Then when you run it, presto, you’re exposing that data via SNMP.
You can verify it through SNMP walk or start integrating it into your NMS, in the above example OID 1.3.6.1.2.1.1.1.0.2, contains the value of Answer_280_attempt_count from Redis, and with that, you’re exposing info via SNMP, all while not really having to think about SNMP.
*Ok, you still have to sort which OIDs you assign for what, but you get the idea.
It’s 2021, and everyone loves Containers; Docker & Kubernetes are changing how software is developed, deployed and scaled.
And yet so much of the Telco world still uses bare metal servers and dedicated hardware for processing.
So why not use Containers or VMs more for VoIP applications?
Disclaimer – When I’m talking VoIP about VoIP I mean the actual Voice over IP, that’s the Media Stream, RTP, the Audio, etc, not the Signaling (SIP). SIP is fine with Containers, it’s the media that has a bad time and that this post focuses on,
Virtualization Fundamentals
Once upon a time in Development land every application ran on it’s own server running in a DC / Central Office.
This was expensive to deploy (buying servers), operate (lots of power used) and maintain (lots of hardware to keep online).
Each server was actually sitting idle for a large part of the time, with the application running on it only using a some of the available resources some of the time.
One day Virtualization came and suddenly 10 physical servers could be virtualized into 10 VMs.
These VMs still need to run on servers but as each VM isn’t using 100% of it’s allocated resources all the time, instead of needing 10 servers to run it on you could run it on say 3 servers, and even do clever things like migrate VMs between servers if one were to fail.
VMs share the resources of the server it’s running on.
A server running VMs (Hypervisor) is able to run multiple VMs by splitting the resources between VMs.
If a VM A wants to run an operation at the same time a VM B & VM C, the operations can’t be run on each VM at the same time* so the hypervisor will queue up the requests and schedule them in, typically based on first-in-first out or based on a resource priority policy on the Hypervisor.
This is fine for a if VM A, B & C were all Web Servers. A request coming into each of them at the same time would see the VM the Hypervisor schedules the resources to respond to the request slightly faster, with the other VMs responding to the request when the hypervisor has scheduled the resources to the respective VM.
VoIP is an only child
VoIP has grown up on dedicated hardware. It’s an only child that does not know how to share, because it’s never had to.
Having to wait for resources to be scheduled by the Hypervisor to to VM in order for it to execute an operation is fine and almost unnoticeable for web servers, it can have some pretty big impacts on call quality.
If we’re running RTPproxy or RTPengine in order to relay media, scheduling delays can mean that the media stream ends up “bursty”.
RTP packets needing relaying are queued in the buffer on the VM and only relayed when the hypervisor is able to schedule resources, this means there can be a lot of packet-delay-variation (PDV) and increased latency for services running on VMs.
VMs and Containers both have this same fate, DPDK and SR-IOV assist in throughput, but they don’t stop interrupt headaches.
VMs that deprive other VMs on the same host of resources are known as “Noisy neighbors”.
The simple fix for all these problems? There isn’t one.
Each of these issues can be overcome, dedicating resources, to a specific VM or container, cleverly distributing load, but it is costly in terms of resources and time to tweak and implement, and some of these options undermine the value of virtualization or containerization.
As technology marches forward we have scenarios where Kubernetes can expose FPGA resources to pass them through to Pods, but right now, if you need to transcode more than ~100 calls efficiently, you’re going to need a hardware device.
And while it can be done by throwing more x86 / ARM compute resources at the problem, hardware still wins out as cheaper in most instances.
Australia is a strange country; As a kid I was scared of dogs, and in response, our family got a dog.
This year started off with adventures working with ASN.1 encoded data, and after a week of banging my head against the table, I was scared of ASN.1 encoding.
But now I love dogs, and slowly, I’m learning to embrace ASN.1 encoding.
What is ASN.1?
ASN.1 is an encoding scheme.
The best analogy I can give is to image a sheet of paper with a form on it, the form has fields for all the different bits of data it needs,
Each of the fields on the form has a data type, and the box is sized to restrict input, and some fields are mandatory.
Now imagine you take this form and cut a hole where each of the text boxes would be.
We’ve made a key that can be laid on top of a blank sheet of paper, then we can fill the details through the key onto the blank paper and reuse the key over and over again to fill the data out many times.
When we remove the key off the top of our paper, and what we have left on the paper below is the data from the form. Without the key on top this data doesn’t make much sense, but we can always add the key back and presto it’s back to making sense.
While this may seem kind of pointless let’s look at the advantages of this method;
The data is validated by the key – People can’t put a name wherever, and country code anywhere, it’s got to be structured as per our requirements. And if we tried to enter a birthday through the key form onto the paper below, we couldn’t.
The data is as small as can be – Without all the metadata on the key above, such as the name of the field, the paper below contains only the pertinent information, and if a field is left blank it doesn’t take up any space at all on the paper.
It’s these two things, rigidly defined data structures (no room for errors or misinterpretation) and the minimal size on the wire (saves bandwidth), that led to 3GPP selecting ASN.1 encoding for many of it’s protocols, such as S1, NAS, SBc, X2, etc.
It’s also these two things that make ASN.1 kind of a jerk; If the data structure you’re feeding into your ASN.1 compiler does not match it will flat-out refuse to compile, and there’s no way to make sense of the data in its raw form.
But working with a super simple ASN.1 definition you’ve created is one thing, using the 3GPP defined ASN.1 definitions is another,
With the aid of the fantastic PyCrate library, which is where the real magic happens, and this was the nut I cracked this week, compiling a 3GPP ASN.1 definition and communicating a standards-based protocol with it.
So dedicated appliances are dead and all our network functions are VMs or Containers, but there’s a performance hit when going virtual as the L2 processing has to be handled by the Hypervisor before being passed onto the relevant VM / Container.
If we have a 10Gb NIC in our server, we want to achieve a 10Gbps “Line Speed” on the Network Element / VNF we’re running on.
When we talked about appliances if you purchased an P-GW with 10Gbps NIC, it was a given you could get 10Gbps through it (without DPI, etc), but when we talk about virtualized network functions / network elements there’s a very real chance you won’t achieve the “line speed” of your interfaces without some help.
When you’ve got a Network Element like a S-GW, P-GW or UPF, you want to forward packets as quickly as possible – bottlenecks here would impact the user’s achievable speeds on the network.
To speed things up there are two technologies, that if supported by your software stack and hardware, allows you to significantly increase throughput on network interfaces, DPDK & SR-IOV.
DPDK – Data Plane Development Kit
Usually *Nix OSs handle packet processing on the Kernel level. As I type this the packets being sent to this WordPress server by Firefox are being handled by the Linux 5.8.0-36-generic kernel running on my machine.
The problem is the kernel has other things to do (interrupts), meaning increased delay in processing (due to waiting for processing capability) and decreased capacity.
DPDK shunts this processing to the “user space” meaning your application (the actual magic of the VNF / Network Element) controls it.
To go back to me writing this – If Firefox and my laptop supported DPDK, then the packets wouldn’t traverse the Linux kernel at all, and Firefox would be talking directly to my NIC. (Obviously this isn’t the case…)
So DPDK increases network performance by shifting the processing of packets to the application, bypassing the kernel altogether. You are still limited by the CPU and Memory available, but with enough of each you should reach very near to line speed.
SR-IOV – Single Root Input Output Virtualization
Going back to the me writing this analogy I’m running Linux on my laptop, but let’s imagine I’m running a VM running Firefox under Linux to write this.
If that’s the case then we have an even more convolted packet processing chain!
I type the post into Firefox which sends the packets to the Linux kernel, which waits to be scheduled resources by the hypervisor, which then process the packets in the hypervisor kernel before finally making it onto the NIC.
We could add DPDK which skips some of these steps, but we’d still have the bottleneck of the hypervisor.
With PCIe passthrough we could pass the NIC directly to the VM running the Firefox browser window I’m typing this, but then we have a problem, no other VMs can access these resources.
SR-IOV provides an interface to passthrough PCIe to VMs by slicing the PCIe interface up and then passing it through.
My VM would be able to access the PCIe side of the NIC, but so would other VMs.
So that’s the short of it, SR-IOR and DPDK enable better packet forwarding speeds on VNFs.
So a problem had arisen, carriers wanted to change certain carrier related settings on devices (Specifically the Carrier Config Manager) in the Android ecosystem. The Android maintainers didn’t want to open the permissions to change these settings to everyone, only the carrier providing service to that device.
And if you purchased a phone from Carrier A, and moved to Carrier B, how do you manage the permissions for Carrier B’s app and then restrict Carrier A’s app?
The carrier loads a certificate onto the SIM Cards, and signing Android Apps with this certificate, allowing the Android OS to verify the certificate on the card and the App are known to each other, and thus the carrier issuing the SIM card also issued the app, and presto, the permissions are granted to the app.
Carriers have full control of the UICC, so this mechanism provides a secure and flexible way to manage apps from the mobile network operator (MNO) hosted on generic app distribution channels (such as Google Play) while retaining special privileges on devices and without the need to sign apps with the per-device platform certificate or preinstall as a system app.
I’ve written a playbook that provisions some server infrastructure, however one of the steps is to change the hostname.
A common headache when changing the hostname on a Linux machine is that if the hostname you set for the machine, isn’t in the machine’s /etc/hosts file, then when you run sudo su or su, it takes a really long time before it shows you the prompt as the machine struggles to do a DNS lookup for it’s own hostname and fails,
This becomes an even bigger problem when you’re using Ansible to setup these machines, Ansible times out when changing the hostname;
Simple fix, edit the /etc/ansible/ansible.cfg file and include
The Origin-State-Id AVP solves a kind of tricky problem – how do you know if a Diameter peer has restarted?
It seems like a simple problem until you think about it. One possible solution would be to add an AVP for “Recently Rebooted”, to be added on the first command queried of it from an endpoint, but what if there are multiple devices connecting to a Diameter endpoint?
The Origin-State AVP is a strikingly simple way to solve this problem. It’s a constantly incrementing counter that resets if the Diameter peer restarts.
If a client receives a Answer/Response where the Origin-State AVP is set to 10, and then the next request it’s set to 11, then the one after that is set to 12, 13, 14, etc, and then a request has the Origin-State AVP set to 5, the client can tell when it’s restarted by the fact 5 is lower than 14, the one before it.
It’s a constantly incrementing counter, that allows Diameter peers to detect if the endpoint has restarted.
Simple but effective.
You can find more about this in RFC3588 – the Diameter Base Protocol.
I did a post yesterday on setting up YateBTS, I thought I’d cover the basic setup I had to do to get everything humming;
Subscribers
In order to actually accept subscribers on the network you’ll need to set a Regex pattern to match the prefix of the IMSI of the subscribers you want to connect to the network,
In my case I’m using programmable SIMs with MCC / MNC 00101 so I’ve put the regex pattern matching starting with 00101.
BTS Configuration
Next up you need to set the operating frequency (radio band), MNC and MCC of the network. I’m using GSM850,
Next up we’ll need to set the device we’re going to use for the TX/RX, I’m using a BladeRF Software Defined Radio, so I’ve selected that from the path.
Optional Steps
I’ve connected Yate to a SIP trunk so I can make and receive calls,
I’ve also put a tap on the GSM signaling, so I can see what’s going on, to access it just spin up Wireshark and filter for GSMMAP
A lot of the Yate tutorials are a few years old, so I thought I’d put together the steps I used on Ubuntu 18.04:
Installing Yate
apt-get install subversion autoconf build-essential
cd /usr/src svn checkout http://voip.null.ro/svn/yate/trunk yate
cd yate
./autogen.sh
./configure
make
make install-noapi