There’s a lot to like about SCTP. No head of line blocking, MTU issues, sequenced, acknowledged delivery of messages, not to mention Multi-Homing and message bundling.
But if you really want to get the most bang for your buck, you’ll need to tune your SCTP parameters to match the network conditions.
While tuning the parameters per-association would be time consuming, most SCTP stacks allow you to set templates for SCTP parameters, for example you would have a different set of parameters for the SCTP stacks inside your network, compared to SCTP stacks for say a roaming scenario or across microwave links.
IETF kindly provides a table with their recommended starting values for SCTP parameter tuning:
RTO.Initial | 3 seconds |
RTO.Min | 1 second |
RTO.Max | 60 seconds |
Max.Burst | 4 |
RTO.Alpha | 1/8 |
RTO.Beta | 1/4 |
Valid.Cookie.Life | 60 seconds |
Association.Max.Retrans | 10 attempts |
Path.Max.Retrans | 5 attempts (per destination address) |
Max.Init.Retransmits | 8 attempts |
HB.interval | 30 seconds |
HB.Max.Burst | 1 |
But by adjusting the Max Retrans and Retransmission Timeout (RTO) values, we can detect failures on the network more quickly, and reduce the number of packets we’ll loose should we have a failure.
We begin with the engineered round-trip time (RTT) – that is made up of the time it takes to traverse the link, processing time for the remote SCTP stack and time for the response to traverse the link again. For the examples below we’ll take an imaginary engineered RTT of 200ms.
RTO.min is the minimum retransmission timeout.
If this value is set too low then before the other side has had time to receive the request, process it and send a response, we’ve already retransmitted it.This should be set to the round trip delay plus processing needed to send and acknowledge a packet plus some allowance for variability due to jitter; a value of 1.15 times the Engineered RTT is often chosen
So for us, 200 * 1.15 = 230ms RTO.min value.
RTO.max is the maximum amount of time we should wait before transmitting a request.
Typically three times the Engineered RTT.
So for us, 200 * 3 = 600ms RTO.min value.
Path.Max.Retransmissions is the maximum number of retransmissions to be sent down a path before the path is considered to be failed.
For example if we loose a transmission path on a multi-homed server, how many retransmissions along that path should we send until we consider it to be down?
Values set are dependant on if you’re multi-homing or not (you can be more picky if you are) and the level of acceptable packet loss in your transmission link.
Typical values are 4 Retransmissions (per destination address) for a Single-Homed association, and 2 Retransmissions (per destination address) for a Multi-Homed association.
Association.Max.Retransmissions is the maximum number of retransmissions for an association. If a transmission link in a multi-homed SCTP scenario were to go down, we would pass the Path.Max.Retransmissions value and the SCTP stack would stop sending traffic out that path, and try another, but what if the remote side is down? In that scenario all our paths would fail, so we need another counter – Path.Max.Retransmissions to count the total number of retransmissions to an association / destination. When the Association.Max.Retransmissions is reached the association is considered down.
In practice this value would be the number of paths, multiplied by the Path.Max.Retransmissions.