The Data Coding Scheme (DCS or TP-DCS) field in an SMS indicates which encoding is used for the message body.
This means that if our message contains characters outside the GSM7 alphabet, like emoji, and is therefore encoded as UCS-2 (UTF16), the phone knows to decode the message body as UCS-2, because the DCS says that's how the contents are encoded.
Likewise, if we’re not using any fancy characters in our message and the message is encoded as plain old GSM7, we set the DCS to 0 to indicate this is using GSM7.
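To make that choice concrete, here's a rough sketch of how a sender might pick the DCS for a message. The function and constant names are made up for illustration. The values 0x00 (default alphabet) and 0x08 (UCS-2) are the SMPP data_coding values, and the character tables are the GSM 03.38 default alphabet plus its extension table. It assumes no national language shift tables are in play.

```python
# GSM 03.38 default alphabet, indexed by septet value (0x00-0x7F).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Characters reachable via the ESC (0x1B) extension table.
GSM7_EXTENSION = "\f^{}\\[~]|€"

# Hypothetical constant names; the values are SMPP data_coding values.
DCS_DEFAULT = 0x00  # "SMSC default alphabet", usually (not always!) GSM7
DCS_UCS2 = 0x08

# ESC itself is an escape marker, not a printable character, so exclude it.
_GSM7_CHARS = set(GSM7_BASIC + GSM7_EXTENSION) - {"\x1b"}


def pick_data_coding(text: str) -> int:
    """Return DCS 0 if every character fits in GSM7, otherwise UCS-2."""
    if all(ch in _GSM7_CHARS for ch in text):
        return DCS_DEFAULT
    return DCS_UCS2
```

So a plain "Hello" goes out with DCS 0, and anything with an emoji in it goes out as UCS-2 with DCS 8.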
From my experience, I’d always assumed that DCS0 (Default) == GSM7, but today I learned that’s not always the case. Some SMSCs treat DCS0 as Latin.
Let me explain why this is stupid and why I wasted a lot of time on this.
We can indicate that a message is encoded as Latin (Latin-1 / ISO-8859-1) by setting the DCS to 0x03.
We cannot indicate that the message is encoded as GSM7 through anything other than the default alphabet (DCS 0).
Latin has its own encoding flag. If I wanted the message treated as Latin, I’d indicate the message encoding is Latin in the DCS field!
I spent a bunch of time trying to work out why a customer was having issues getting messages to subscribers on another operator, and it turned out the other operator treats messages we send to them over SMPP with DCS0 as Latin encoded, and then cracks the sads when trying to deliver them.
The above diff shows the message we send (right), and the message they try to deliver (left).
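If you want to see the kind of damage this causes without a packet capture, here's a small sketch that encodes text as GSM7 and then decodes the same bytes as Latin-1, the way the far end would if it treats DCS0 as Latin. The helper names are made up. It assumes the septets are carried unpacked (one per octet), as is common in the SMPP short_message field, and it only handles the basic GSM7 table. This is not the other operator's actual code, just an illustration of the mismatch.

```python
# GSM 03.38 default alphabet, indexed by septet value (0x00-0x7F).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)


def gsm7_octets(text: str) -> bytes:
    """Encode text as unpacked GSM7: one octet per septet, basic table only.

    Raises ValueError for characters outside the basic table.
    """
    return bytes(GSM7_BASIC.index(ch) for ch in text)


def misread_as_latin1(text: str) -> str:
    """What a receiver sees if it decodes our GSM7 octets as Latin-1."""
    return gsm7_octets(text).decode("latin-1")


def damaged_characters(text: str) -> list[tuple[str, str]]:
    """Pairs of (sent, received) for every character that changed."""
    received = misread_as_latin1(text)
    return [(s, r) for s, r in zip(text, received) if s != r]
```

Plain letters and digits survive because GSM7 and Latin-1 agree on most of the ASCII range, which is why this sort of bug can hide for a long time. The characters that differ are the ones that get mangled: "@" is 0x00 in GSM7, so a receiver reading it as Latin-1 gets a NUL byte instead.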
Well, lesson learned…