How to count SMS segments?


According to the SMS specification, a single SMS can carry up to 160 characters.

That means that if I send more than 160 characters (e.g. 161), the message is automatically split into two SMSes, which are then delivered to the receiver.

But nowadays phones don't show 2 messages. They show a single message.

It looks like there is some header that identifies the parts of a message, and the phone reassembles them automatically.

Is there any way to inspect the SMS header info and find out how many messages were really delivered/received?

My smartphone (Nexus 5) doesn't show it.


There are 2 answers

pchero On

An SMS does not carry 160 characters (1120 bits / 7 bits per character = 160 characters) in every case.

If the message is segmented, each part can carry only 153 characters. (http://spin.atomicobject.com/2011/04/20/concatenated-sms-messages-and-character-counts/)

Nc = Total number of characters in message
Nx = Characters from extended GSM table (|^{}[]~\ and euro)
L = Message length in 7-bit characters
M = Number of messages

L = Nc + Nx
L > 160:  M = L / 153 [rounded up]
L <= 160: M = 1

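The division is by 153 because, when an SMS is split into parts, each part gets a 48-bit (6-byte) header. In 7-bit encoding that header occupies 7 septets once it is padded to a septet boundary, which leaves 160 - 7 = 153 characters per part.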
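The formula above can be sketched in Python as follows. This is only an illustration: the function name `count_parts_gsm` is made up here, and it assumes the message consists solely of GSM 7-bit characters (it does not check for characters that would force a different encoding).

```python
# Characters from the extended GSM table; each one costs two 7-bit characters.
EXTENDED_CHARS = set("|^{}[]~\\€")


def count_parts_gsm(message: str) -> int:
    """Estimate the number of SMS parts for a message assumed to be GSM 7-bit text."""
    nc = len(message)                                   # Nc: total characters
    nx = sum(1 for ch in message if ch in EXTENDED_CHARS)  # Nx: extended characters
    length = nc + nx                                    # L = Nc + Nx
    if length <= 160:
        return 1                                        # fits in a single SMS
    # Integer ceiling of L / 153, avoiding floating-point rounding.
    return (length + 152) // 153
```

For example, `count_parts_gsm("a" * 161)` gives 2, because the 161 characters no longer fit in one message and each part then holds at most 153.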

Christian Davén On

These are the axioms we should start with:

  1. The payload of an SMS (called "user data") can hold at most 140 bytes
  2. Messages are encoded with either 7-bit GSM 03.38 or UTF-16 (you can think of UTF-16 as UCS-2 extended with support for emojis)
  3. If the message contains any character that is not in GSM 03.38, UTF-16 will be used for the whole message
  4. When a message is split into multiple concatenated SMSes, each SMS gets a 6-byte user data header, so each one can then only contain 134 bytes of message text
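
The 6-byte header from point 4 is what lets the phone match the parts and reassemble them. A minimal sketch of its layout, assuming the common variant with an 8-bit reference number (information element 0x00); the names `ConcatHeader`, `build_udh`, and `parse_udh` are made up for illustration:

```python
from typing import NamedTuple


class ConcatHeader(NamedTuple):
    reference: int  # same value in every part of one message
    total: int      # number of parts the message was split into
    sequence: int   # 1-based position of this part


def build_udh(reference: int, total: int, sequence: int) -> bytes:
    """Build the 6-byte concatenation header for one part."""
    # 0x05 = length of the rest of the header, 0x00 = concatenation element,
    # 0x03 = length of the three data bytes that follow.
    return bytes([0x05, 0x00, 0x03, reference & 0xFF, total, sequence])


def parse_udh(user_data: bytes) -> ConcatHeader:
    """Read the concatenation header from the start of a part's user data."""
    if len(user_data) < 6 or user_data[:3] != b"\x05\x00\x03":
        raise ValueError("no 6-byte concatenation header at start of user data")
    return ConcatHeader(*user_data[3:6])
```

A receiver would group parts by `reference`, wait until it has `total` of them, and join them in `sequence` order. Whether a given phone or messaging app exposes these values is a separate matter.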

So, the code to calculate the number of sms segments or parts for a message could be something like this:

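// CountSmsParts is an illustrative wrapper name.
static int CountSmsParts(string message)
{
    // Placeholder helper from this answer (not defined here). It should return a double,
    // because 7-bit GSM text can occupy a fractional number of bytes.
    double messageByteSize = CalculateMessageByteSizeDependingOnEncoding(message);
    if (messageByteSize > 140)
    {
        // We round up, so that 1.1 parts become 2 parts
        return (int)Math.Ceiling(messageByteSize / 134.0);
    }
    else
    {
        return 1;
    }
}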

To determine which encoding will be used, check if all the characters are present in the GSM 03.38 character set. If so, that encoding will be used for the whole message. Otherwise, UTF-16 will be used for the whole message.
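
As a rough sketch of that check in Python: the character table below is the GSM 03.38 basic set plus the escaped characters covered in this answer, written out by hand, so verify it against the specification before relying on it. The extension table's form feed is left out, and `choose_encoding` is a made-up name.

```python
# GSM 03.38 basic character set (the escape code itself is not a printable character).
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Characters reached through the escape code (two septets each).
GSM_EXTENDED = "|^€{}[]~\\"
GSM_CHARS = frozenset(GSM_BASIC + GSM_EXTENDED)


def choose_encoding(message: str) -> str:
    """Return the encoding that would apply to the whole message."""
    if all(ch in GSM_CHARS for ch in message):
        return "GSM 03.38"
    return "UTF-16"
```

A single character outside the table, such as an emoji, switches the whole message to UTF-16.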

GSM 03.38

Characters in GSM 03.38 are stored as septets and take 7 bits (remember, one byte is 8 bits).

Some characters, however, are escaped with an invisible character, and therefore take 2 septets: |^€{}[]~\.

To get the total number of septets for a message, you count the total number of characters (both escaped and non-escaped), and then add the number of escaped characters. This will count the escaped characters twice.

The number of bytes is then septets * 7.0 / 8.0, which means that 160 septets occupy exactly 140 bytes.

In 140 bytes, you can fit 160 unescaped characters (1 septet per character) or 80 escaped characters (2 septets per character).
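
The septet and byte arithmetic can be sketched like this in Python, assuming the message has already been confirmed to contain only GSM 03.38 characters. The function names are made up for illustration.

```python
# Characters that need the escape code and therefore take two septets.
ESCAPED = set("|^€{}[]~\\")


def gsm_septets(message: str) -> int:
    """Count septets: every character once, escaped characters once more."""
    return len(message) + sum(1 for ch in message if ch in ESCAPED)


def gsm_byte_size(message: str) -> float:
    """Size in bytes, kept fractional because septets rarely fill whole bytes."""
    return gsm_septets(message) * 7 / 8
```

Keeping the result fractional matters for the 140 and 134 byte limits: 153 septets come to 133.875 bytes, which still fits in one 134-byte part, while rounding up to whole bytes first would make some messages look one part longer than they are.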

UTF-16

Each UTF-16 character takes up either 2 or 4 bytes, depending on whether the character is in the Basic Multilingual Plane (BMP) or in one of the supplementary planes (most emojis are in the Supplementary Multilingual Plane, SMP).

You can calculate the total bytes for a UTF-16 string like this, using C#: Encoding.Unicode.GetByteCount(message).

For JavaScript, it's as easy as this: message.length * 2, since String.length returns the number of UTF-16 "code units", and each such code unit takes 2 bytes.

In 140 bytes, you can fit 70 characters from the BMP (2 bytes per character) or 35 characters from the SMP (4 bytes per character).
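
For comparison, a Python sketch of the same UTF-16 calculation, applying the 140 and 134 byte limits from the axioms above. The function names are made up for illustration.

```python
def utf16_byte_size(message: str) -> int:
    """Bytes needed for the message in UTF-16."""
    # "utf-16-le" is used because plain "utf-16" would prepend a 2-byte byte-order mark.
    return len(message.encode("utf-16-le"))


def utf16_parts(message: str) -> int:
    """Estimate the number of SMS parts for a message sent as UTF-16."""
    size = utf16_byte_size(message)
    if size <= 140:
        return 1
    # Integer ceiling of size / 134.
    return -(-size // 134)
```

This is an estimate based purely on byte counts. A sender that refuses to split a 4-byte character across two parts may need one more part than this for emoji-heavy messages.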