Wednesday, February 10, 2010

Unpacking 7-bit ASCII

To (finally) complete my series of posts on processing CDMA SMS bearer data and user data fields, this post covers unpacking the message text. In my case the actual text of the SMS message is packed 7-bit ASCII, which corresponds to messages with an encoding flag of ENCODE_7BIT, or 0x10. A packed message looks something like the following.

|aaaaaaab|bbbbbbcc|cccccddd|ddddeeee|eeefffff|ffgggggg|ghhhhhhh|


This packs 8 characters (a - h) into 7 octets. To unpack the message we have to pick out each individual character and make it an 8-bit ASCII character. It helps to reorganize the data so that we look at each character individually, with x marking the bits that belong to other characters, as follows.

|aaaaaaax| |xxxxxxxx|
|xxxxxxxb| |bbbbbbxx|
|xxxxxxcc| |cccccxxx|
|xxxxxddd| |ddddxxxx|
|xxxxeeee| |eeexxxxx|
|xxxfffff| |ffxxxxxx|
|xxgggggg| |gxxxxxxx|
|xhhhhhhh| |xxxxxxxx|

From this we see that a 7-bit ASCII character can be packed into octets in one of 8 patterns. So to unpack the characters we need to know the current octet, the next octet, and the packing pattern. With that we can apply some bit shifting and bit masks to create an unpacked 8-bit ASCII character. 
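As a rough sketch of that shift-and-mask idea, the eight patterns can also be collapsed into a single expression by treating the current and next octets as one 16-bit window. The helper name unpack_pattern is hypothetical and is not part of the routine further down; it only assumes the pattern numbering used in this post (pattern 0 is the character 'a' in the diagram).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: extract one 7-bit character from the current octet,
 * the next octet, and a packing pattern in the range 0-7. */
static char unpack_pattern( uint8_t curr, uint8_t next, uint8_t pat )
{
    assert( pat < 8 );

    /* Join the two octets into one 16-bit window, current octet on top. */
    uint16_t window = (uint16_t)( ( curr << 8 ) | next );

    /* A character using pattern p starts (8 - p) % 8 bits into the current octet. */
    unsigned offset = ( 8u - pat ) % 8u;

    /* Shift the 7 wanted bits down to the bottom and mask off everything else. */
    return (char)( ( window >> ( 9u - offset ) ) & 0x7F );
}
```

For patterns 0 and 7 the character lies entirely within the current octet, so the value passed as next does not affect the result.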

I coded this so that the pattern associated with the character 'a' above is pattern 0. The user data headers cause the SMS message to start at packing pattern 3, which matches the character 'd' above. The routine to decode the SMS message then becomes a simple loop over the 7-bit ASCII characters that keeps track of the packing pattern.
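To illustrate how a starting pattern follows from where the text begins, here is a small sketch that maps a bit offset into the buffer to an octet index and a packing pattern. The names char_pos and locate_char are hypothetical, and the 13-bit offset in the usage note is only an illustrative figure, not something taken from the header layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: where a character lives in the packed buffer. */
typedef struct {
    size_t  byteIndex;   /* index of the "current" octet            */
    uint8_t pattern;     /* packing pattern 0-7, as numbered above  */
} char_pos;

/* Locate the character that begins bitOffset bits into the buffer. */
static char_pos locate_char( size_t bitOffset )
{
    char_pos pos;

    pos.byteIndex = bitOffset / 8;
    /* A character starting k bits into its octet uses pattern (8 - k) % 8. */
    pos.pattern   = (uint8_t)( ( 8 - bitOffset % 8 ) % 8 );

    return pos;
}
```

With this mapping, a character that begins 13 bits into the buffer sits in octet 1 and uses pattern 3, while the eighth character of an unshifted message (bit offset 49) sits in octet 6 and uses pattern 7.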

/* copyright (c) 2010 Steve Hill - All rights reserved */
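#include <stdint.h>   /* uint8_t */
#include <stdlib.h>   /* calloc  */

char*
decode_7bit_ascii( const uint8_t *sms, uint8_t len, uint8_t startPat )
{
    // size_t so that len + 1 cannot wrap around to 0 when len is 255
    size_t buffLen = (size_t)len + 1;

    // calloc zero-fills the buffer, so the result is always NUL-terminated
    char *buff = calloc( buffLen, sizeof(char) );

    if( buff == NULL )  // allocation failed; the caller must check for NULL
        return NULL;

    char          *currChar = buff;          // current char in buff
    char          *lastChar = buff + len;    // one past the last decoded char
    const uint8_t *curr     = sms;           // current byte being converted
    const uint8_t *next     = curr + 1;      // next byte
    uint8_t        currPat  = startPat % 8;  // conversion pattern, kept in 0-7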

    while( currChar < lastChar )
    {
        switch( currPat )
        {
            case 0:     // aaaaaaax xxxxxxxx
                *currChar = ( *curr >> 1 ) & 0x7F;
                break;
            case 1:     // xxxxxxxa aaaaaaxx
                *currChar = (( *curr << 6 ) & 0x40 ) + 
                            (( *next >> 2 ) & 0x3F );
                break;
            case 2:     // xxxxxxaa aaaaaxxx
                *currChar = (( *curr << 5 ) & 0x60 ) + 
                            (( *next >> 3 ) & 0x1F );
                break;
            case 3:     // xxxxxaaa aaaaxxxx
                *currChar = (( *curr << 4 ) & 0x70 ) + 
                            (( *next >> 4 ) & 0x0F );
                break;
            case 4:     // xxxxaaaa aaaxxxxx
                *currChar = (( *curr << 3 ) & 0x78 ) + 
                            (( *next >> 5 ) & 0x07 );
                break;
            case 5:     // xxxaaaaa aaxxxxxx
                *currChar = (( *curr << 2 ) & 0x7C ) + 
                            (( *next >> 6 ) & 0x03 );
                break;
            case 6:     // xxaaaaaa axxxxxxx
                *currChar = (( *curr << 1 ) & 0x7E ) + 
                            (( *next >> 7 ) & 0x01 );
                break;
            case 7:     // xaaaaaaa xxxxxxxx
                *currChar = *curr & 0x7F;
                break;
        }

        currChar++;
        if( currPat ) // stay on current byte if pattern 0
        {
            curr++;
            next++;
        }
        currPat = ( currPat + 1 ) % 8;  // next pattern, wrapping from 7 back to 0
    }

    return buff;
}
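
To try the decoder without a real SMS capture, it helps to have packed test data. The following is a minimal sketch of the inverse operation, assuming the same most-significant-bit-first layout shown in the diagrams above. The name pack_7bit_ascii and its parameters are hypothetical and are not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack the 7-bit characters of text into out, most
 * significant bit first, with the first character starting startBit bits
 * into the buffer. All octets written, including any bits before startBit,
 * are cleared first. Returns the number of octets used. */
static size_t pack_7bit_ascii( const char *text, uint8_t *out,
                               size_t outLen, size_t startBit )
{
    size_t nChars = strlen( text );
    size_t nBytes = ( startBit + 7 * nChars + 7 ) / 8;  // round up to whole octets

    assert( nBytes <= outLen );
    memset( out, 0, nBytes );

    size_t bit = startBit;  // absolute position of the next bit to write
    for( size_t i = 0; i < nChars; i++ )
    {
        uint8_t ch = (uint8_t)text[i] & 0x7F;

        // copy the 7 bits of this character, highest bit first
        for( int b = 6; b >= 0; b--, bit++ )
        {
            if( ( ch >> b ) & 1 )
                out[bit / 8] |= (uint8_t)( 0x80 >> ( bit % 8 ) );
        }
    }

    return nBytes;
}
```

Packing "ABCDEFGH" with a startBit of 0 fills exactly 7 octets, and passing that buffer to decode_7bit_ascii with a length of 8 and a start pattern of 0 should give the original text back.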
