Wednesday, June 4, 2008

CDMA SMS User Data

In this post I will describe the decoding of the user data portion of the bearer data field of a CDMA SMS message. See my first post for the structure of the bearer data field.

The User Data subparameter is the portion of the Bearer Data field of a CDMA SMS message that contains the actual message or payload. The user data is made up of an integral number of octets and is 0 padded as needed. The user data subparameter is documented in section 4.5.2 of the IS-637 spec. A PDF of this spec is available on the 3GPP2's web site. A PDF of the spec is also available on the TIA's website. The structure of the user data subparameter is:
  • The subparameter ID of 8 bits which is the constant that identifies the start of this subparameter in the bearer data.
  • The subparameter Len of 8 bits which is the number of octets that make up the value portion of this subparameter.
These two portions of the user data are processed as part of the bearer data. The rest of the user data subparameter is processed separately. The remaining fields of the user data are:
  • The message encoding is a 5 bit value that indicates which encoding scheme was used for the message.
  • The message type is an optional 8 bit value that is used only if the encoding is an IS-91 extended protocol message. See the specification document for the details.
  • The num fields is the number of data elements, of the size specified by the encoding and message type, that the message contains.
  • The chari portion contains the actual text or payload of the message.
  • The final portion of the message is 0-7 bits of 0 padding as needed to fill the last octet.
The first step I took in decoding the user data was to write a function that determines the message encoding, the size of the data elements, and the starting byte of the message. To do this I start by defining some constants.
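Because the 5 bit encoding pushes every later field off an octet boundary, it can help to picture the user data as one long bit string. The following is only a minimal sketch of that idea, not the routine from this post: read_bits, parse_ud_header and struct ud_header are hypothetical names, and it assumes the encoding value 1 is the one that carries the optional 8 bit message type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read nbits (1..32) starting at bit position pos, where bit 0 is the most
 * significant bit of buf[0]. Returns 0 on success, -1 if the read would
 * run past the end of the buffer. */
int read_bits(const uint8_t *buf, size_t len, size_t pos,
              unsigned nbits, uint32_t *out)
{
    assert(buf != NULL && out != NULL);
    if (nbits == 0 || nbits > 32 || pos + nbits > len * 8)
        return -1;
    uint32_t v = 0;
    for (unsigned i = 0; i < nbits; i++, pos++) {
        unsigned bit = (buf[pos / 8] >> (7 - pos % 8)) & 1u;
        v = (v << 1) | bit;
    }
    *out = v;
    return 0;
}

/* Hypothetical holder for the header fields listed above. */
struct ud_header {
    uint32_t encoding;    /* 5 bit message encoding */
    uint32_t num_fields;  /* 8 bit count of data elements */
    size_t   chari_bit;   /* bit offset at which the chari data starts */
};

/* Parse the header fields. Assumption: encoding 1 is followed by an
 * 8 bit message type, which this sketch skips rather than decodes. */
int parse_ud_header(const uint8_t *ud, size_t len, struct ud_header *h)
{
    size_t pos = 0;
    if (read_bits(ud, len, pos, 5, &h->encoding) != 0)
        return -1;
    pos += 5;
    if (h->encoding == 1)
        pos += 8;
    if (read_bits(ud, len, pos, 8, &h->num_fields) != 0)
        return -1;
    pos += 8;
    h->chari_bit = pos;
    return 0;
}
```

For the four octets 10 14 8D 20 (hex), a hand-built sample, this gives encoding 2, num fields 2, and a chari portion starting at bit 13.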
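#include <stdio.h>   // printf, fprintf
#include <stdlib.h>  // exit, free

// masks and values for processing the user data fields
//
// the encoding is the top 5 bits of the first octet, so the
// ENCODE_* values are already shifted left by 3 bits
#define ENCODING_MASK 0xF8
#define ENCODE_OCTET 0x00
#define ENCODE_IS41 0x08
#define ENCODE_7BIT 0x10
#define ENCODE_IA5 0x18

// an 8 bit field that follows the encoding straddles two octets:
// the low 3 bits of one octet and the high 5 bits of the next
#define MST_BYTE_1_MASK 0xE0
#define MST_BYTE_2_MASK 0x1F
#define NF_BYTE_1_MASK 0xE0
#define NF_BYTE_2_MASK 0x1F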

// standard sized type definitions
//
typedef signed char sint_8; // plain char may be unsigned on some platforms
typedef short sint_16;
typedef int sint_32;
typedef long long sint_64;

typedef unsigned char uint_8;
typedef unsigned short uint_16;
typedef unsigned int uint_32;
typedef unsigned long long uint_64;


Then I write the routine to do the initial processing. This function is written to call another function that will handle the actual decoding of the message.
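void
decode_user_data( uint_8 *userData, size_t sz )
{
    uint_8 *ud = userData;     // current element
    uint_8 *nextByte;          // element after the current one

    uint_8 encoding;
    uint_8 mst;                // element size in bits; holds the message
                               // type for ENCODE_IS41
    uint_8 numFields;

    size_t i;

    // the encoding and num fields span at least the first two octets
    if( userData == NULL || sz < 2 ) {
        fprintf( stderr, "user data is too short\n" );
        return;
    }
    nextByte = ud + 1;

    // dump the raw octets
    for( i = 0; i < sz; i++ ) {
        printf("%X\n",userData[i]);
    }

    mst = 7;
    encoding = *ud & ENCODING_MASK;
    switch( encoding ) {
    case ENCODE_OCTET:
        mst = 8;
        break;

    case ENCODE_IS41:
        // the message type pushes num fields back by one octet
        if( sz < 3 ) {
            fprintf( stderr, "user data is too short\n" );
            return;
        }
        mst = (( *ud << 5 ) & MST_BYTE_1_MASK ) +
              (( *nextByte >> 3 ) & MST_BYTE_2_MASK );
        ud++;
        nextByte++;
        break;

    case ENCODE_7BIT:
    case ENCODE_IA5:
        break;

    default:
        // perror() is for errno based errors, so use fprintf here
        fprintf( stderr, "unknown parameters\n" );
        exit(1);
    }

    // low 3 bits of the current octet, high 5 bits of the next one
    numFields = (( *ud << 5 ) & NF_BYTE_1_MASK ) +
                (( *nextByte >> 3 ) & NF_BYTE_2_MASK );

    printf("numFields: %d\n", numFields );
    printf("first byte: %X\n", *nextByte);

    switch( encoding ) {
    case ENCODE_7BIT: {
        // decode_7bit_ascii is covered in another post
        char *text = decode_7bit_ascii(
            nextByte, numFields, 3 );
        printf("The text message is: '%s'\n", text );
        free( text );
        break;
    }

    case ENCODE_OCTET:
    case ENCODE_IS41:
    case ENCODE_IA5:
        fprintf( stderr, "requested encoding is not implemented\n" );
        return;
    }
}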
The only encoding that I am concerned with here is the 7 bit packed ASCII. I will show how to unpack this into a NUL terminated string in another post.
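Until that post, here is a rough sketch of how the unpacking could go. It is not the decode_7bit_ascii routine called above: unpack_7bit and its parameters are hypothetical, and it assumes the characters are packed most significant bit first, starting at a bit offset counted from the first octet of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Unpack count 7 bit characters that start bit_offset bits into data
 * (bit 0 is the most significant bit of data[0]). Returns a malloc'ed,
 * NUL terminated string that the caller must free, or NULL on error. */
char *unpack_7bit(const uint8_t *data, size_t len,
                  size_t bit_offset, size_t count)
{
    assert(data != NULL);
    if (bit_offset + count * 7 > len * 8)
        return NULL;                 /* not enough packed bits */

    char *text = malloc(count + 1);
    if (text == NULL)
        return NULL;

    size_t pos = bit_offset;
    for (size_t i = 0; i < count; i++) {
        unsigned ch = 0;
        /* gather 7 bits, which may straddle an octet boundary */
        for (int b = 0; b < 7; b++, pos++)
            ch = (ch << 1) | ((data[pos / 8] >> (7 - pos % 8)) & 1u);
        text[i] = (char)ch;
    }
    text[count] = '\0';
    return text;
}
```

With the hand-built octets 10 14 8D 20 (hex), where the header occupies the first 13 bits and num fields is 2, unpack_7bit(ud, 4, 13, 2) yields the string "Hi".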

Monday, June 2, 2008

Source Code Formatter

I have found Blogger to be very frustrating to work with for posting source code. A quick search with Google and I found this source code formatter.

CDMA SMS Bearer Data

In December we tested the system I had been developing at Lucent's lab for the EARS project. This test was to prove the feasibility of sending broadcast SMS messages for emergency alerts. This testing was successful, for the most part. The one snag that was encountered was with the Bearer Data portion of the SMS message. The bearer data carries the message that will be transmitted to the phone and I naively thought that this was just the text. But it turned out the bearer data is encoded according to the IS-637 specification. With a set of hex dumps from Lucent's internal testing tool I set out to figure out how to decode the bearer data so I could learn how to encode it. Unfortunately, this line of work was stopped just when I had almost gotten everything figured out. I couldn't let all of that work go to waste, so I finished up that task on my own time and I present it to you now.

The structure of the SMS bearer data field in a CDMA system is defined in section 4.5 of the IS-637 spec. A PDF of this spec is available on the 3GPP2's web site. A PDF of the spec is also available on the TIA's website. In short the bearer data field is a series of fields where each field is an integral number of octets and the fields are 0 padded if necessary. Each field takes the form of parameter ID, parameter length, parameter value. The parameter ID defines what data is being passed. The parameter length is the number of octets in the parameter value. The value, of course, is the data that we need to provide.
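The ID, length, value layout can be walked without knowing what any individual parameter means. The following is a minimal sketch of such a walk with bounds checking; walk_bearer_data and subparam_fn are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callback invoked once per parameter found (illustrative signature). */
typedef void (*subparam_fn)(uint8_t id, const uint8_t *value,
                            uint8_t len, void *ctx);

/* Walk the ID/length/value triples in bd. Returns the number of
 * parameters visited, or -1 if a length octet is missing or a value
 * would run past the end of the buffer. */
int walk_bearer_data(const uint8_t *bd, size_t sz,
                     subparam_fn fn, void *ctx)
{
    assert(bd != NULL || sz == 0);
    size_t pos = 0;
    int count = 0;

    while (pos < sz) {
        if (sz - pos < 2)
            return -1;               /* ID without a length octet */
        uint8_t id  = bd[pos];
        uint8_t len = bd[pos + 1];
        if (len > sz - pos - 2)
            return -1;               /* value overruns the buffer */
        if (fn != NULL)
            fn(id, bd + pos + 2, len, ctx);
        pos += 2u + len;             /* step to the next parameter ID */
        count++;
    }
    return count;
}
```

Checking each length against the octets that remain means a truncated or corrupt hex dump is reported as an error rather than read past its end.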

To decode the bearer data field I wrote a simple routine that loops through the data picking out all of the parameters. We were working with only a minimal subset of the available parameters. The first step is to define a set of constants that will be used in the routine.
// bearer data subparameter identifiers
//
#define BD_MESSAGE_ID 0x00
#define BD_USER_DATA 0x01
#define BD_USER_RESP_CD 0x02
#define BD_TIMESTAMP 0x03
#define BD_VALIDITY_PER_ABS 0x04
#define BD_VALIDITY_PER_REL 0x05
#define BD_DEFERRED_DELIVERY_ABS 0x06
#define BD_DEFERRED_DELIVERY_REL 0x07
#define BD_PRIORITY_IND 0x08
#define BD_PRIVACY_IND 0x09
#define BD_REPLY_OPT 0x0A
#define BD_NUM_MSGS 0x0B
#define BD_ALERT_ON_DEL 0x0C
#define BD_LANG_IND 0x0D
#define BD_CALLBACK_NUM 0x0E

// standard sized type definitions
//
typedef signed char sint_8; // plain char may be unsigned on some platforms
typedef short sint_16;
typedef int sint_32;
typedef long long sint_64;

typedef unsigned char uint_8;
typedef unsigned short uint_16;
typedef unsigned int uint_32;
typedef unsigned long long uint_64;

The routine to decode the bearer data just receives an array of octets (unsigned 8 bit integers) and the length of the array. It loops through the data and writes the received parameters to stdout.
void
decode_bearer_data( uint_8 *bearerData, size_t sz )
{
    uint_8 *bd = bearerData;            // current element
    uint_8 *lbd = bearerData + sz - 1;  // last element

    uint_32 msgID = 0;
    uint_8 userDataLen = 0;
    uint_8 *userData = NULL;
    uint_8 timestamp[6] = { 0 }; // YY MM DD hh mm ss
    uint_8 msgDelivery = 1;

    while( bd < lbd ){
        switch( *bd ){
        case BD_MESSAGE_ID:
            if( *(++bd) != 3 ){
                fprintf( stderr, "message ID Len is not 3\n" );
            }
            // read one octet per statement; several ++bd in a single
            // expression have no defined evaluation order
            msgID  = (uint_32)*(++bd) << 16;
            msgID += (uint_32)*(++bd) << 8;
            msgID += *(++bd);
            break;

        case BD_USER_DATA:
            userDataLen = *(++bd);
            userData = bd + 1;
            bd += userDataLen; // leave bd on the last octet of the value
            break;

        case BD_TIMESTAMP:
            if( *(++bd) != 6 ){
                fprintf( stderr, "timestamp len is not 6\n" );
            }
            timestamp[0] = *(++bd);
            timestamp[1] = *(++bd);
            timestamp[2] = *(++bd);
            timestamp[3] = *(++bd);
            timestamp[4] = *(++bd);
            timestamp[5] = *(++bd);
            break;

        case BD_ALERT_ON_DEL:
            // the octet that follows the parameter ID
            msgDelivery = *(++bd);
            break;

        case BD_USER_RESP_CD:
        case BD_VALIDITY_PER_ABS:
        case BD_VALIDITY_PER_REL:
        case BD_DEFERRED_DELIVERY_ABS:
        case BD_DEFERRED_DELIVERY_REL:
        case BD_PRIORITY_IND:
        case BD_PRIVACY_IND:
        case BD_REPLY_OPT:
        case BD_NUM_MSGS:
        case BD_LANG_IND:
        case BD_CALLBACK_NUM:
            printf(
                "sub parameter is not implemented: %X\n",
                *bd);
            exit(1);

        default:
            printf("unknown sub parameter: %X\n", *bd);
            exit(1);
        }
        bd++; // step to the next parameter ID
    }

    printf("BEARER DATA\n");
    printf("msgID: %x\n", msgID);
    printf("timestamp %x/%x/%x %x:%x:%x\n",
        timestamp[0], timestamp[1], timestamp[2],
        timestamp[3], timestamp[4], timestamp[5] );
    printf("alert on delivery: %x\n", msgDelivery);

    if( userData == NULL ){
        fprintf( stderr, "no user data parameter was found\n" );
        return;
    }

    printf("\nPROCESSING USER DATA\n\n");
    decode_user_data( userData, userDataLen );
}

The User Data field is the bearer data parameter that actually contains the message that we want to send. This field is additionally encoded and may contain data encoded in several different schemes. The text message data I was working with is encoded as packed 7 bit ASCII characters. I will cover this in another post.

Tuesday, April 22, 2008

Fortune Cookie Says

The best profit of future [sic] is the past.

Thursday, April 10, 2008

I'm Published!!!

Six years ago, during my short stint in graduate school, I wrote a paper, a comparative analysis of generic programming in Java and C++. My professor, and adviser, liked it and wanted to submit it for publication. After dealing with the second round of review comments I lost track of it. Then to my surprise I find it in a Google search. Yeah, OK, it was a vanity search, so sue me. The paper was published in volume 33, issue 2 of the journal Software Practice and Experience, in February 2003. Yep 5 years ago. So yesterday my friend LT gave me a really great present, she somehow got the actual issue of the journal. So I now have my very own dead tree version.

Wednesday, April 9, 2008

Parsing Socket Connections With Flex and Bison, part II

I left my first post on parsing socket connections with Flex and Bison with a note about a small memory leak. In this post I will show how to fix this leak.

I used Valgrind, which is a profiling and instrumenting application, to test for memory leaks. The command line I used to run the test server was:
valgrind --suppressions=./mysupps.supp --log-file-exactly=valgrind.log --leak-check=full --show-reachable=yes --leak-resolution=high --num-callers=40 -v ./stubserver

The process, as described in the first post, is that a statically allocated input buffer is populated every time the socket is read. Then yy_scan_string is called with the buffer so that Flex will start processing it. The yy_scan_string function returns a YY_BUFFER_STATE handle each time it is called; see chapter 12 of the Flex Manual. Each YY_BUFFER_STATE handle consists of 3 allocations totaling 92 bytes of memory, as can be seen in the Valgrind output file, which has been simplified for space.
searching for pointers to 3 not-freed blocks.
checked 59,892 bytes.

8 bytes in 1 blocks are still reachable in loss record 1 of 3
at 0x4022765: malloc
by 0x804D607: yyalloc
by 0x804D3D1: yy_scan_bytes
by 0x804D3B1: yy_scan_string
by 0x80491FE: yywrap
by 0x804C1ED: yylex
by 0x8049A9C: yyparse
by 0x8048EE2: main

36 bytes in 1 blocks are still reachable in loss record 2 of 3
at 0x4022862: realloc
by 0x804D621: yyrealloc
by 0x804D266: yyensure_buffer_stack
by 0x804CCD2: yy_switch_to_buffer
by 0x804D373: yy_scan_buffer
by 0x804D43C: yy_scan_bytes
by 0x804D3B1: yy_scan_string
by 0x80491FE: yywrap
by 0x804C1ED: yylex
by 0x8049A9C: yyparse
by 0x8048EE2: main

48 bytes in 1 blocks are still reachable in loss record 3 of 3
at 0x4022765: malloc
by 0x804D607: yyalloc
by 0x804D2E9: yy_scan_buffer
by 0x804D43C: yy_scan_bytes
by 0x804D3B1: yy_scan_string
by 0x80491FE: yywrap
by 0x804C1ED: yylex
by 0x8049A9C: yyparse
by 0x8048EE2: main

LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks.
possibly lost: 0 bytes in 0 blocks.
still reachable: 92 bytes in 3 blocks.
The leak is that, unless we recover the memory, we leak 92 bytes every time yy_scan_string is called. To recover the memory allocated for the YY_BUFFER_STATE handle, Flex provides the function yy_delete_buffer, which is described in chapter 12 of the Flex Manual. To prevent the memory leak I created a buffer_state variable, as a void*, which is accessible to all of the parser source. This variable is initialized to NULL at application startup. The yywrap and parser_init functions then call yy_delete_buffer on buffer_state if it is not NULL. The yywrap function must also set buffer_state to NULL after it has been deleted to ensure that it is not deleted twice. Finally, both functions set buffer_state to the return value of yy_scan_string. So the process, including the memory management (the buffer_state steps are the changes), is:
  • The application calls parser_init( connfd, connfd, NULL, NULL )
    • parser_init sets the input and output file descriptors
    • parser_init sets the callbacks to the defaults
    • if buffer_state is not NULL call yy_delete_buffer
    • parser_init initializes the buffer and calls yy_scan_string
  • the application calls yyparse to start the parser
    • flex finds that the input buffer is empty and calls yywrap
      • if buffer_state is not NULL
        • call yy_delete_buffer
        • set buffer_state to NULL
      • yywrap calls the read callback.
        • the read callback reads from the socket and populates the buffer.
      • if data was read
        • yywrap calls yy_scan_string
        • yywrap returns 0
      • else return 1
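
The yywrap half of the steps above can be sketched in C as follows. This is only an illustration under stated assumptions: yy_scan_string and yy_delete_buffer are replaced by stand-in stubs so the sketch compiles without Flex (in a real scanner they come from the generated lex.yy.c), buffer_state is declared as YY_BUFFER_STATE rather than void*, and reader, demo_reader, release_buffer and live_buffers are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-ins for what Flex generates, so the sketch is self-contained. */
typedef struct yy_buffer_state { int unused; } *YY_BUFFER_STATE;
int live_buffers = 0;            /* stub handles not yet deleted */

YY_BUFFER_STATE yy_scan_string(const char *str)
{
    (void)str;
    YY_BUFFER_STATE b = malloc(sizeof *b);
    if (b != NULL)
        live_buffers++;
    return b;
}

void yy_delete_buffer(YY_BUFFER_STATE b)
{
    assert(b != NULL);
    live_buffers--;
    free(b);
}

/* The pattern itself. */
YY_BUFFER_STATE buffer_state = NULL;
char input_buffer[256];

/* Hypothetical read callback: fills buf, returns 0 when no data was read. */
size_t (*reader)(char *buf, size_t cap) = NULL;

void release_buffer(void)
{
    if (buffer_state != NULL) {
        yy_delete_buffer(buffer_state);
        buffer_state = NULL;     /* guard against deleting it twice */
    }
}

int yywrap(void)
{
    release_buffer();            /* free the handle from the last scan */
    size_t n = reader ? reader(input_buffer, sizeof input_buffer - 1) : 0;
    if (n == 0)
        return 1;                /* nothing read: the scanner stops */
    input_buffer[n] = '\0';
    buffer_state = yy_scan_string(input_buffer);
    return 0;                    /* more input is ready to scan */
}

/* Demo reader that supplies one character for a fixed number of reads. */
int demo_reads_left = 0;
size_t demo_reader(char *buf, size_t cap)
{
    if (demo_reads_left <= 0 || cap < 1)
        return 0;
    demo_reads_left--;
    buf[0] = 'x';
    return 1;
}
```

With reader set to demo_reader and demo_reads_left set to 3, three calls to yywrap return 0 with exactly one live handle after each, and the fourth returns 1 with none left.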

Tuesday, April 1, 2008

Changes

So I've decided to get rid of my Mac and Linux boxes. I have 1 Macbook, 2 notebooks running Linux and an old PC running Linux as a file server that I am selling. I'm going to Microcenter over lunch today to buy a new laptop with Vista SP1 and the full Microsoft Office suite and a new PC with Windows Server 2003 to use as a file server. I'm also moving my blog over to Microsoft's Windows Live Blogger platform in the next couple of days.