Monday, December 22, 2008

Project Euler problems 6 - 10

I finally found some time to sit down and do problems 6 through 10 on Project Euler.

Problem 6: Find the difference between the sum of the squares of the first one hundred natural numbers and the square of the sum.
def euler6()
(1..100).inject(0){|val,k| val += k} ** 2 -
(1..100).inject(0){|val,k| val += k**2}
end

Problem 7: What is the 10001st prime number?
def euler7()
myprime = Prime.new
(1..10000).each{ myprime.next}
myprime.next
end

Problem 8: Find the greatest product of five consecutive digits in the 1000-digit number.

The 1000 digit number is a string in my code.
def euler8(bigNum)
len = bigNum.size
stopAt = len - 5
sliceStart = 0
res = 0
while sliceStart < stopAt do
val = 1
s = bigNum.slice(sliceStart, 5)
s.each_char{|x| val *= x.to_i}
res = val if res < val
sliceStart += 1
end
res
end

Problem 9: There exists exactly one Pythagorean triplet for which a + b + c = 1000. Find the product abc.
def euler9()
val = 0
(1..500).each{ |a|
(a..500).each{ |b|
c = (a**2 + b**2)**0.5
next unless c == c.to_i # continue unless c is an integer
val = a + b + c
return a * b * c if val == 1000
break if val > 1000
}
}

end

Project 10: Find the sum of all the primes below two million.

This was the only really challenging problem in this set. I tried to use Ruby's Prime class in a loop for this. When ran for 50,000 iterations it would return in about 14 seconds. For 2,000,000 I eventually killed the process after 15 minutes. The algorithm used in the Prime class seems to fall down when used to generate large lists of prime numbers.

I did some searching and found that most people were using some type of number sieve to generate the list of prime numbers, see Wikipedia here and here. I found and shamelessly stole a Ruby implementation of the Sieve of Eratosthenes from Paul Callagher. I slightly modified Paul's implementation to make it a standalone function instead of a class method.
# problem 10, find the sum of all primes below 2,000,000.
# prime series up to this limit, using Sieve of Eratosthenes method
# http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes
#
# from http://tardate.blogspot.com/2008/10/rolling-project-euler-on-ruby.html
def sieve_of_eratosthenes( maxVal )
limit = Math.sqrt(maxVal)
a = (2..maxVal).to_a
n = 2
while (n < limit) do
x = n*2
begin
a[x-2]=2
x+=n
end until (x > maxVal )
begin
n+=1
end until ( a[n-2] != 2 )
end
a.uniq!
end

def euler10()
sieve_of_eratosthenes( 2_000_000 ).inject(0){|val, x| val += x }
end

This code returns in about 4 seconds.

Thursday, November 27, 2008

Fortune Cookie Says

Good things are being said about you.

Wednesday, November 26, 2008

Project Euler

I recently found out about Project Euler which has a collection of math/programming problems to solve. I have started working through the problems using Ruby. I have decided on using Ruby in my current project, which I will do some posts on later, but I don't have any real experience using it, so this will give me something to work with while learning the language.

The first problem: Sum all integers less than 1,000 that are multiples of 3 or 5. This is my third version and I have it down to a single line.
(1..999).inject(0){|result, n|
(0==n%3 or 0==n%5)? result + n : result}


The second problem: Sum all even numbers in the Fibonacci sequence that do not exceed 4,000,000. My first solution uses a recursive algorithm.
def euler2_rec(n1=1, n2=1)
return 0 if n1 > 4000000
fib = n1 + n2
sum = euler2_rec(n2, fib)
sum += fib if 0 == fib % 2
return sum
end

Since this is tail recursive and any tail recursive algorithm can be converted to a loop. I decided to create a loop version.
def euler2_loop
n1 = 1
n2 = 1
sum = 0
begin
fib = n1 + n2
sum += fib if 0 == fib % 2
n1 = n2
n2 = fib
end while fib < 4000000
return sum
end

The third problem: Find the largest prime factor of 600,851,475,143. For this I used Ruby's Prime class.
def euler3()
num = 600851475143
primes = Prime.new
factor = 0
while num > 1 do
factor = primes.next
num /= factor if 0 == num % factor
end
return factor
end

I had originally used num / 2 as my halting condition. But reading the comments after solving the problem I saw that others divided out the factors as they found them. This allows the loop to exit after the largest factor is found, and is rather obvious once you see it.

The fourth problem
: Find the largest palindromic number that is made from the product of two 3-digit numbers.
def euler4()
result = 0
100.upto(999) { |i|
i.upto(999) { |j|
val = i * j
result = val if result < val and
val.to_s == val.to_s.reverse
}}
return result
end

The fifth problem: Find the smallest number that is evenly divisible by all numbers between 1 and 20.

The easy and obvious approach for this problem is to take each number and test it to see if it evenly divisible by the numbers 2 to 20, all integers being evenly divisible by 1. One can think of some optimization that can be applied to this approach. For example a number must end in a 0 to be evenly divisible by 10. Also any number that is evenly divisible by 20 is also divisible by 10, 5, 4, and 2. Also one can consider that all, non prime, numbers can be factored into prime numbers. So the problem can be reduced to working with a set of prime numbers.

However, I wasn't sure of the correct approach, and being too lazy to get off of the couch to crack open a dusty old math book, I opted for brute force and ignorance and wrote program to iteratively check all numbers. While that was chewing up my CPU in the background I searched for a better solution and found this blog post on the problem. The solution proposed in that post is to find the accumulated least common multiple (LCM) for the numbers 2 to 20. A bit more searching turned up this page which had sample code, in Ruby, which would calculate the LCM and greatest common factor (GCF). So a little bit of time later, and with my original code still running, I had this solution.
require "gcf_lcm.rb"
def euler5()
lcm = calc_lcm(2,3, calc_gcf(2,3) )
4.upto(20) { |n|
lcm = calc_lcm(lcm, n, calc_gcf(lcm, n) ) }
return lcm
end

This returns almost instantly, and with the correct answer. When looking through the comments on this problem I found a posting with a Ruby solution. It turns out there are library routines in Ruby for LCM and GCF and the solution can be turned into a one liner.
(2..20).inject(1){|res,n| res.lcm n}

Thursday, October 16, 2008

Fortune Cookie Says

Tomorrow your friend or partner will tell you some exciting news.

Saturday, September 20, 2008

Fortune Cookie Says

You will soon be crossing the great waters.

Project Backup Script

I am using trac and subversion on my current project. I wrote the following script to do a full backup of trac and subversion and burn the results to a CD.
#!/bin/bash

## Backup subversion and trac and burn the backups to a CD.
##

## applications
##
TRACADMIN=/usr/bin/trac-admin
SVNADMIN=/usr/bin/svnadmin
MKISOFS=/usr/bin/mkisofs
CDRECORD=/usr/bin/cdrecord
DATE=/bin/date

## directory and file paths
##
REPO=/opt/repo/tde
TRACPROJ=/opt/trac
BACKUPDIR=/home/steve/backup
TMPDIR=$$
CDDEVICE=/dev/cdrom
SVNDUMPFILE=$BACKUPDIR/$TMPDIR/temp/svn_dump.txt
DATEFILE=$BACKUPDIR/$TMPDIR/temp/backupdate.txt
LOGFILE=$BACKUPDIR/$TMPDIR/backup.log

if [[ $EUID -ne 0 ]]; then
echo "This script must be run as root"
exit 1
fi

## check for the directory structure and attempt to create if needed
##
if [[ ! -e $BACKUPDIR ]]; then
if ! mkdir $BACKUPDIR; then
exit 1;
fi
fi

if [[ -e $BACKUPDIR/$TMPDIR ]]; then
exit 1; # temp dir already exists
else
if ! mkdir $BACKUPDIR/$TMPDIR; then
exit 1
fi
fi

if ! mkdir $BACKUPDIR/$TMPDIR/iso; then
echo "Failed to create $BACKUPDIR$TMPDIR/iso" >> $LOGFILE;
exit 1;
fi

if ! mkdir $BACKUPDIR/$TMPDIR/temp; then
echo "Failed to create $BACKUPDIR/$TMPDIR/temp" >> $LOGFILE;
exit 1;
fi

## dump subversionto SVNDUMPFILE
##
echo "Dumping subversion" >>$LOGFILE
echo "" >>$LOGFILE
if ! $SVNADMIN dump $REPO > $SVNDUMPFILE 2>>$LOGFILE; then
exit 1;
fi

## hot backup trac
##
echo "" >>$LOGFILE
echo "Performing hotcopy of trac" >> $LOGFILE
echo "" >>$LOGFILE
if ! $TRACADMIN $TRACPROJ hotcopy $BACKUPDIR/$TMPDIR/temp/trac \
1>>$LOGFILE 2>>$LOGFILE; then
exit 1;
fi

## create a file with the date time the backup was done
##
echo `$DATE` > $DATEFILE

## create iso file
## mkisofs does not seem to return a status value so we dont know if it
## actually succeeded.
##
echo "" >>$LOGFILE
echo "Making ISO file" >> $LOGFILE
echo "" >>$LOGFILE
$MKISOFS -fRrlJ -o $BACKUPDIR/$TMPDIR/iso/backup.iso \
$BACKUPDIR/$TMPDIR/temp 1>>$LOGFILE 2>>$LOGFILE

## burn the iso file to a cd
## cdrecord does not seem to return a status value so we dont know if it
## actually succeeded.
##
echo "" >>$LOGFILE
echo "Burning the CD" >> $LOGFILE
echo "" >>$LOGFILE
$CDRECORD -eject speed=24 dev=$CDDEVICE $BACKUPDIR/$TMPDIR/iso/backup.iso \
1>>$LOGFILE 2>>$LOGFILE
#if $CDRECORD speed=24 dev=$CDDEVICE $BACKUPDIR/$TMPDIR/iso/backup.iso \
# 1>>$LOGFILE 2>>$LOGFILE; then
# exit 1;
#fi

## clean up if everything was OK
## since we don't know if mkisofs or cdrecord succeeded do not
## automatically delete the director.
##
#rm -rf $BACKUPDIR/$TMPDIR

The only thing in this script to make note of is the line TMPDIR=$$. The TMPDIR variable is will be the name of the temporary directory to backup to and this is set to the PID of the process which is returned by $$.

To automate the backup I run the script through cron. To setup the cron entry I ran the following commands as root.
[root@myhost]# export EDITOR=vi
[root@myhost]# crontab -e
This will bring up a vi editor with root's crontab file. I then added the following line and saved the file.
59 23 * * 5 /home/steve/bin/sp_backup.sh

The script will now be run every Friday at 11:59 PM. The only problems that I encountered with this script is that neither mkisofs or cdrecord seem to return the UNIX standard value of 0 for success and non-zero for failure. I was going to have the script delete all of the directories and files that were created. But because I can't determine if the script was successful I decided to leave the files in place and to manually ensure that the backup was successful.

Wednesday, August 27, 2008

Fortune Cookie Says

Maybe you can live on the moon in next century.

Processing Apache Access Logs

I need to keep track of who is logging in and using our project's issue tracking and wiki sites. I wanted a simple list of who has logged in on what day. I came up with the following command line to give me what I wanted.
cat access_log | awk '{ if ( $3 !~/-/ ) \
{print $3" " substr($4,2, index($4,":")-2 )} }' \
| sort | uniq
This gives me a list of users and login dates grouped by user as follows.
bambam 25/Aug/2008
barney 25/Aug/2008
barney 26/Aug/2008
fred 25/Aug/2008
fred 26/Aug/2008
fred 27/Aug/2008

Sunday, June 22, 2008

Fortune Cookie Says

Endurance and persistence will be rewarded.

Wednesday, June 4, 2008

CDMA SMS User Data

In this post I will describe the decoding of the user data portion of the bearer data field of a CDMA SMS message. See my first post for the structure of the bearer data field.

The User Data subparameter is the portion of the Bearer Data field of a CDMA SMS message that contains the actual message or payload. The user data is made up of an integral number of octets and is 0 padded as needed. The user data subparameter is documented in section 4.5.2 of the IS-637 spec. A PDF of this spec is available on the 3GPP2's web site. A PDF of the spec is also available on the TIA's website. The structure of the user data subparameter is:
  • The subparameter ID of 8 bits which is the constant that identifies the start of this subparameter in the bearer data.
  • The subparamter Len of 8 bits which is the number of octets that make up the value portion of this subparameter.
These two portions of the user data are processed as part of the bearer data. The rest of the user data subparameter is processed separately. The remaining fields of the user data are:
  • The message encoding is a 5 bit value that indicates which encoding scheme was used for the message.
  • The message type is an optional 8 bit value that is used only if the encoding is an IS-91 extended protocol message. See the specification document for the details.
  • The num fields is the number of data elements, of the size specified by the encoding and message type, that the message contains.
  • The chari portion contains the actual text or payload of the message.
  • The final portion the message is 0-7 bits of 0 padding as needed to fill the last octet.
The first step I took in decoding the user data was to write a function that determines the message encoding, the size of the data elements, and the starting byte of the message. To this I first start by defining some constants.
// masks and values for processing the user data fields
//
#define ENCODING_MASK 0xF8
#define ENCODE_OCTET 0X00
#define ENCODE_IS41 0X08
#define ENCODE_7BIT 0X10
#define ENCODE_IA5 0X18

#define MST_BYTE_1_MASK 0xE0
#define MST_BYTE_2_MASK 0x1F
#define NF_BYTE_1_MASK 0xE0
#define NF_BYTE_2_MASK 0x1F

// standard sized type definitions
//
typedef char sint_8;
typedef short sint_16;
typedef int sint_32;
typedef long long sint_64;

typedef unsigned char uint_8;
typedef unsigned short uint_16;
typedef unsigned int uint_32;
typedef unsigned long long uint_64;


Then I write the routine to do the initial processing. This function is writtento call another function that is will handle the actual decoding of the message.
void
decode_user_data( uint_8 *userData, size_t sz )
{
uint_8 *ud = userData; // current element
uint_8 *lud = userData + sz - 1; // last element

uint_8 encoding;
uint_8 mst;
uint_8 numFields;
uint_8 *nextByte = ud + 1;

int i;
for( i = 0; i < sz; i++ ) {
printf("%X\n",userData[i]);
}

mst = 7;
encoding = *ud & ENCODING_MASK;
switch( encoding ) {
case ENCODE_OCTET:
mst = 8;
break;

case ENCODE_IS41:
mst = (( *ud << 5 ) & MST_BYTE_1_MASK ) +
(( *nextByte >> 3 ) & MST_BYTE_2_MASK );
ud++;
nextByte++;
break;

case ENCODE_7BIT:
case ENCODE_IA5:
break;

default:
perror( "unknown paramters\n");
exit(0);
}

numFields = (( *ud << 5 ) & NF_BYTE_1_MASK ) +
(( *nextByte >> 3 ) & NF_BYTE_2_MASK );

printf("numFields: %d\n", numFields );
printf("first byte: %X\n", *nextByte);

switch( encoding ) {
case ENCODE_7BIT: {
char *text = decode_7bit_ascii(
nextByte, numFields, 3 );
printf("The text message is: '%s'\n", text );
free( text );
break;
}

case ENCODE_OCTET:
case ENCODE_IS41:
case ENCODE_IA5:
perror( "requested encoding is not implmented\n");
return;
break;
}
}
The only message type that I am concerned about with this is the 7bit packed ASCII. I will show how to unpack this into a NULL terminated string in another post.

Monday, June 2, 2008

Source Code Formatter

I have found blogger to be very frustrating to work with for posting source code. I quick search with Google and I found this source code formatter.

CDMA SMS Bearer Data

In December we tested the system I had been developing at Lucent's lab for the EARS project. This test was to prove the feasibility of sending broadcast SMS messages for emergency alerts. This testing was successful, for the most part. The one snag that was encountered was with the Bearer Data portion of the SMS message. The bearer data carriers the message that will be transmitted to the phone and I naively thought that this was just the text. But it turned out the bearer data is encoded according to the IS-637 specification. With a set of hex dumps from Lucent's internal testing tool I set out to figure out how to decode the bearer data so I could learn how to encoded it. Unfortunately, this line of work was stopped, just when I had almost gotten everything figured out. I couldn't let all of that work go to waste, so I finished up that task on my time and I present it to you now.

The structure of the SMS bearer data field in a CDMA system is defined in section 4.5 of the IS-637 spec. A PDF of this spec is available on the 3GPP2's web site. A PDF of the spec is also available on the TIA's website. In short the bearer data field is a series of fields where each field is an integral number of octets and the fields are 0 padded if necessary. The structure of the bearer data is in the form of parameter ID, parameter length, parameter value. Where parameter ID defines what data is being passed. The parameter length is the number of octets of the parameter value. The value of course is the data that we need to provide.

To decode the bearer data field I wrote a simple routine that loops through the data picking out all of the parameters. We were working with a minimal set of parameters of those available. The first step is to define a set of constants that will be used in the routine.
// bearer data subparameter identifiers
//
#define BD_MESSAGE_ID 0x00
#define BD_USER_DATA 0x01
#define BD_USER_RESP_CD 0x02
#define BD_TIMESTAMP 0x03
#define BD_VALIDITY_PER_ABS 0x04
#define BD_VALIDITY_PER_REL 0x05
#define BD_DEFERRED_DELIVERY_ABS 0x06
#define BD_DEFERRED_DELIVERY_REL 0x07
#define BD_PRIORITY_IND 0x08
#define BD_PRIVACY_IND 0x09
#define BD_REPLY_OPT 0x0A
#define BD_NUM_MSGS 0x0B
#define BD_ALERT_ON_DEL 0x0C
#define BD_LANG_IND 0x0D
#define BD_CALLBACK_NUM 0x0E

// standard sized type definitions
//
typedef char sint_8;
typedef short sint_16;
typedef int sint_32;
typedef long long sint_64;

typedef unsigned char uint_8;
typedef unsigned short uint_16;
typedef unsigned int uint_32;
typedef unsigned long long uint_64;

The routine to decode the bearer data just receives an array of octets (unsigned 8 bit integers) and the length of the array. It loops through the data and writes the received parameters to stdout.
void
decode_bearer_data( uint_8 *bearerData, size_t sz )
{
uint_8 *bd = bearerData; // current element
uint_8 *lbd = bearerData + sz - 1; // last element

uint_32 msgID = 0;
uint_8 userDataLen = 0;
uint_8 *userData = NULL;
uint_8 timestamp[6]; // YY MM DD hh mm ss
uint_8 msgDelivery = 1;

while( bd < lbd ){
switch( *bd ){
case BD_MESSAGE_ID:
if( *(++bd) != 3 ){
perror("message ID Len is not 3\n");
}
msgID = (*(++bd) << 16) + (*(++bd) << 8) +
*(++bd) ;
break;

case BD_USER_DATA:
userDataLen = *(++bd);
userData = bd + 1;
bd += userDataLen;
break;

case BD_TIMESTAMP:
if( *(++bd) != 6 ){
perror("timestamp len is not 6\n");
}
timestamp[0] = *(++bd);
timestamp[1] = *(++bd);
timestamp[2] = *(++bd);
timestamp[3] = *(++bd);
timestamp[4] = *(++bd);
timestamp[5] = *(++bd);
break;

case BD_ALERT_ON_DEL:
msgDelivery = *(++bd);
break;

case BD_USER_RESP_CD:
case BD_VALIDITY_PER_ABS:
case BD_VALIDITY_PER_REL:
case BD_DEFERRED_DELIVERY_ABS:
case BD_DEFERRED_DELIVERY_REL:
case BD_PRIORITY_IND:
case BD_PRIVACY_IND:
case BD_REPLY_OPT:
case BD_NUM_MSGS:
case BD_LANG_IND:
case BD_CALLBACK_NUM:
printf(
"sub parameter is not implemented: %X\n",
*bd);
exit(1);
break;

default:
printf("unknown sub parameter: %X\n", *bd);
exit(1);
}
bd++;
}

printf("BEARER DATA\n");
printf("msgID: %x\n", msgID);
printf("timesamp %x/%x/%x %x:%x:%x\n",
timestamp[0], timestamp[1], timestamp[2],
timestamp[3], timestamp[4], timestamp[5] );
printf("alert on delivery: %x\n", msgDelivery);

printf("\nPROCESSING USER DATA\n\n");
decode_user_data( userData, userDataLen );
}

The User Data field is the bearer data parameter that actually contains the message that we want to send. This field is additionally encoded and may contain data encoded in several different schemes. The text message data I was working with is encoded as packed 7 bit ASCII characters. I will cover this in another post.

Tuesday, April 22, 2008

Fortune Cookie Says

The best profit of future [sic] is the past.

Thursday, April 10, 2008

I'm Published!!!

Six years ago, during my short stint in graduate school, I wrote a paper, a comparative analysis of generic programming in Java and C++. My professor, and adviser, liked it and wanted to submit it for publication. After dealing with the second round of review comments I lost track of it. Then to my surprise I find it in a Google search. Yeah, OK, it was a vanity search, so sue me. The paper was published in volume 33, issue 2 of the journal Software Practice and Experience, in February 2003. Yep 5 years ago. So yesterday my friend LT gave me a really great present, she somehow got the actual issue of the journal. So I now have my very own dead tree version.

Wednesday, April 9, 2008

Parsing Socket Connections With Flex and Bison, part II

I left my first post on parsing socket connections with Flex and Bison with a note about a small memory leak. In this post I will show how to fix this leak.

I used Valgrind, which is a profiling and instrumenting application, to test for memory leaks. The command line I used to run the test server was:
valgrind --suppressions=./mysupps.supp --log-file-exactly=valgrind.log --leak-check=full --show-reachable=yes --leak-resolution=high --num-callers=40 -v ./stubserver

The process as describe in the first post, is that a statically allocated input buffer is populated every time the socket is read. Then yy_scan_string is called with the buffer so flex will start processing that buffer. The yy_scan_string function returns a YY_BUFFER_STATE handle each time it is called, see chapter 12 of the Flex Manual. Each YY_BUFFER_STATE handle consists of 3 allocations totaling 92 bytes of memory. Which can be seen in the Valgrind output file, which has been simplified for space.
searching for pointers to 3 not-freed blocks.
checked 59,892 bytes.

8 bytes in 1 blocks are still reachable in loss record 1 of 3
at 0x4022765: malloc
by 0x804D607: yyalloc
by 0x804D3D1: yy_scan_bytes
by 0x804D3B1: yy_scan_string
by 0x80491FE: yywrap
by 0x804C1ED: yylex
by 0x8049A9C: yyparse
by 0x8048EE2: main

36 bytes in 1 blocks are still reachable in loss record 2 of 3
at 0x4022862: realloc
by 0x804D621: yyrealloc
by 0x804D266: yyensure_buffer_stack
by 0x804CCD2: yy_switch_to_buffer
by 0x804D373: yy_scan_buffer
by 0x804D43C: yy_scan_bytes
by 0x804D3B1: yy_scan_string
by 0x80491FE: yywrap
by 0x804C1ED: yylex
by 0x8049A9C: yyparse
by 0x8048EE2: main

48 bytes in 1 blocks are still reachable in loss record 3 of 3
at 0x4022765: malloc
by 0x804D607: yyalloc
by 0x804D2E9: yy_scan_buffer
by 0x804D43C: yy_scan_bytes
by 0x804D3B1: yy_scan_string
by 0x80491FE: yywrap
by 0x804C1ED: yylex
by 0x8049A9C: yyparse
by 0x8048EE2: main

LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks.
possibly lost: 0 bytes in 0 blocks.
still reachable: 92 bytes in 3 blocks.
The leak is that unless we recover the memory we leak 92 bytes of memory every time yy_scan_string is called. To recover the memory allocated in the YY_BUFFER_STATE handle Flex provides the function yy_delete_buffer which is described in chapter 12 of the Flex Manual. To prevent the memory leak I created a buffer_state variable, as a void*, which is accessible to all of the parser source. This variable is initialized to NULL at application startup. Then the yy_wrap and parser_init functions call yy_delete_buffer on buffer_state if it is not NULL. And the yy_wrap function must set the buffer_state to NULL after it has been deleted to ensure that it is not deleted twice. Then when the functions set buffer_state to the return value of yy_scan_string. So the process including the memory management, with the changes highlighted, is:
  • The application calls parser_init( connfd, connfd, NULL, NULL )
    • parser_init sets the input and output file descriptors
    • parser_init sets the callbacks to the defaults
    • if buffer_state is not NULL call yy_delete_buffer
    • parser_init initializes the buffer and calls yy_scan_string
  • the application calls yyparse to start the parser
    • flex finds that the input buffer is empty and calls yywrap
      • if buffer_state is not NULL
        • call yy_delete_buffer
        • set buffer_state to NULL
      • yywrap calls the read callback.
        • the read callback reads from the socket and populates the buffer.
      • if data was read
        • yywrap calls yy_scan_string
        • yywrap returns 0
      • else return 1

Tuesday, April 1, 2008

Changes

So I've decided to get rid of my Mac and Linux boxes. I have 1 Macbook, 2 notebooks running Linux and an old PC running Linux as a file server that I am selling. I'm going to Microcenter over lunch today to buy a new laptop with Vista SP1 and the full Microsoft Office suite and a new PC with Windows Server 2003 to use as a file server. I'm also moving my blog over to Microsoft's Windows Live Blogger platform in the next couple of days.

Saturday, March 22, 2008

Fortune Cookie Says

Remember to share good fortune as well as bad with your friends.

Monday, March 17, 2008

Fortune Cookie Says

Discontent is the first step in the progress of a man or a nation.

Friday, March 14, 2008

Binary-Coded Decimal

Introduction

For the EARS project I needed to interface to some custom built transmitter hardware and to telecoms equipment. Both of these systems wanted to receive numeric data in BCD, or Binary-Coded Decimal, format. And if your computer engineering knowledge is as old, rusty, and forgotten as mine, a lot of review is in order.


The Basics

A byte is typically 8 bits and an 8 bit byte is called an octet. A byte can be divided two 4 bit nibbles each of which contains a single hexadecimal digit. Where a hexadecimal digit is a numeric value in base 16 and represents a decimal value of 0 to 15 or 0x0 to 0xf in hex notation, and a byte is represented as two hex digits. A BCD value then is a binary representation of a decimal number. So a BCD representation of the number 129 may be 0x129, which consists of three hex digits one for each decimal digit. The straight binary representation takes only a single byte and would be the hex value 0x81 or the binary value of 10000001. A BCD value may be unpacked in which each decimal digit takes an entire byte, or packed in which each decimal digit takes one nibble. So the unpacked representation of 129 takes 3 bytes and is the value 0x010209. The packed BCD value will only take two bytes and is the value 0x0129.


Converting

The goal is to convert a string of numbers into its BCD representation. It is very easy to convert an ASCII numeric character to its hexadecimal equivalent. From the table of ASCII characters we see that the character '0' is the hex value 0x30 and '9' is the value 0x39. So we just need to subtract 0x30 from each numeric character to get its binary representation. This is of course easy if one uses the appropriate programming language, which in all cases where one needs portability, performance, and maximum bit twiddling functionality, is C.

A simple search with your favorite search engine will result in many pages of information and example code. As most of my work is required to fit in the mythical realm of 5 9s I require a bit more robustness than you will probably find in your search. So my implementation is as follows.


Prototypes

Since I only need to go from a string to BCD I wrote two functions, a2ubcd to convert to unpacked BCD, and a2pbcd to convert to packed BCD. The prototypes of the functions are:


int a2ubcd( const char *str, uint_8 *bcd, size_t sz );
int a2pbcd( const char *str, uint_8 *bcd, size_t sz );

The function comments are basically also identical so following is the comment for the a2ubcd function.

Convert a string to an unpacked BCD array.
Converts the numbers in the string to an unpacked BCD number.

The BCD number is represented as an array of bytes with each byte being one digit. The LSD of the number will be the last element of the array, and if the BCD array is larger than the number of digits the BCD number will be padded with leading 0s.

The string to convert must be a null terminated string with the first character being a number. The conversion will start at the last numeric character found. If the BCD array is not large enough to hold the number it is truncated and an error is returned.

@param str The string to convert to BCD.
@param bcd The array to hold the BCD number.
@param sz The length of bcd.

@returns RET_OK on success.
RET_INVALID_PARAM if str or bcd is NULL, sz <= 0, or if the first character of str is not a numeric. RET_OUT_OF_RANGE if bcd is not large enough to hold the number.

The return values are obviously defined constants. The uint_8 type is an unsigned 8bit integer, which is most likely an unsigned short.


Implementation

As much as I would like to just paste my code in I do not have the rights to do so. Hopefully, the pseudo-coded description will suffice.


function a2ubcd
receives: char *str, uint_8 *bcd, size_t sz
returns: status code

begin
if str or bcd is NULL return invalid param
if sz <= 0 return invalid param
if the first character of str is not a number
return invalid param

set currChar to the first character of str
set nextChar to the second character of str

while nextChar is not NULL and nextChar is a number
set currChar to nextChar
set nextChar to the next character of str
end while

set currDigit to the last element of bcd
while currDigit is an element of bcd
if currChar is an element of str
set currDigit to currChar - 0x30
set currChar to the previous character of str
else
set currDigit to 0
end if
end while

if currChar is an element of str
return out of range
else
return OK
end

The packed BCD conversion function is mostly identical to the unpacked version. The differences being in the while loop that sets the bcd values. The packed version of this loop is.


set currDigit to the last element of bcd

while currDigit is an element of bcd
set highNibble to '0'
set lowNibble to '0'
if currChar is an element of str
set lowNibble to currChar
set currChar to the previous character of str
end if

if currChar is an element of str
set highNibble to currChar
set currChar to the previous character of str
end if

set currDigit to ((highNibble - 0x30) shift left 4) + lowNibble - 0x30
end while

Tuesday, March 11, 2008

A Day In The Life

Here is a peek at the glamorous world of software development, otherwise know as a picture of what my desk looked like this morning.



Most of this mess is specification documents for the GSM and CDMA cellular systems. And there is another 4 pile of documents to the right of this picture. The threat of paper cuts is a constant danger in the IT industry.

Sunday, March 2, 2008

Fortune Cookie Says

You do not have to know where you are going to be headed in the right direction.

Thursday, February 28, 2008

Fortune Cookie Says

Your happy heart brings joy and peace where there is none.

Monday, February 18, 2008

arrrggg -- strlcat II, the solaris files

So after my adventures with strlcat on Friday, I tried out my tests on Solaris today. And sadly I must report that it works the same as on Cygwin and the Mac. Though (fortunately ?) our implementation also works the same. So at least everything fails the specification consistently.

Saturday, February 16, 2008

Fortune Cookie Says

Wise men learn more from fools than fools the wise.

Friday, February 15, 2008

arrrggg -- strlcat

So, I am using strlcat and strlcpy in the project I'm working on. We had to implement our own version of these for Linux since it does not come with them. Well today I had a frustrating day of writing unit tests for this code. I developed the tests in cygwin as I wanted to validate that the native implementation and our implementation work the same. However, I find that cygwin doesn't work according to the available doc. So I went home to try it on my Mac and Linux boxes. So what do I find...

Well lets start with the return values section of Apple's man page:
RETURN VALUES
The strlcpy() and strlcat() functions return the total length of the string they tried to create.

Sun even documents this in better detail. The return value is:
min(sz, strlen(dest)) + strlen(src)

Well that seems simple enough, so what are the results of the tests. The dest in my tests is a 5 character array. I will represent the contents of this as a string literal.

0 == strlcat( "", "", 0 ) yes it does

3 == strlcat( "", "bbb", 0 ) no it returns 0

2 == strlcat( "aa", "", 5 ) yes it does

2 == strlcat( "", "bb", 5 ) yes it does

4 == strlcat( "aa", "bb", 5 ) yes it does

5 == strlcat( "", "abcde", 5 ) no this returns 4

9 == strlcat( "aaaa", "abcde", 5 ) no this returns 4

? == strlcat( "aa", "abcd", 1 )
Well what should the last one return. The case of dest not having a NULL within size characters is an error condition. I think that this should return 5 (size + strlen(src) ). But, no matter what I think, it actually returns 0.

So of course this means that the example that they give in the man page does not work.
if ( strlcpy( pname, dir, sizeof( pname ) ) >= sizeof( pname ) )
goto toolong;

As we can see from my example above, strlcat( "aaaa", "abcde", 5) only returns 4. And last I checked 4 is less than 5.

On Monday I'll have to try this out on Solaris to see what it says.

And don't get me started on the topic of the routines not validating their input. This is supposed to be a standard library, it should not cause a core dump when it receives bad input. Would it be so terribly difficult to return 0 and set errno if you got a NULL pointer?

Monday, February 11, 2008

Parsing Socket Connections With Flex and Bison

I developed a parser for my my current project using Flex and Bison. The development work was done reading from stdin by redirecting files. However, the intent of the system was for it to be run by inetd. Since inetd maps the accepted socket to stdin and stdout for you I didn't really foresee any problems. For testing and demo purposes I wrote a simple network version that accepted a connection and mapped it to stdin and stdout like inetd. However, I found that my program would block every time that it tried to write to the socket.

I traced the problem down to Flex but never came up with a reason for why it was happening. While looking into the problem I found the Lemon parser. The documentation for the Lemon parser mentioned reading from a socket into a buffer and then parsing that buffer. This was the key to solving the problem. My solution may seem to be a little convoluted but seems to work well. So here it is.

First I defined two functions, one that reads from a file descriptor and one that writes. These functions have prototypes of:

int parse_read(int fd, char *buff, int size, int *read )
int parse_write(int fd, char *buff, int size)


I then wrote a parser initialization routine parser_init with a prototype of.

void parser_init( int infd, int outfd, PARSE_READ inFunc, PARSE_WRITE outFunc)

The parser_init function will save the input and output file descriptors, the input and output call back functions, and initialize the buffer. Once this is done it calls the flex function yy_scan_string. I did create two default functions that are used if NULL is passed for the callback parameters. But allowing for different input and output functions to be used makes things like unit testing much easier.

I then wrote a yywrap function in my Bison input file. The yywrap function, which is called when flex's input buffer is empty, calls the read callback function to get more data. If the callback returns 0, success, yywrap calls yy_scan_string again.

This probably does not make too much sense until you see it in action. So hopefully this will help to illustrate how all of the pieces come together, connfd is our connected socket and the default read and write routines will be used.
  • The application calls parser_init( connfd, connfd, NULL, NULL )
    • parser_init sets the input and output file descriptors
    • parser_init sets the callbacks to the defaults
    • parser_init initializes the buffer and calls yy_scan_string
  • the application calls yyparse to start the parser
    • flex finds that the input buffer is empty and calls yywrap
      • yywrap calls the read callback.
        • the read callback reads from the socket and populates the buffer.
      • yywrap calls yy_scan_string
      • yywrap returns 0 to have flex continue, if there was data available.
And that is the process, which actually seems to work quite well. Except for the small memory leak, but that is another post.

Tuesday, February 5, 2008

Fortune Cookie Says

Your principles mean more to you than any money or success.