Monday, February 11, 2008

Parsing Socket Connections With Flex and Bison

I developed a parser for my current project using Flex and Bison. The development work was done reading from stdin by redirecting files. However, the system was intended to be run by inetd. Since inetd maps the accepted socket to stdin and stdout for you, I didn't foresee any problems. For testing and demo purposes I wrote a simple network version that accepted a connection and mapped it to stdin and stdout, as inetd does. However, I found that my program would block every time it tried to write to the socket.

I traced the problem to Flex but never found out why it was happening. While looking into the problem I found the Lemon parser. The documentation for the Lemon parser mentioned reading from a socket into a buffer and then parsing that buffer. This was the key to solving the problem. My solution may seem a little convoluted, but it seems to work well. So here it is.

First I defined two functions, one that reads from a file descriptor and one that writes to one. Their prototypes are:

int parse_read(int fd, char *buff, int size, int *read);
int parse_write(int fd, char *buff, int size);
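
To make the two callbacks more concrete, here is a minimal sketch of what default versions might look like on top of POSIX read() and write(). The bodies, the error handling, and the -1 error code are my assumptions; the only convention taken from this post is that a return value of 0 means success. The last parameter is renamed from read to nread here, because a parameter called read would hide the read() system call inside the function.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical default read callback: returns 0 on success, -1 on error.
 * The byte count is stored in *nread (0 means the peer closed the socket). */
int parse_read(int fd, char *buff, int size, int *nread)
{
    ssize_t n;

    assert(buff != NULL && nread != NULL && size > 0);
    do {
        n = read(fd, buff, (size_t)size);
    } while (n < 0 && errno == EINTR);   /* retry if interrupted by a signal */

    if (n < 0)
        return -1;
    *nread = (int)n;
    return 0;
}

/* Hypothetical default write callback: loops until every byte is written. */
int parse_write(int fd, char *buff, int size)
{
    int done = 0;

    assert(buff != NULL && size >= 0);
    while (done < size) {
        ssize_t n = write(fd, buff + done, (size_t)(size - done));
        if (n < 0) {
            if (errno == EINTR)
                continue;                /* interrupted, try again */
            return -1;
        }
        done += (int)n;
    }
    return 0;
}
```

Because both functions only take a file descriptor, the same code should work for a socket, a pipe, or a redirected file.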


I then wrote a parser initialization routine, parser_init, with the prototype:

void parser_init(int infd, int outfd, PARSE_READ inFunc, PARSE_WRITE outFunc);

The parser_init function saves the input and output file descriptors and the input and output callback functions, and initializes the buffer. Once this is done it calls the flex function yy_scan_string. I also created two default functions that are used if NULL is passed for the callback parameters, but allowing different input and output functions to be used makes things like unit testing much easier.
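
A rough sketch of such an initialization routine is below. The state struct, the buffer size, the names default_read and default_write, and the PARSE_READ/PARSE_WRITE typedefs are guesses, since the real definitions are not shown in this post. So that the sketch builds without a flex-generated scanner, scan_string_standin takes the place of yy_scan_string; in a real program that line would be yy_scan_string(ps.buf).

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

typedef int (*PARSE_READ)(int fd, char *buff, int size, int *nread);
typedef int (*PARSE_WRITE)(int fd, char *buff, int size);

/* Hypothetical parser state; the real layout is not shown in the post. */
static struct {
    int infd, outfd;
    PARSE_READ  rd;
    PARSE_WRITE wr;
    char buf[4096];
} ps;

/* Minimal defaults, used when NULL is passed for a callback. */
static int default_read(int fd, char *buff, int size, int *nread)
{
    ssize_t n = read(fd, buff, (size_t)size);
    if (n < 0)
        return -1;
    *nread = (int)n;
    return 0;
}

static int default_write(int fd, char *buff, int size)
{
    return write(fd, buff, (size_t)size) == size ? 0 : -1;
}

/* Stand-in for flex's yy_scan_string(); it only records its argument. */
static const char *scanned;
static void scan_string_standin(const char *s) { scanned = s; }

void parser_init(int infd, int outfd, PARSE_READ inFunc, PARSE_WRITE outFunc)
{
    assert(infd >= 0 && outfd >= 0);
    ps.infd  = infd;
    ps.outfd = outfd;
    ps.rd    = inFunc  ? inFunc  : default_read;   /* NULL selects the default */
    ps.wr    = outFunc ? outFunc : default_write;
    ps.buf[0] = '\0';                 /* start with an empty buffer */
    scan_string_standin(ps.buf);      /* empty input, so flex will ask yywrap for more */
}
```

For a unit test, inFunc could be a function that copies a canned string into the buffer, so no socket is needed at all.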

I then wrote a yywrap function in my Bison input file. Flex calls yywrap when its input buffer is empty; my version calls the read callback function to get more data. If the callback returns 0 (success), yywrap calls yy_scan_string again.
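
The refill step might look something like the sketch below. The globals in_fd, read_cb, and buffer are hypothetical stand-ins for whatever parser_init saved, and scan_string_standin again replaces yy_scan_string so the code compiles on its own. fake_read is an example test callback that serves one canned message and then reports end of input.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE 4096

/* Hypothetical state that parser_init would have filled in. */
static int   in_fd;
static int (*read_cb)(int fd, char *buff, int size, int *nread);
static char  buffer[BUF_SIZE];

/* Stand-in for flex's yy_scan_string(); records what would be scanned. */
static const char *scanned;
static void scan_string_standin(const char *s) { scanned = s; }

/* Called by flex when its current input buffer is exhausted.
 * Returns 0 to continue scanning, 1 to signal end of input. */
int yywrap(void)
{
    int n = 0;

    assert(read_cb != NULL);
    /* Leave room for the terminating NUL that a string scan needs. */
    if (read_cb(in_fd, buffer, BUF_SIZE - 1, &n) != 0 || n <= 0)
        return 1;                     /* error, or nothing more to read */
    buffer[n] = '\0';
    scan_string_standin(buffer);      /* real code: yy_scan_string(buffer) */
    return 0;
}

/* Example test callback: serves one canned message, then reports EOF. */
static int fake_read(int fd, char *buff, int size, int *nread)
{
    static const char msg[] = "HELO\n";
    static int served;

    (void)fd;                         /* no real descriptor is needed */
    *nread = 0;
    if (!served && size >= (int)sizeof msg - 1) {
        memcpy(buff, msg, sizeof msg - 1);
        *nread = (int)sizeof msg - 1;
        served = 1;
    }
    return 0;
}
```

Returning 1 when the callback fails or delivers no bytes is my choice for this sketch; it tells flex that the input is finished.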

This probably does not make much sense until you see it in action, so hopefully this walk-through illustrates how all of the pieces come together. Here connfd is our connected socket, and the default read and write routines are used.
  • The application calls parser_init( connfd, connfd, NULL, NULL )
    • parser_init sets the input and output file descriptors
    • parser_init sets the callbacks to the defaults
    • parser_init initializes the buffer and calls yy_scan_string
  • The application calls yyparse to start the parser
    • flex finds that the input buffer is empty and calls yywrap
      • yywrap calls the read callback
        • the read callback reads from the socket and populates the buffer
      • yywrap calls yy_scan_string
      • if data was available, yywrap returns 0 to tell flex to continue
And that is the process, which actually seems to work quite well, except for a small memory leak. But that is a topic for another post.
