This article describes one more method of processing and parsing strings, based on a finite-state automaton. A C implementation for Microchip PIC18 microcontrollers is presented (development environment: MPLAB + HI-TECH C).

author: Dmitry B. Astrahantsev
information: www.robosoft.info
published: 25.05.2007

Introduction

Control devices frequently need to process strings. Binary data formats have not become a thing of the past, but string representations of information are much clearer to a human. Personal computers already work with widespread text formats such as XML and HTML.

Microcontrollers are no exception: more and more often they have to exchange and process information in text form. On the Microchip PIC16 series, string processing was rather laborious, but on the PIC18 series simple text processing can be written with the assembler or C compilers.

If the text-processing code is built around the dependencies between the strings and analyzes every possible character symbol by symbol, the result is hard to understand and hard to update when the text format changes. Adding the analysis of each new string takes too much time, and extending the program with new functions is laborious.

In such cases it is convenient to build a string parser. The parser accepts strings as input and analyzes them against predefined patterns. It changes its internal state depending on the input string and stores the recognized values in internal arrays. All that has to be changed in the source code is the patterns!

Algorithm

The operation of the parser can be divided into 3 parts:

  • Initialization of the parser
  • Analysis of incoming strings and modification of the internal state
  • Reading of internal states (internal values) of the parser

Let us consider these items in more detail:

Initialization

Initialization reads and analyzes all patterns for the known strings and reserves memory for the variables described inside the patterns. A pattern contains the names of the variables, separated by a certain token (in the example, ","). In other words, initialization means analyzing the patterns, finding in them the string literals that name the variables, and allocating memory for the values of these variables.
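The name extraction described above can be sketched in portable C. This is only an illustration under assumptions: split_pattern_names, NAME_LEN and MAX_NAMES are hypothetical names that do not appear in the article's listing, and the PIC-specific far qualifier is left out so that the fragment also compiles on a desktop compiler.

```c
#include <string.h>

#define NAME_LEN 6    /* assumed: 5 name chars + terminating zero */
#define MAX_NAMES 16  /* assumed capacity for this sketch */

/* Splits a pattern such as "$GPGGA,time,lon" at each ',' and copies the
   names that follow the first token into out[]. Returns the name count. */
static int split_pattern_names(const char *pattern,
                               char out[][NAME_LEN], int max_names)
{
    int count = 0;
    const char *p = strchr(pattern, ','); /* skip the pattern name */

    while (p != NULL && count < max_names) {
        const char *start = p + 1;
        const char *end = strchr(start, ',');
        size_t len = (end != NULL) ? (size_t)(end - start) : strlen(start);

        if (len >= NAME_LEN)
            len = NAME_LEN - 1;           /* truncate over-long names */
        memcpy(out[count], start, len);
        out[count][len] = '\0';
        count++;
        p = end;                          /* NULL after the last name */
    }
    return count;
}
```

For the pattern "$GPGGA,time,lon,lonSN" the function returns 3 and fills out[] with "time", "lon" and "lonSN". Unlike the article's parser, this sketch does not merge names that repeat in several patterns.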

After initialization the parser is ready to receive and transmit text expressions. Theoretically, patterns can also be set after initialization.

Analysis of incoming strings and modification of the internal state

The analysis means finding the matching pattern, splitting the incoming string into values, and copying them into the internal memory of the parser that describes its internal state.
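The first step, finding the matching pattern, amounts to comparing everything before the first token. A minimal sketch, assuming a hypothetical helper match_pattern_name that is not part of the article's listing (again without the far qualifier):

```c
#include <string.h>

/* Returns the index of the first ',' when the sentence starts with the same
   name as the pattern (for example "$GPGGA"), or -1 when the names differ. */
static int match_pattern_name(const char *pattern, const char *sentence)
{
    const char *comma = strchr(pattern, ',');
    size_t nameLen = (comma != NULL) ? (size_t)(comma - pattern)
                                     : strlen(pattern);

    if (strncmp(pattern, sentence, nameLen) != 0)
        return -1;            /* names differ */
    if (sentence[nameLen] != ',')
        return -1;            /* sentence name is longer than pattern name */
    return (int)nameLen;
}
```

The values of the sentence start right after the returned index, so the caller can continue from sentence + index + 1.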

Reading of internal states (internal values) of the parser

The internal state can be read in three ways:

  • By the number of the variable (this requires knowing the number in advance)
  • By the name of the variable
  • By filling a given pattern, which besides tokens (separators) contains the names of variables, with the values of those variables
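The source code in this article covers the first two ways only. The third one could look roughly like the sketch below. Everything in it is an assumption made for illustration: fill_pattern, lookup and the demo tables are hypothetical, and a real version would read the values from the parser's own arrays.

```c
#include <string.h>

/* Hypothetical lookup table, for this sketch only. */
static const char *const demoNames[]  = { "time", "lon", "lonSN" };
static const char *const demoValues[] = { "092204.999", "4250.5589", "S" };
#define DEMO_COUNT 3

/* Returns the value for a name of the given length, "" if it is unknown. */
static const char *lookup(const char *name, size_t len)
{
    int i;
    for (i = 0; i < DEMO_COUNT; i++)
        if (strlen(demoNames[i]) == len &&
            strncmp(demoNames[i], name, len) == 0)
            return demoValues[i];
    return "";
}

/* Copies the pattern name, then replaces every variable name by its value.
   Returns 0 on success, -1 if out is too small. */
static int fill_pattern(const char *pattern, char *out, size_t outSize)
{
    const char *p = pattern;
    size_t used = 0;
    int field = 0;

    if (outSize == 0)
        return -1;
    out[0] = '\0';
    while (1) {
        const char *end = strchr(p, ',');
        size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);
        const char *text = p;          /* field 0: the pattern name itself */
        size_t textLen = len;

        if (field > 0) {
            text = lookup(p, len);
            textLen = strlen(text);
        }
        if (used + textLen + 2 > outSize)  /* text + ',' + '\0' */
            return -1;
        memcpy(out + used, text, textLen);
        used += textLen;
        if (end == NULL)
            break;
        out[used++] = ',';
        p = end + 1;
        field++;
    }
    out[used] = '\0';
    return 0;
}
```

With the demo tables, the pattern "$GPGGA,time,lonSN" is filled as "$GPGGA,092204.999,S".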

Implementation

The source code below is written for the MPLAB environment and the HI-TECH C compiler.

Variable and constant definitions:

#include <string.h> //strcmp, strcpy, strlen, memset
#include ;

//index value that marks "no match" (empty slot)
#define NOT_MATCH_INDEX 255

//count of used patterns
#define PATTERNS_COUNT 2

//count of used variables (the two patterns below contain 20 distinct names)
#define VALUES_COUNT 20

//maximum count of variables in pattern
#define MAX_VALUES_IN_PATTERN 14

//maximum length of variable name, including the terminating zero
#define STRING_NAME_LENGTH 6

//maximum length of variable value, including the terminating zero
#define STRING_VALUE_LENGTH 11

//patterns
const char * const patterns[] =
{          
           "$GPGGA,time,lon,lonSN,lat,latWE,fact,sat,HDOP,alt,units,xz1,xz2,xz3,chsum",
           "$GPRMC,time,wrn,lon,lonSN,lat,latWE,spd,dirct,date,mvar,mvard,chsum"
};
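The functions that follow use token, names, values and values_num, whose declarations are not shown in the listing. The sketch below gives one plausible set of declarations, inferred only from how the functions index these objects. The types, the sizes and the absence of any PIC-specific storage qualifier are assumptions, and the constants are repeated under a guard only so that the fragment compiles on its own.

```c
#ifndef PATTERNS_COUNT  /* constants repeated so the fragment compiles alone */
#define PATTERNS_COUNT 2
#define VALUES_COUNT 20
#define MAX_VALUES_IN_PATTERN 14
#define STRING_NAME_LENGTH 6
#define STRING_VALUE_LENGTH 11
#endif

/* assumed: the separator between the fields of a pattern or sentence */
const char token = ',';

/* assumed: names of all known variables, one zero-terminated string each */
char names[VALUES_COUNT][STRING_NAME_LENGTH];

/* assumed: current values of the variables, same order as names[] */
char values[VALUES_COUNT][STRING_VALUE_LENGTH];

/* assumed: for every pattern, the indexes into names[] and values[]
   of its variables, in the order they appear in the pattern */
unsigned char values_num[PATTERNS_COUNT][MAX_VALUES_IN_PATTERN];
```

With these sizes the three arrays take 120 + 220 + 28 = 368 bytes of RAM.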

Parser Initialization Function:

//adds a variable that describes the internal state of the parser
void AddNewValue(far char * name, far unsigned char * nameNum, char patternNum)
{
   unsigned char indexValue; //index of the variable being added
   unsigned char k;
   unsigned char count;
   char * str0;
   char * str1;

   indexValue = NOT_MATCH_INDEX;
   count = *nameNum;
   if(count>VALUES_COUNT)
       count = VALUES_COUNT; //never search beyond the end of names[]
  
   //Check whether the name being added is already in the list.
   //If it is, the existing index is reused
   //(duplicates are ignored)
   for(k=0; k<count; k++)
   {
       str0 = (char *) names[k];
       str1 = (char *) name;
       if(strcmp(str0, str1)==0)
       {

          //The variable was seen earlier: reuse its old index
          indexValue = k; 
          break;
       }
   }
    
   if(indexValue==NOT_MATCH_INDEX)
   {
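       if(*nameNum<VALUES_COUNT)
       {
          indexValue = *nameNum;             //index of the new variable
          str0 = (char *) names[*nameNum];   //to
          str1 = (char *) name;            //from
          strcpy(str0, str1);
          *nameNum = (*nameNum) + 1;
       }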
   }
   
    memset(name, 0, STRING_NAME_LENGTH);
  
    //store the variable index in the first free slot of its pattern
    for(k=0; k<MAX_VALUES_IN_PATTERN; k++)
    {
        if(values_num[patternNum][k]==NOT_MATCH_INDEX)
        {
            values_num[patternNum][k] = indexValue;
            return;
        }
    }
}


//Parser Initialization
void InitializeAutomate()
{
    unsigned char nameNum=0;  //count of names found so far
    unsigned char i; //index variable (pattern number)
    unsigned char tokenNum; //count of tokens met in current pattern
    unsigned char j; //index variable (char index in pattern)
    const char * pattern; //pointer to current pattern
    int patternLength; //length of current pattern
    char charIndex; //index of the char in the name
    char name[STRING_NAME_LENGTH]; //name of the variable being added

 

    //Array initialization:
    //indexes of variables for each pattern (needed for fast search)
    for(i=0; i<PATTERNS_COUNT; i++)
    {
        for(j=0; j<MAX_VALUES_IN_PATTERN; j++)
        {
            values_num[i][j] = NOT_MATCH_INDEX; //void index
        }
    }

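    //find all variables and fill the array of their names:
    memset(name, 0, STRING_NAME_LENGTH); //start from an empty name
    charIndex = 0;
    for(i=0; i<PATTERNS_COUNT; i++)
    {
        tokenNum = 0; //current token index (all chars before the first token form the pattern name)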
        if(i>0)
        {
            //a new pattern starts: add the last variable of the previous pattern
            AddNewValue(name, &nameNum, i-1);
            charIndex = 0;
        }
 
       //scan all chars in pattern:
       pattern = (const char *) patterns[i];
       patternLength = strlen(pattern);
       for(j=0; j<patternLength; j++)
       {
           if(*pattern==token)
           {
               if(tokenNum>0)
               {
                   //a new token closes the current name:
                   //add the variable
                   AddNewValue(name, &nameNum, i);
                   charIndex = 0;
               }
               tokenNum++;
               pattern++;
               continue;
           }

           if(tokenNum>0 && charIndex<STRING_NAME_LENGTH-1)
           {
               //append the char to the name of the variable
               //(never overrun the name buffer)
               name[charIndex] = *pattern;
               charIndex++;
           }
           pattern++; //advance on every char, not only on tokens
       }
    }
   AddNewValue(name, &nameNum, PATTERNS_COUNT-1);
}

State-setting function (string analysis):

//a new incoming string
//changes the internal state
//if it matches any known pattern
unsigned char NewSentence(far char * sentence)
{
   char i; //index variable (pattern index)
   char j; //index variable (char index)
   const char * pattern; //pointer to current pattern
   int patternLength; //length of current pattern
   far char * sentenceTemp; //pointer to received string
   int sentenceLength; //length of received string
   unsigned char indexToken; //position of the first token, NOT_MATCH_INDEX if the name does not match
   char valueNumNum; //index of variable index in array of indexes values_num[]
   far char * value; //pointer to variable value

   for(i=0; i<PATTERNS_COUNT; i++)
   {
       //check of pattern name:
       //scan the pattern chars up to the first token
       indexToken = NOT_MATCH_INDEX;
       sentenceTemp = (far char *) sentence; //compare every pattern from the start of the string

       pattern = (const char *) patterns[i];
       patternLength = strlen(pattern);
       for(j=0; j<patternLength; j++)
       {
           if(*pattern==token)
           {
               //end of the pattern name: the string must have a token here too
               if(*sentenceTemp==token)
               {
                   indexToken = j;
               }
               break;
           }
           if(*pattern!=*sentenceTemp)
           {
               //names differ
               break;
           }
           pattern++;
           sentenceTemp++;
       }
       if(indexToken!=NOT_MATCH_INDEX)
       {
           //analyze the string according to the matched pattern:
           sentenceTemp = (far char *) sentence;
           sentenceLength = strlen(sentenceTemp);
           sentenceTemp = sentenceTemp + indexToken + 1;

           valueNumNum = 0; //index of variable index in array of indexes values_num[]
           value = (far char *) values[values_num[i][valueNumNum]];
           memset(value, 0, STRING_VALUE_LENGTH); //equate to an empty string
    
           for(j=indexToken+1; j<sentenceLength; j++)
           {
               if(*sentenceTemp==token)
               {
                   valueNumNum++;
                   value = (far char *) values[values_num[i][valueNumNum]];
                   memset(value, 0, STRING_VALUE_LENGTH); //equate to an empty string
               }
               else
               {
                   *value = *sentenceTemp;
                   value++;
               }
               sentenceTemp++;
           }
           return 1; //values of variables were updated according to the pattern
       }
   }
   return 0; //no variable was changed
}

Functions for reading the internal state of the parser:

//Returns a pointer to the value of a variable by its number
far char * GetValueByNum(char num)
{
    return (far char *) values[num];
}

 

//Returns a pointer to the value of a variable by its name
far char * GetValueByName(far char * nameRequested)
{
    char i;
    far char * name;

    for(i=0; i<VALUES_COUNT; i++)
    {
        name = (far char *) names[i];
        if(strcmp(nameRequested, name)==0)
        {
            return (far char *) values[i];
        }
    }
    return 0; //variable not found
}

Usage example:

char buffer[] = "$GPGGA,092204.999,4250.5589,S,14718.5084,E,1,04,24.4,19.7,M,,,,0000*1F";
far char * value = "N/A";
InitializeAutomate();
NewSentence(buffer);
value = GetValueByNum(0);
value = GetValueByName("lon");

Conclusion

The article describes a simple and accessible text parser. Different patterns make it possible to build string analyzers for various applications. Note that the implementation imposes certain limits on the number of patterns, the number of variables in a pattern, and the number of characters in the names and values of variables, but for most tasks it is quite acceptable. The number of patterns is also limited by the available memory of the microcontroller.

Note also that the search algorithms used are not optimized (linear search), so search time grows with the number of analyzed elements.
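One inexpensive way to limit the cost of the linear search is to resolve a name to its number once and afterwards read the value by number. A minimal sketch of that idea, assuming a hypothetical helper find_name_index and a name table laid out as fixed-length strings; neither the helper nor the constants below come from the listing above:

```c
#include <string.h>

#define NAME_LEN 6          /* assumed: 5 name chars + terminating zero */
#define NAME_NOT_FOUND 255  /* plays the same role as NOT_MATCH_INDEX */

/* Linear search over the name table; meant to be called once per name.
   Returns the index of the name or NAME_NOT_FOUND. */
static unsigned char find_name_index(const char table[][NAME_LEN],
                                     unsigned char count,
                                     const char *wanted)
{
    unsigned char i;
    for (i = 0; i < count; i++)
        if (strcmp(table[i], wanted) == 0)
            return i;
    return NAME_NOT_FOUND;
}
```

The returned index can be stored after initialization and passed to a by-number accessor on every new sentence, so that the string comparisons are not repeated.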
