readme
changeset 0 fb772792efd1
child 67 08097122a922
equal deleted inserted replaced
-1:000000000000 0:fb772792efd1
       
     1 
       
     2 
       
     3 
       
     4   IEC 61131-3 IL and ST compiler
       
     5 
       
     6 
       
     7   The following compiler has been based on the
       
     8   FINAL DRAFT - IEC 61131-3, 2nd Ed. (2001-12-10)
       
     9 
       
    10 
       
    11   (c) 2003 Mario de Sousa
       
    12 
       
    13 
       
    14 
       
    15 ****************************************************************
       
    16 ****************************************************************
       
    17 ****************************************************************
       
    18 *********                                              *********
       
    19 *********                                              *********
       
    20 *********   O V E R A L L    A R C H I T E C T U R E   *********
       
    21 *********                                              *********
       
    22 *********                                              *********
       
    23 ****************************************************************
       
    24 ****************************************************************
       
    25 ****************************************************************
       
    26 
       
    27  The compiler works in 4(+1) stages:
       
    28  Stage 1   - Lexical analyser      - implemented with flex (iec.flex)
       
    29  Stage 2   - Syntax parser         - implemented with bison (iec.y)
       
    30  Stage 3   - Semantics analyser    - not yet implemented
       
    31  Stage 4   - Code generator        - implemented in C++
       
    32  Stage 4+1 - Binary code generator - gcc, javac, etc...
       
    33 
       
    34 
       
    35  Data structures passed between stages, in global variables:
       
    36  1->2   : tokens (int), and token values (char *)
       
    37  2->1   : symbol tables (defined in symtable.hh)
       
    38  2->3   : abstract syntax tree (tree of C++ classes, in absyntax.hh file)
       
    39  3->4   : Same as 2->3
       
    40  4->4+1 : file with program in c, java, etc...
       
    41 
       
    42 
       
    43  The compiler works in several passes:
       
    44  Pass 1: executes stages 1 and 2 simultaneously
       
    45  Pass 2: executes stage 3
       
    46  Pass 3: executes stage 4
       
    47  Pass 4: executes stage 4+1
       
    48 
       
    49 
       
    50  NOTE 1
       
    51  ======
       
    52  Note that stage 2 passes data back to stage 1. This is only
       
    53 possible because both stages are executed in the same pass.
       
    54 
       
    55 
       
    56 
       
    57  NOTE 2
       
    58  ======
       
    59   I (Mario) have a feeling that the abstract syntax may be
       
    60 considerably simplified without any drawbacks to semantic checking
       
    61 and code generation. I have nevertheless opted to keep as much 
       
    62 info as possible in the abstract syntax tree, in case it may become
       
    63 necessary further on.
       
    64  Once we start coding the next stages (semantic checking and code
       
    65 generation) I will get a better understanding of what is required
       
    66 of the abstract syntax tree. At that stage I will be better
       
    67 positioned to make a more informed decision on how best to structure
       
    68 the abstract syntax tree.
       
    69  For now, we play conservative and keep as much info as possible.
       
    70 
       
    71  
       
    72 
       
    73  NOTE 3
       
    74  ======
       
    75  It would be nice to get this parser integrated into the gcc
       
    76 group of compilers. We would then be able to compile our st/il
       
    77 programs directly into executable binaries, for all the processor
       
    78 architectures gcc currently supports.
       
    79  The gcc compilers are divided into a frontend and backend. The
       
    80 data structure between these two stages is called the syntax
       
    81 tree. In essence, we would need to create a new frontend that
       
    82 would parse the st/il program and build the syntax tree.
       
    83 Unfortunately the gcc syntax tree is not very well documented,
       
    84 and doing semantic checking on this tree would probably be a
       
    85 nightmare.
       
    86  We therefore chose to follow the same route as the gnat (ada 95)
       
    87 and cobol compilers, i.e. generate our own abstract syntax tree,
       
    88 do semantic checking on our tree, do whatever optimisation
       
    89 we can at this level on our own tree, and only then build
       
    90 the gcc syntax tree from our abstract syntax tree.
       
    91  All this may still be integrated with the gcc backend to generate
       
    92 a new gnu compiler for the st and il programming languages.
       
    93 Since generating the gcc syntax tree will probably envolve some
       
    94 trial and error effort due to the sparseness of documentation,
       
    95 we chose to start off by coding a C++ code generator for
       
    96 our stage 4. We may later implement a gcc syntax tree generator
       
    97 as an alternative stage 4 process, and then integrate it with
       
    98 the gcc toplevel.c file (command line parsing, etc...).
       
    99 
       
   100 
       
   101 
       
   102 ****************************************************************
       
   103 ****************************************************************
       
   104 ****************************************************************
       
   105 *********                                              *********
       
   106 *********                                              *********
       
   107 *********               S T A G E      1               *********
       
   108 *********                                              *********
       
   109 *********                                              *********
       
   110 ****************************************************************
       
   111 ****************************************************************
       
   112 ****************************************************************
       
   113 
       
   114 
       
   115 
       
   116 Issue 1
       
   117 =======
       
   118 
       
   119  The syntax defines the common_character_representation as:
       
   120 <any printable character except '$', '"' or "'"> | <escape sequences>
       
   121 
       
   122  Flex includes the function print_char() that defines
       
   123 all printable characters portably (i.e. whatever character
       
   124 encoding is currently being used , ASCII, EBCDIC, etc...)
       
   125 Unfortunately, we cannot generate the definition of
       
   126 common_character_representation portably, since flex
       
   127 does not allow definition of sets by subtracting
       
   128 elements in one set from another set (Note how
       
   129 common_character_representation could be defined by
       
   130 subtracting '$' '"' and "'" from print_char() ).
       
   131 This means we must build up the defintion of
       
   132 common_character_representation using only set addition,
       
   133 which leaves us with the only choice of defining the
       
   134 characters non-portably...
       
   135 
       
   136  In short, the definition we use for common_character_representation
       
   137 only works for ASCII character encoding!
       
   138 
       
   139 
       
   140 
       
   141 
       
   142 Issue 2
       
   143 =======
       
   144 
       
   145 We extend the IEC 61131-3 standard syntax to allow inclusion of 
       
   146 other files. The accepted syntax is:
       
   147 
       
   148    (*#include "<filename>" *)
       
   149 
       
   150 Note how this would be ignored by other standard complient compilers 
       
   151 as a simple comment!
       
   152 
       
   153 
       
   154 
       
   155 ****************************************************************
       
   156 ****************************************************************
       
   157 ****************************************************************
       
   158 *********                                              *********
       
   159 *********                                              *********
       
   160 *********               S T A G E      2               *********
       
   161 *********                                              *********
       
   162 *********                                              *********
       
   163 ****************************************************************
       
   164 ****************************************************************
       
   165 ****************************************************************
       
   166 
       
   167  Overall Comments
       
   168  ================
       
   169 
       
   170  
       
   171  Comment 1
       
   172  ---------
       
   173  We have augmented the syntax the specification defines to include
       
   174 restrictions defined in the semantics of the languages.
       
   175 
       
   176  This is required because the syntax cannot be parsed by a LALR(1)
       
   177 parser as it is presented in the specification. Many reduce/reduce
       
   178 and shift/reduce conflicts arise. This is mainly because the parser
       
   179 cannot discern how to reduce an identifier. Identifiers show up in
       
   180 many places in the syntax, and it is not entirely possible to
       
   181 figure out if the identifier is a variable_name, enumeration
       
   182 value, function block name, etc... only from the context in
       
   183 which it appears.
       
   184 
       
   185  A more detailed example of why we need symbol tables are
       
   186 the type definitions...  In definition of new types
       
   187 (section B 1.3.3) the parser needs to figure out the class of
       
   188 the new type being defined (enumerated, structure, array, etc...).
       
   189 This works well when the base classes are elementary types
       
   190 (or structures, enumeration, arrays, etc. thereof). It becomes
       
   191 confusing to the parser when the new_type is based on a previously
       
   192 user defined type.
       
   193 
       
   194 TYPE
       
   195   new_type_1 : INT := 99;
       
   196   new_type_2 : new_type_1 := 100;
       
   197 END_TYPE
       
   198 
       
   199  When parsing new_type_1, the parser can figure out that the
       
   200 identifier new_type_1 is a simple_type_name, because it is
       
   201 based on a elementary type without structure, arrays, etc...
       
   202  While parsing new_type_2, it becomes confused how to reduce
       
   203 the new_type_2 identifier, as it is based on the identifier
       
   204 new_type_1, of which it does not know the class (remember, at this
       
   205 stage new_type_1 is a simple identifier!).
       
   206  We therefore need to keep track of the class of the user
       
   207 defined types as they are declared, so that the lexical analyser
       
   208 can tell the syntax parser what class the type belongs to. We
       
   209 cannot use the abstract syntax tree itself to search for the
       
   210 declaration of new_type_1 as we only get a handle to the root
       
   211 of the tree at the end of the parsing.
       
   212 
       
   213  We therefore maintain an independent and parallel table of symbols,
       
   214 that is filled as we come across the type delcarations in the code.
       
   215 Actually, we ended up also needing to store variable names in
       
   216 the symbol table. Since variable names come and go out of scope
       
   217 depending on what portion of code we are parsing, we sometimes
       
   218 need to remove the variable names from the symbol table.
       
   219 Since the ST and IL languages only have a single level of scope,
       
   220 I (Mario) found it easier to simply use a second symbol table for
       
   221 the variable names that is completely cleared when the parser
       
   222 reaches the end of a function (function block or program).
       
   223 
       
   224 What I mean when I say that these languages have a single level
       
   225 of scope is that all variables used in a function (function block
       
   226 or program) must be declared inside that function (function block
       
   227 or program). Even global variables must be re-declared as EXTERN
       
   228 before a function may access them! This means that it is easy
       
   229 to simply load up the variable name symbol table when we start
       
   230 parsing a function (function block or program), and to clear it
       
   231 when we reach the end. Checking whether variables declared
       
   232 as EXTERN really exist inside a RESOURCE or a CONFIGURATION
       
   233 is left to stage 3 (semantic checking) where we can use the
       
   234 abstract tree itself to search for the variables (NOTE: semantic
       
   235 cheching at stage 3 has not yet been implemented, so we may yet
       
   236 end up using a symbol table too at that stage!).
       
   237 
       
   238  Due to the use of the symbol tables, and special identifier
       
   239 tokens depending on the type of identifier it had previously
       
   240 been declared in the code being parsed, the syntax was slightly
       
   241 changed regarding the definition of variable names, derived
       
   242 function names, etc... FROM for e.g.:
       
   243 variable_name: identifier;
       
   244 TO
       
   245 variable_name: variable_name_token;
       
   246 
       
   247  Flex first looks at the symbol tables when it finds an identifier,
       
   248 and returns the correct token corresponding to the identifier
       
   249 type in question. Only if the identifier is not currently stored
       
   250 in any symbol table, does flex return a simple identifier_token.
       
   251 
       
   252  This means that the declarations of variables, functions etc...
       
   253 were changed FROM:
       
   254 function_declaration: FUNCTION derived_function_name ...
       
   255 TO
       
   256 function_declaration: FUNCTION identifier ...
       
   257 since the initial definition of derived_function_name had been
       
   258 changed FROM
       
   259 derived_function_name: identifier;
       
   260 TO
       
   261 derived_function_name: derived_function_name_token;
       
   262 
       
   263 
       
   264 
       
   265 
       
   266  Comment 2
       
   267  ---------
       
   268  Since the ST and IL languages share a lot of common syntax,
       
   269 I have decided to write a single parser to handle both languages
       
   270 simultaneously. This approach has the advantage that the user
       
   271 may mix the language used in the same file, as long as each function
       
   272 is written in a single lanuage.
       
   273 
       
   274  This approach also assumes that all the IL language operators are
       
   275 keywords, which means that it is not possible to define variables
       
   276 using names such as "LD", "ST", etc...
       
   277 Note that the spec does not consider these operators to be keywords,
       
   278 so it means that they should be available for variable names! On the
       
   279 other hand, all implementations of the ST and IL languages seems to
       
   280 treat them as keywords, so there is not much harm in doing the same.
       
   281 
       
   282  If it ever becomes necessary to allow variables with names of IL 
       
   283 operators, either the syntax will have to be augmented, or we can 
       
   284 brake up the parser in two: one for ST and another for IL.
       
   285 
       
   286 
       
   287 
       
   288 /********************************/
       
   289 /* B 1.3.3 - Derived data types */
       
   290 /********************************/
       
   291 
       
   292 Issue 1
       
   293 =======
       
   294 
       
   295 According to the spec, the valid construct
       
   296 TYPE new_str_type : STRING := "hello!"; END_TYPE
       
   297 has two possible routes to type_declaration...
       
   298 
       
   299 Route 1:
       
   300 type_declaration: single_element_type_declaration
       
   301 single_element_type_declaration: simple_type_declaration
       
   302 simple_type_declaration: identifier ':' simple_spec_init
       
   303 simple_spec_init: simple_specification ASSIGN constant
       
   304 (shift:  identifier <- 'new_str_type')
       
   305 simple_specification: elementary_type_name
       
   306 elementary_type_name: STRING
       
   307 (shift: elementary_type_name <- STRING)
       
   308 (reduce: simple_specification <- elementary_type_name)
       
   309 (shift: constant <- "hello!")
       
   310 (reduce: simple_spec_init: simple_specification ASSIGN constant)
       
   311 (reduce: ...)
       
   312 
       
   313 
       
   314 Route 2:
       
   315 type_declaration: string_type_declaration
       
   316 string_type_declaration: identifier ':' elementary_string_type_name string_type_declaration_size string_type_declaration_init
       
   317 (shift:  identifier <- 'new_str_type')
       
   318 elementary_string_type_name: STRING
       
   319 (shift: elementary_string_type_name <- STRING)
       
   320 (shift: string_type_declaration_size <- /* empty */)
       
   321 string_type_declaration_init: ASSIGN character_string
       
   322 (shift: character_string <- "hello!")
       
   323 (reduce: string_type_declaration_init <- ASSIGN character_string)
       
   324 (reduce: string_type_declaration <- identifier ':' elementary_string_type_name string_type_declaration_size string_type_declaration_init )
       
   325 (reduce: type_declaration <- string_type_declaration)
       
   326 
       
   327 
       
   328  At first glance it seems that removing route 1 would make
       
   329 the most sense. Unfortunately the construct 'simple_spec_init'
       
   330 shows up multiple times in other rules, so changing this construct
       
   331 would mean changing all the rules in which it appears.
       
   332 I (Mario) therefore chose to remove route 2 instead. This means
       
   333 that the above declaration gets stored in a
       
   334 simple_type_declaration_c, and not in a string_type_declaration_c
       
   335 as would be expected!
       
   336 
       
   337 
       
   338 /***********************/
       
   339 /* B 1.5.1 - Functions */
       
   340 /***********************/
       
   341 
       
   342 
       
   343 Issue 1
       
   344 =======
       
   345 
       
   346  Due to reduce/reduce conflicts between identifiers
       
   347 being reduced to either a variable or an enumerator value,
       
   348 we were forced to keep a symbol table of the names
       
   349 of all declared variables. Variables are no longer
       
   350 created from simple identifier_token, but from
       
   351 variable_name_token.
       
   352 
       
   353  BUT, in functions the function name may be used as
       
   354 a variable! In order to be able to parse this correctly,
       
   355 the token parser (flex) must return a variable_name_token
       
   356 when it comes across the function name, while parsing
       
   357 the function itself.
       
   358 We do this by inserting the function name into the variable
       
   359 symbol table, and having flex return a variable_name_token
       
   360 whenever it comes across it.
       
   361 When we finish parsing the function the variable name
       
   362 symbol table is cleared of all entries, and the function
       
   363 name is inserted into the library element symbol table. This
       
   364 means that from then onwards flex will return a
       
   365 derived_function_name_token whenever it comes across the
       
   366 function name.
       
   367 
       
   368 In order to insert the function name into the variable_name
       
   369 symbol table BEFORE the function body gets parsed, we
       
   370 need the parser to reduce a construct that contains the
       
   371 the function name. That is why we created the extra
       
   372 construct 'function_name_declaration', i.e. to force
       
   373 the parser to reduce it, before parsing the function body,
       
   374 and therefore get an oportunity to insert the function name
       
   375 into the variable name symbol table!
       
   376 
       
   377 
       
   378 /********************************/
       
   379 /* B 3.2.4 Iteration Statements */
       
   380 /********************************/
       
   381 
       
   382 Issue 1
       
   383 =======
       
   384 
       
   385 For the 'FOR' iteration loop
       
   386 
       
   387   FOR control_variable ASSIGN expression TO expression BY expression DO statement_list END_FOR
       
   388 
       
   389 The spec declares the control variable in the syntax as
       
   390 
       
   391   control_variable: identifier;
       
   392 
       
   393 but then defines the semantics of control_variable (Section 3.3.2.4)
       
   394 as being of an integer type (e.g., SINT, INT, or DINT).
       
   395 Obviously this presuposes that the control_variable must have been
       
   396 declared in some 'VAR .. VAR_END' construct, so I (Mario) changed
       
   397 the syntax to read
       
   398 
       
   399   control_variable: variable_name;
       
   400 
       
   401 
       
   402 
       
   403 
       
   404 
       
   405 **************************************************************************
       
   406 
       
   407   (c) 2003 Mario de Sousa