CSC 330 Lecture Notes Week 4

Type Systems and Semantics

Relevant reading: Chapter 3 of the book.
Some further highlights of tree-building actions in pascal.cup:
1. Look closely at the procdecl rule; it avoids building an extra dummy TreeNode for just the prochdr by allocating the whole proc node in the lower prochdr rule instead of in the procdecl rule.
2. Check out the details of the expr and *op rules, in particular the use of %prec; this CUP declaration forces the precedence of a rule alternative to be that of a specific terminal symbol.
3. Check out the last four rules; these are the immediate interface between the parser and the lexer.

What is a symbol table?

It is a form of lookup table that stores semantic information about symbols declared in a program.
Key semantic aspects for variables and parameters are the type and memory location.
Key semantic aspects for functions are the signature, code body, and local scope.
The design of the SymbolTable class is intended to support these semantics.
For further discussion, see the javadoc commentary for SymbolTable and its subsidiary classes.

Here is a (rough) UML diagram for the symtab classes:

SymbolTable <>---------------------------------------* SymbolTableEntry
-----------                                            ----------------
parent                                                 String name
entries                                                TreeNode type
level                                                  ----------------
-----------                                            SymbolTableEntry
SymbolTable(int)                                       SymbolTableEntry(
SymbolTable newLevel(FunctionEntry fe, int size)         String name, TreeNode type)
SymbolTableEntry lookup(String name)                   toString
SymbolTableEntry lookupLocal(String name)                   ^
boolean enter(SymbolTableEntry)               |-------------|----|
SymbolTable ascend()                          |                  |
SymbolTable descend(String name)              |                  |
void dump(SymbolTable st)               VariableEntry       FunctionEntry
String toString(int level)              -------------       -------------
                                        boolean isRef       TreeNodeList formals
                                        int memoryLoc       TreeNode body
                                                            SymbolTable scope
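
The scoped lookup suggested by the diagram can be sketched as follows. This is a minimal sketch and not the course's SymbolTable class: the names MiniSymbolTable and MiniEntry are hypothetical, entries are kept in a HashMap, and it assumes that lookup searches the current level and then each parent in turn, while lookupLocal searches only the current level.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for SymbolTableEntry: a name plus a type label.
class MiniEntry {
    final String name;
    final String type;

    MiniEntry(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

// Hypothetical stand-in for SymbolTable: one scope level with a parent link.
class MiniSymbolTable {
    final MiniSymbolTable parent;
    final int level;
    private final Map<String, MiniEntry> entries = new HashMap<>();

    MiniSymbolTable(MiniSymbolTable parent) {
        this.parent = parent;
        this.level = (parent == null) ? 0 : parent.level + 1;
    }

    // Returns false if the name is already declared at this level.
    boolean enter(MiniEntry e) {
        if (entries.containsKey(e.name)) {
            return false;
        }
        entries.put(e.name, e);
        return true;
    }

    // Searches this level only.
    MiniEntry lookupLocal(String name) {
        return entries.get(name);
    }

    // Searches this level, then each enclosing level (assumed behavior).
    MiniEntry lookup(String name) {
        for (MiniSymbolTable t = this; t != null; t = t.parent) {
            MiniEntry e = t.lookupLocal(name);
            if (e != null) {
                return e;
            }
        }
        return null;
    }
}

class MiniSymbolTableDemo {
    public static void main(String[] args) {
        MiniSymbolTable global = new MiniSymbolTable(null);
        global.enter(new MiniEntry("i", "integer"));
        MiniSymbolTable local = new MiniSymbolTable(global);
        local.enter(new MiniEntry("x", "real"));
        System.out.println(local.lookup("i").type);  // found via the parent link
        System.out.println(local.lookupLocal("i"));  // null: i is not local
    }
}
```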

A simple Pascal/EJay program, and its symtab.

program
    var i,j,k: integer;
begin
    i := j + k * 10;
end.



int i,j,k;
void main() {
    i = j + k * 10;
}

Here's a dump of what the Pascal parser outputs:

PROGRAM
  BEGIN
    VAR
      i
        ;
      j
        ;
      k
      IDENT
        integer

    ASSMNT
      i
      PLUS
        j
        TIMES
          k
          10
      ;


Level 0 Symtab Contents:
  Symbol: i, Type: IDENT, is ref: false, mem loc: 0
  Symbol: k, Type: IDENT, is ref: false, mem loc: 0
  Symbol: j, Type: IDENT, is ref: false, mem loc: 0

A more detailed data-structures picture was presented in class on the board; you should be able to recreate such a data structure picture from a symbol table dump.

Adding semantic actions to build a symbol table.

Look at pascal.cup and figure out where symtab-building actions should go.

Here's a likely hot spot:

vardecl         ::= VAR vars:vs COLON type:t
                        {: RESULT = new TreeNode2(sym.VAR, vs, t);
                           parser.enterVars(vs, t); :}
                ;

vars            ::= var:v
                        {: RESULT = new TreeNodeList(v, null); :}
                | var:v COMMA vars:vs
                        {: RESULT = new TreeNodeList(v, vs); :}
                ;

var             ::= identifier:i
                        {: RESULT = i; :}
                ;

Here's the code for enterVars:

    protected void enterVars(TreeNodeList vars, TreeNode type) {
        TreeNode node;
        TreeNodeList rest;
        boolean done  = false;
        for (node = vars.node, rest = vars.siblings; !done; ) {
            symtab.enter(new VariableEntry(
                (String) (((LeafNode) node).value), type, false, 0));
            if (rest == null) {
                done = true;
            }
            else {
                node = rest.node;
                rest = rest.siblings;
            }
        }
    }

Now consider a Pascal program with procedures, its symtab, and symtab-building actions.
1. See the class comment for SymbolTable.java for an example picture of such a table.
2. Here, in the nested symbol table case, things get a bit tricky with the semantic actions.
3. A key aspect of mastering the trickery is always to remember that a CUP parser works left-to-right, bottom-up; in particular, the semantic action in a higher rule does not execute until AFTER all of the semantic actions of its RHS constituents have executed, in left-to-right order.
4. In light of this, consider the following approach to allocating a new symbol table for a procedure, in which its formals and locals will be entered:
```
procdecl ::= prochdr:ph SEMI block:b
               {: RESULT = ph; RESULT.child4 = b;
                  symtab = symtab.newLevel(
                      new FunctionEntry(ph.child1.value, ph.child3,
                          ph.child2, b, null), PROC_SYMTAB_SIZE); :}
             ;
```
5. Does this work?
  1. The answer is NO.
  2. The reason is that the action comes too late for the formals and data decls in the block to go in the correct table.
  3. Think about this.
6. Instead, do this:
```
procdecl ::= prochdr:ph SEMI block:b
                {: RESULT = ph; RESULT.child4 = b;
                   parser.entry.body = b;
                   symtab = symtab.ascend(); :}
                ;

prochdr  ::= PROC identifier:i L_PAREN formals:fs R_PAREN COLON ident:rt
                {: RESULT = new TreeNode4(sym.PROC, i, fs, rt, null);
                   symtab = symtab.newLevel(parser.entry =
                       new FunctionEntry(i.value, rt, fs, null, null),
                               parser.PROC_SYMTAB_SIZE); :}
                ;
```
  1. What's happening here is that the action associated with the prochdr rule fires before the block element in the RHS of the procdecl rule.
  2. In this way, when the variables declared in the block are entered into the symbol table, they go into the correct table, which is that allocated for the function, rather than into the global table as in the incorrect case.
7. What does this mean for EJay?
  1. Probably the easiest approach to building function symbol tables is to restructure the EJay CUP grammar to include the equivalent of the Pascal prochdr rule; this will allow a new-level symbol table to be constructed before the function formals and block rules reduce.
  2. Another approach is to wait to enter formals and local variables until after the rules for these constructs have reduced; this will entail traversing the parse trees for these constructs, similar to what is done in the enterVars helper method.
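
The ordering argument above can be checked with a small simulation. This is only a sketch under stated assumptions: it models the reduction order for procdecl ::= prochdr SEMI block with a stack of table names, and ActionOrderDemo and tableReceivingLocal are hypothetical names that do not appear in pascal.cup.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ActionOrderDemo {
    // Simulates the reduction order for: procdecl ::= prochdr SEMI block.
    // Returns the name of the table that receives the block's local variables.
    static String tableReceivingLocal(boolean allocateInProchdr) {
        Deque<String> scopes = new ArrayDeque<>();
        scopes.push("global");

        // 1. prochdr reduces first, since it is the leftmost RHS constituent.
        if (allocateInProchdr) {
            scopes.push("proc");             // like symtab.newLevel(...)
        }

        // 2. The declarations inside block reduce next; each one enters its
        //    variables into whatever table is current at that moment.
        String receiving = scopes.peek();

        // 3. procdecl reduces last.
        if (allocateInProchdr) {
            scopes.pop();                    // like symtab.ascend()
        } else {
            scopes.push("proc");             // too late for the locals
        }
        return receiving;
    }

    public static void main(String[] args) {
        System.out.println(tableReceivingLocal(false)); // global
        System.out.println(tableReceivingLocal(true));  // proc
    }
}
```

Allocating the new level in the procdecl action sends the locals to the global table; allocating it in the prochdr action sends them to the procedure's table.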

Compilation and execution details.

On falcon/hornet, you need the following items on your CLASSPATH, in addition to what may already be there:
1. /Users/gfisher/classes/330/assignments/3/support-files/a3-support.jar
2. /Users/gfisher/pkg/java_cup_v10k
3. /Users/gfisher/pkg/jflex/lib/JFlex.jar
You may have already set up aliases for cup and jflex, but it can't hurt to have the last two on your classpath.

Here's a sample run that builds the lexer, builds the parser, and runs the test program.

********************  Running JFlex  ********************

Reading "pascal.jflex"
Constructing NFA : 194 states in NFA
Converting NFA to DFA :
86 states before minimization, 81 states in minimized DFA
Old file "PascalLexer.java" saved as "PascalLexer.java~"
Writing code to "PascalLexer.java"


********************  Running Cup  ********************

Opening files...
Parsing specification from standard input...
Checking specification...
Warning: Terminal "UNY_PLUS" was declared but never used
Warning: Terminal "UNY_MINUS" was declared but never used

Building parse tables...
  Computing non-terminal nullability...
  Computing first sets...
  Building state machine...
  Filling in tables...

*** Shift/Reduce conflict found in state #77
  between ifstmt ::= IF expr THEN stmt (*)
  and     ifstmt ::= IF expr THEN stmt (*) ELSE stmt
  under symbol ELSE
  Resolved in favor of shifting.

  Checking for non-reduced productions...
Writing parser...
Closing files...
------- CUP v0.10k Parser Generation Summary -------
  0 errors and 3 warnings
  41 terminals, 33 non-terminals, and 70 productions declared,
  producing 121 unique parse states.
  2 terminals declared but not used.
  0 non-terminals declared but not used.
  0 productions never reduced.
  1 conflict detected (1 expected).
  Code written to "PascalParser.java", and "sym.java".
---------------------------------------------------- (v0.10k)


********************  Running the Test Program  ********************

PROGRAM
  BEGIN
    VAR
      i
        ;
      j
        ;
      k
      IDENT
        integer

      ;
    VAR
      x
        ;
      y
        ;
      z
      IDENT
        real

    ASSMNT
      i
      10
      ;
    ASSMNT
      j
      20
      ;


Level 0 Symtab Contents:
  Symbol: z, Type: IDENT, is ref: false, mem loc: 0
  Symbol: y, Type: IDENT, is ref: false, mem loc: 0
  Symbol: x, Type: IDENT, is ref: false, mem loc: 0
  Symbol: i, Type: IDENT, is ref: false, mem loc: 0
  Symbol: k, Type: IDENT, is ref: false, mem loc: 0
  Symbol: j, Type: IDENT, is ref: false, mem loc: 0

Note the use of the "-parser PascalParser" command-line arg; this renames the default CUP output from "parser.java" to "PascalParser.java".

Debugging a cup-built Java program.
1. OK, I have a bug in my parser, which I put in a CUP file pascal.cup.buggy and in the generated output file PascalParserBuggy.java.
2. Here are the errors I get:
```
PascalParserBuggy.java:696: cannot resolve symbol
symbol  : variable child1
location: class TreeNode
                 RESULT = op; op.child1 =
                                ^
PascalParserBuggy.java:697: cannot resolve symbol
symbol  : variable child2
location: class TreeNode
                           e1; op.child2 = e2;
                                 ^
2 errors
```
3. What's up, and how do I fix them?
4. Follow these steps:
  1. Open the .java file and go to the listed line.
  2. Search up from there for a line of the form
```
          case 42: // expr ::= expr relop expr
```
  3. Open the .cup file and scroll down to that rule, and look for the Java source line listed in the error message.
  4. Now the debugging starts, which in this case has to do with not using the correct Java type for the non-term relop -- TreeNode instead of TreeNode2.
What is semantics? (Chapter 3 intro)
1. It's what a program means.
2. In contrast to syntax, which is how a program is grammatically structured.
3. Types of semantic definition:
  1. operational -- a running compiler or interpreter
  2. axiomatic -- for proving programs correct
  3. denotational -- for abstractly defining meaning
Introduction to type systems. (3.1)
1. How types are associated with names in a PL.
2. PLs can be statically or dynamically typed.
3. PLs can be strongly or weakly typed.
Formalizing types (3.1.1++)
1. Primitive data domains of EJay -- int, float, string, boolean
2. Composite data domains of EJay -- array, struct
3. Formally, the following four constructs are used to represent data domains in PLs:
  1. list domains are homogeneous compositions
    1. arrays in EJay and most other PLs
    2. lists in Lisp and Scheme
  2. product domains are heterogeneous compositions
    1. structs in EJay, C, C++
    2. records in Pascal
    3. class data members in Java, C++, C#
  3. sum domains are one-of compositions
    1. not in EJay
    2. union or enum in C, C++
    3. variant records in Pascal
  4. function domains represent functions, and other PL functional abstractions.
    1. not in EJay
    2. procedure type in Pascal
    3. function pointers in C, C++
    4. Method type in Java
4. We'll use an extension of the book's notation of "[...]" for arrays and "{<i,t>, ... }" for structs.
Type checking in Jay and EJay (3.1.2)
1. Process of ensuring program meets type rules.
2. Rules for EJay include:
  1. Var and function idents must be unique within a scope.
  2. Within a scope, vars must be declared with a unique type.
  3. Expression types are determined based on operand types.
  4. Type of designator in LHS of assignment statement must be same as expr on RHS.
  5. Type of test in IF and WHILE must be boolean.
3. Most languages, including EJay, are strongly and statically typed.
4. C is an example of a weakly statically typed language.
5. Lisp is an example of a weakly dynamically typed language.
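
Rules 3 through 5 above can be sketched as simple checks on type tags. This is a minimal sketch, not EJay's actual checker: TypeCheckDemo and its methods are hypothetical names, and the arithmetic rule assumes both operands must have the same numeric type with no implicit coercion, which these notes do not specify.

```java
class TypeCheckDemo {
    enum Type { INT, FLOAT, STRING, BOOLEAN }

    // Rule 3: an expression's type is determined by its operand types.
    // Assumption for this sketch: arithmetic needs two operands of the same
    // numeric type and yields that type (no implicit coercion).
    static Type arithmeticType(Type left, Type right) {
        boolean numeric = (left == Type.INT || left == Type.FLOAT);
        if (!numeric || left != right) {
            throw new IllegalArgumentException(
                "type error: " + left + " op " + right);
        }
        return left;
    }

    // Rule 4: the LHS designator and the RHS expression must have the same type.
    static boolean assignmentOk(Type lhs, Type rhs) {
        return lhs == rhs;
    }

    // Rule 5: the test of an IF or WHILE must be boolean.
    static boolean testOk(Type test) {
        return test == Type.BOOLEAN;
    }

    public static void main(String[] args) {
        Type sum = arithmeticType(Type.INT, Type.INT);
        System.out.println(assignmentOk(Type.INT, sum));   // true
        System.out.println(assignmentOk(Type.FLOAT, sum)); // false
        System.out.println(testOk(sum));                   // false
    }
}
```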
Semantic domains (3.2)
1. Semantic domains for a PL are an environment and a memory (a.k.a., store).
  1. environment maps identifiers to type and memory addresses (an extension of the book's def)
  2. store maps addresses to values
2. A memory-mapping-only environment for Jay (page 56):
  gamma = {<i, 154>, <j, 155>}
  mu = {}
3. An extended type-mapping and memory-mapping environment for EJay:
  gamma = {<i, INT, 154>, <j, INT, 155>}
  mu = {}
4. In the book, the preceding extended environment is represented as a pair of mappings,
  1. A type mapping
    tm = {<i, INT>, <j, INT>}
  2. And a memory-mapping environment
    gamma = {<i, 154>, <j, 155>}
  3. All the extended notation does is unify these two types of mapping.
5. A further extension of the book's environment model is the inclusion of scoping.
  1. E.g., consider the following EJay program:
```
int i,j,k;
float x,y;
int f(float x, string s) {
    boolean b1,b2;
    int[10] a;
    float z;
    // ...
}
int[10] g() {
    int x,y;
    struct {int i; float j;} s;
    float z;
    // ...
}
```
  2. The scope-extended environment for the preceding program is:
    
    gamma = {<i, INT, 0>,
             <j, INT, 1>,
             <k, INT, 2>,
             <x, FLOAT, 3>,
             <y, FLOAT, 4>,
             <f, INT,
                 {<x, FLOAT, 0>, <s, STRING, 1>},
                 {<b1, BOOLEAN, 2>, <b2, BOOLEAN, 3>,
                  <a, INT ARRAY[10], 4>, <z, FLOAT, 5>}>,
             <g, INT ARRAY[10],
                 {},
                 {<x, INT, 0>, <y, INT, 1>,
                  <s, STRUCT {<i, INT, 0>, <j, FLOAT, 1>}, 2>,
                  <z, FLOAT, 3>}>
            }
6. The preceding environment model is a unification of what the book represents in a type map (Section 3.1.1) and an address-mapping environment (Section 3.2).
  1. Tuples enclosed in angle brackets are called bindings, which associate a name with semantic information.
  2. A data domain is a set of bindings, enclosed in braces.
  3. The outermost data domain is the environment.
  4. Structured type values, e.g., STRUCT in EJay, are nested data domains within the overall environment.
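
The unification of a type map and a memory-mapping environment into <name, type, address> bindings can be sketched in Java. This is a minimal sketch: EnvUnifyDemo, Binding, and unify are hypothetical names, types are plain strings, and scoping is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EnvUnifyDemo {
    // Hypothetical binding payload: the type and address bound to one name.
    static final class Binding {
        final String type;
        final int address;

        Binding(String type, int address) {
            this.type = type;
            this.address = address;
        }
    }

    // Combines tm = {<name, type>} and gamma = {<name, address>} into
    // a single map of <name, type, address> bindings.
    static Map<String, Binding> unify(Map<String, String> tm,
                                      Map<String, Integer> gamma) {
        Map<String, Binding> env = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : tm.entrySet()) {
            Integer address = gamma.get(e.getKey());
            if (address == null) {
                throw new IllegalArgumentException("no address for " + e.getKey());
            }
            env.put(e.getKey(), new Binding(e.getValue(), address));
        }
        return env;
    }

    public static void main(String[] args) {
        Map<String, String> tm = new LinkedHashMap<>();
        tm.put("i", "INT");
        tm.put("j", "INT");
        Map<String, Integer> gamma = new LinkedHashMap<>();
        gamma.put("i", 154);
        gamma.put("j", 155);
        for (Map.Entry<String, Binding> e : unify(tm, gamma).entrySet()) {
            Binding b = e.getValue();
            // Prints <i, INT, 154> and then <j, INT, 155>
            System.out.println("<" + e.getKey() + ", " + b.type + ", " + b.address + ">");
        }
    }
}
```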
Symbol table as concrete def of environment.
1. A symbol table contains the same mappings as an abstract environment.
2. E.g., consider the symtab for the preceding abstraction.
State transformations (3.2)
1. The meaning, i.e., the semantics, of a program is defined as a series of state transformations.
2. Each state is represented as an environment and store.
3. For an initial simplified example, we'll consider a state to be collapsed into a simplified form, with bindings of the form <name, value>.
4. E.g., here is a program state with three vars x, y, and z assigned values 1, 2, and 3:
  sigma = {<x, 1>, <y, 2>, <z, 3>}
5. This is a collapsed version of an environment and memory that would look like this:
  gamma = {<x, INT, 0>, <y, INT, 1>, <z, INT, 2>}
  mu = {<0, 1>, <1, 2>, <2, 3>}
  This, in turn, is an abstract version of a concrete symbol table and memory:
  Level 0 Symtab Contents:
    Symbol: y, Type: INT, mem loc: 1
    Symbol: z, Type: INT, mem loc: 2
    Symbol: main, Type: VOID
    Symbol: x, Type: INT, mem loc: 0

  Memory Dump:
    Location 0: 1
    Location 1: 2
    Location 2: 3
  And all of this traces to the following EJay and Pascal programs (which are equivalent semantically):
  
  EJay:
    int x,y,z;
    void main () {
        x = 1;
        y = 2;
        z = 3;
    }

  Pascal:
    program
        var x,y,z: integer;
    begin
        x := 1;
        y := 2;
        z := 3;
    end.
  So, getting back to state transformations, the meaning of every PL construct can be defined in terms of its state transition effects.
  E.g., the meaning of assignment can be defined in terms of its effect on a particular program state.
  1. Specifically, the meaning is:
    An assignment statement transforms a pre-assignment state into a post-assignment state where the bound value of the assigned variable is replaced by the value of the expression on the RHS of the assignment.
  2. There's a less bulky notation for the preceding mouthful, coming right up.
  Consider the preceding state
  sigma = {<x, 1>, <y, 2>, <z, 3>}
  and the assignment statement
  y = 2 * z + 3;
  1. The resulting transformed state is
    sigma = {<x, 1>, <y, 9>, <z, 3>}
  2. The value of 9 comes from computing the RHS expression in the pre-assignment state.
  In our simplified memory model, the semantics of assigning to a not-yet-bound variable has the effect of adding a binding to the pre-assignment state.
  1. E.g., consider the assignment statement
    w = 4;
    given the preceding value of sigma
  2. The resulting transformed state is
    sigma = {<x, 1>, <y, 9>, <z, 3>, <w, 4>}
  The less bulky notation referred to above uses an overriding union operator, denoted U-bar
  1. U-bar is defined for two sets X and Y as follows:
    X U-bar Y = replace in X all bindings <x, v> whose first member matches a binding <x, w> in Y with <x, w>, and then add to X any additional bindings in Y that are not in X.
  2. E.g., if
    sigma₁ = {<x, 1>, <y, 2>, <z, 3>}
    and
    sigma₂ = {<y, 9>, <w, 4>},
    then
    sigma₁ U-bar sigma₂ = {<x, 1>, <y, 9>, <z, 3>, <w, 4>}
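
The overriding union can be sketched in Java by treating a state as a map from names to values. This is a minimal sketch: OverridingUnionDemo and overridingUnion are hypothetical names, and values are restricted to integers for simplicity. It relies on Map.putAll, which replaces the value of any key already present and adds the rest.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OverridingUnionDemo {
    // X U-bar Y: bindings in y replace same-named bindings in x,
    // and any remaining bindings in y are added. Neither input is modified.
    static Map<String, Integer> overridingUnion(Map<String, Integer> x,
                                                Map<String, Integer> y) {
        Map<String, Integer> result = new LinkedHashMap<>(x);
        result.putAll(y);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> sigma1 = new LinkedHashMap<>();
        sigma1.put("x", 1);
        sigma1.put("y", 2);
        sigma1.put("z", 3);
        Map<String, Integer> sigma2 = new LinkedHashMap<>();
        sigma2.put("y", 9);
        sigma2.put("w", 4);
        System.out.println(overridingUnion(sigma1, sigma2)); // {x=1, y=9, z=3, w=4}
    }
}
```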
Operational semantics (3.3)
1. Can be defined formally as in book.
2. For 330, we'll do it by implementing interpreters.
3. Can also be done by implementing a compiler.
4. In practice, formal operational defs are rare.
5. More typical is less-than-fully-formal def in English, then compiler.
6. Formal route is with denotational def on which compiler is based.
Axiomatic semantics (3.4)
1. Formal rules for program proof.
2. Covered in CSC 206 (a bit), but not here in 330.
3. Regarding book's "perspective" in Section 3.4.4:
  1. Points are essentially correct.
  2. As more tools become available, formal verification will be more widely used.
Denotational semantics (3.5)
1. Defines semantics in abstract mathematical terms.
2. The meaning of a program is defined as a function that maps program environments and stores into modified environments and stores.
3. E.g.,
  M: Assignment, Env, Store -> Env', Store'
  M("var = expr", e, s) = e', s'
  where e' = e, and s' is s with the value binding for var replaced by the value of expr, i.e.,
  
  s' = s U-bar {<var, M(expr, e, s)>}
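
The meaning function for assignment can be sketched in Java using the collapsed <name, value> state model. This is a minimal sketch under stated assumptions: AssignMeaningDemo and assign are hypothetical names, the environment is omitted because e' = e, and an expression is modeled as a function from a state to an integer.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

class AssignMeaningDemo {
    // M("var = expr", s) = s U-bar {<var, M(expr, s)>}
    // The RHS is evaluated in the pre-assignment state s.
    static Map<String, Integer> assign(String var,
                                       ToIntFunction<Map<String, Integer>> expr,
                                       Map<String, Integer> s) {
        Map<String, Integer> sPrime = new LinkedHashMap<>(s);
        sPrime.put(var, expr.applyAsInt(s));
        return sPrime;
    }

    public static void main(String[] args) {
        Map<String, Integer> sigma = new LinkedHashMap<>();
        sigma.put("x", 1);
        sigma.put("y", 2);
        sigma.put("z", 3);
        // y = 2 * z + 3;
        Map<String, Integer> after = assign("y", st -> 2 * st.get("z") + 3, sigma);
        System.out.println(after); // {x=1, y=9, z=3}
    }
}
```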
Example: operational semantics for EJay assignment and exprs
1. Consider
```
int x,y,z;
z = x + 2*y;
```