CSC 330 Lecture Notes Week 6

Operational Semantics of Imperative Programming Languages
Topics from Chapters 4 & 5 of the Book



  1. Basic semantics of imperative programming (book Section 4.1).
    1. In assignment 4, you're implementing the operational semantics of the EJay imperative programming language.
    2. What it means for a language to be imperative is that it acts like most of the languages you're likely familiar with, such as Java, C++, EJay, and Pascal.
    3. Namely, an imperative language has data variables in a program memory (what the book calls the program store or data store).
    4. The fundamental meaning of an imperative program is the value of its memory upon completion of execution.
    5. In the PascalInterpreter example (and in your EJayInterpreter) memory is concretely represented as the memory data field, which is an array of Objects.
    6. When interpreter execution is complete, the meaning of the interpreted program is defined concretely by its memory dump.
    7. For example, the meaning of the following EJay program
      /*
       * This is the simplest possible executable program.  The result should be a
       * value of 10 in location 0 of static memory.
       */
      
      int i;
      
      void main() {
          i = 10;
      }
      
      is that the variable i is assigned the value of 10, which is represented by the following post-execution memory dump
      
      Static Pool:
      Location 0: 10
      
      Stack: empty
      
    8. The preceding dump is based on the fact that variable i is assigned memory location 0.
    9. We'll discuss the details of memory address assignment and memory layout shortly.
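    A minimal Java sketch of this idea, assuming memory is a single Object[] whose first staticSize locations form the static pool and whose stack begins right after it. The class ToyMemory, its stackTop field, and its dump format are hypothetical illustrations, not the PascalInterpreter's actual memory class.

```java
// Hypothetical model: one Object[] for memory, with the static pool in
// locations 0 .. staticSize-1 and the runtime stack starting right after it.
class ToyMemory {
    final Object[] memory;
    final int staticSize;   // number of static-pool locations
    int stackTop;           // first unused stack location; == staticSize when the stack is empty

    ToyMemory(int totalSize, int staticSize) {
        this.memory = new Object[totalSize];
        this.staticSize = staticSize;
        this.stackTop = staticSize;
    }

    // Dump memory in the same shape as the post-execution dump above.
    String dump() {
        StringBuilder sb = new StringBuilder("Static Pool:\n");
        appendLocations(sb, 0, staticSize);
        sb.append("\nStack: ");
        if (stackTop == staticSize) {
            sb.append("empty\n");
        } else {
            sb.append('\n');
            appendLocations(sb, staticSize, stackTop);
        }
        return sb.toString();
    }

    private void appendLocations(StringBuilder sb, int from, int to) {
        for (int loc = from; loc < to; loc++) {
            sb.append("Location ").append(loc).append(": ").append(memory[loc]).append('\n');
        }
    }

    public static void main(String[] args) {
        ToyMemory m = new ToyMemory(100, 1);   // one global, i, at location 0
        m.memory[0] = 10;                      // the effect of "i = 10;"
        System.out.print(m.dump());
    }
}
```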

  2. Naming and variables (Section 4.2).
    1. Variables are named with identifiers, whose syntax is defined by the grammar of the programming language.
    2. Variables are associated with memory locations.
    3. In the concrete case of the interpreters we're writing for Pascal and EJay, the memory assignment is done during the parsing phase, where each variable is assigned a unique location, starting at 0.
    4. In the case of variables defined as function parameters and locals, the address is scope-specific, and relative to a location on the runtime stack.
    5. Details of the runtime stack component of memory are coming up (they're covered in Chapter 5 of the book).
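    A minimal sketch of parse-time address assignment, assuming every variable occupies one memory word. ToySymtab and declare are hypothetical names (memorySize echoes the symtab field mentioned later in these notes). Global locations are absolute static-pool addresses, while function-scope locations are offsets relative to the function's activation record.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified symbol table. Each scope hands out consecutive
// locations starting at 0: absolute static-pool addresses for the global
// scope, offsets into the activation record for a function's scope.
class ToySymtab {
    final ToySymtab parent;                        // enclosing scope; null for globals
    final Map<String, Integer> locations = new HashMap<>();
    int memorySize = 0;                            // words used so far == next free location

    ToySymtab(ToySymtab parent) { this.parent = parent; }

    boolean isGlobal() { return parent == null; }

    // Called by the "parser" at each declaration. Every variable is assumed
    // to take one word, so the next location is simply memorySize++.
    int declare(String name) {
        if (locations.containsKey(name)) {
            throw new IllegalStateException("duplicate declaration: " + name);
        }
        int loc = memorySize++;
        locations.put(name, loc);
        return loc;
    }

    public static void main(String[] args) {
        ToySymtab globals = new ToySymtab(null);
        globals.declare("i");                      // static location 0
        globals.declare("j");                      // static location 1
        ToySymtab f = new ToySymtab(globals);
        f.declare("x");                            // offset 0 in f's activation record
        f.declare("temp");                         // offset 1
        System.out.println(globals.memorySize + " " + f.memorySize);
    }
}
```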

  3. Elementary types, values, and expressions (Section 4.3).
    1. Elementary (aka primitive, aka atomic) types are fundamentally the same in most programming languages.
    2. They generally include boolean, numeric, and character/string types.
    3. EJay's elementary types are typical -- int, float, string, boolean.
    4. Basic expression o
    5. The concrete operational semantics of expression evaluation are illustrated in the PascalInterpreter example, and will be fundamentally the same for your EJayInterpreter.
      1. To evaluate an expression, the top-level evalExpr method determines which expression operator is involved.
      2. It then dispatches to the appropriate sub-methods, e.g., evalPlus.
      3. The result of each expression evaluation method is a Value object that contains the result of applying the operator to its operands.
      4. Values are returned by each expression evaluation method, and ultimately assigned to a variable in memory to produce a meaningful program result.
    6. The rules of expression evaluation for a particular language vary in strictness in terms of the types of operands that a particular operator can apply to.
      1. EJay has quite strict rules, as defined in the Assignment 4 writeup.
      2. Pascal has similarly strict rules.
      3. Other languages, notably C, have much more relaxed rules for expression evaluation, as discussed in the subsections of Section 4.3 of the book.
      4. We'll discuss these other forms of rules if time permits in class, but won't go into them in detail (i.e., they won't be a topic for the midterm).
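    A sketch of the evalExpr/evalPlus dispatch pattern, using hypothetical stand-in classes (ToyValue, Expr, Literal, Plus) rather than the interpreter's real parse-tree and Value classes. The typing rule in evalPlus (int + int or float + float only, with no implicit conversion) is an assumption for illustration; the actual EJay rules are the ones in the Assignment 4 writeup.

```java
// Hypothetical stand-ins for the interpreter's Value class and parse tree.
class ToyValue {
    final String type;   // "int", "float", "string", or "boolean"
    final Object val;
    ToyValue(String type, Object val) { this.type = type; this.val = val; }
}

abstract class Expr {}

class Literal extends Expr {
    final ToyValue v;
    Literal(ToyValue v) { this.v = v; }
}

class Plus extends Expr {
    final Expr left, right;
    Plus(Expr left, Expr right) { this.left = left; this.right = right; }
}

class ExprEvaluator {
    // Top level: find out which operator is involved, then dispatch.
    ToyValue evalExpr(Expr e) {
        if (e instanceof Literal) return ((Literal) e).v;
        if (e instanceof Plus) return evalPlus((Plus) e);
        throw new IllegalArgumentException("unknown expression: " + e);
    }

    // Evaluate both operands, check their types, and return a new Value.
    // Assumed strict rule: int + int or float + float only, no coercion.
    ToyValue evalPlus(Plus e) {
        ToyValue l = evalExpr(e.left);
        ToyValue r = evalExpr(e.right);
        if (!l.type.equals(r.type)) {
            throw new RuntimeException("type mismatch for +: " + l.type + " and " + r.type);
        }
        switch (l.type) {
            case "int":   return new ToyValue("int", (Integer) l.val + (Integer) r.val);
            case "float": return new ToyValue("float", (Double) l.val + (Double) r.val);
            default:      throw new RuntimeException("+ is not defined on " + l.type);
        }
    }

    public static void main(String[] args) {
        Expr sum = new Plus(new Literal(new ToyValue("int", 2)), new Literal(new ToyValue("int", 3)));
        ToyValue v = new ExprEvaluator().evalExpr(sum);
        System.out.println(v.type + " " + v.val);
    }
}
```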

  4. Semantics of programming language statements (Sections 4.4 and 4.5).
    1. Imperative programming languages have statements that are executed for their effect on memory.
    2. As with basic data types and expression evaluation, the basic types of statements in programming languages are fundamentally the same.
    3. They include assignment, conditional, looping, and function call/return.
    4. EJay's statement types are quite typical -- assignment, if-else, while, and function call/return.
    5. The concrete operational semantics of statement evaluation are illustrated in the PascalInterpreter example, and will be fundamentally the same for your EJayInterpreter.
      1. To evaluate a statement, the top-level evalStmt method determines which kind of statement is being evaluated.
      2. It then dispatches to the appropriate sub-methods, e.g., evalAssmnt.
      3. The result of each statement evaluation is a (possibly) modified program memory.
        1. Memory is possibly modified, since in the case of a conditional statement, execution of memory-changing statements may not occur, depending on the evaluation of the conditional expression.
        2. Assignment is the fundamental memory-changing statement; its concrete semantics are illustrated in the PascalInterpreter implementation of evalAssmnt, which we'll go through in detail in class.
      4. In contrast to expressions, statement evaluation does NOT produce a returned Value; the semantic effect of statements is on program memory.
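    A sketch of evalStmt dispatch and evalAssmnt under simplifying assumptions: variables are already resolved to memory locations, and expressions are modeled by a small hypothetical Exp interface. Stmt, Assign, IfElse, evalIf, and StmtEvaluator are illustrative names; only evalStmt and evalAssmnt echo names from these notes. Note that the evaluation methods return void, which reflects the point above that statements affect memory rather than produce a Value.

```java
// Hypothetical, simplified stand-ins: variables are already resolved to
// memory locations, and an expression is anything that can compute an
// Object from the current memory.
interface Exp { Object eval(Object[] memory); }

abstract class Stmt {}

class Assign extends Stmt {
    final int location;   // l-value: where the result goes
    final Exp rhs;        // r-value: what gets stored
    Assign(int location, Exp rhs) { this.location = location; this.rhs = rhs; }
}

class IfElse extends Stmt {
    final Exp cond;
    final Stmt thenPart, elsePart;   // elsePart may be null
    IfElse(Exp cond, Stmt thenPart, Stmt elsePart) {
        this.cond = cond; this.thenPart = thenPart; this.elsePart = elsePart;
    }
}

class StmtEvaluator {
    final Object[] memory;
    StmtEvaluator(Object[] memory) { this.memory = memory; }

    // Top level: decide which kind of statement this is, then dispatch.
    // Nothing is returned; the only effect is on memory.
    void evalStmt(Stmt s) {
        if (s instanceof Assign) evalAssmnt((Assign) s);
        else if (s instanceof IfElse) evalIf((IfElse) s);
        else throw new IllegalArgumentException("unknown statement: " + s);
    }

    // The fundamental memory-changing statement.
    void evalAssmnt(Assign a) {
        memory[a.location] = a.rhs.eval(memory);
    }

    // Only one branch (or none) runs, so memory is only possibly modified.
    void evalIf(IfElse s) {
        if ((Boolean) s.cond.eval(memory)) evalStmt(s.thenPart);
        else if (s.elsePart != null) evalStmt(s.elsePart);
    }

    public static void main(String[] args) {
        Object[] memory = new Object[2];
        StmtEvaluator ev = new StmtEvaluator(memory);
        ev.evalStmt(new Assign(0, m -> 10));   // i = 10;
        ev.evalStmt(new IfElse(m -> (Integer) m[0] > 50, new Assign(1, m -> 1), null));
        System.out.println(memory[0] + " " + memory[1]);
    }
}
```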

  5. The structure of program memory (intro to Chapter 5, and Section 5.1).
    1. There are three areas of program memory: static pool, stack, and heap.
      1. The static pool holds variables whose lifetime is the entire life of an executing program.
      2. The stack holds activation records (aka stack frames), with storage for function parameters and local variables; the lifetime of a variable in a stack activation record lasts only during execution of the function in whose scope the parameter or local is declared.
      3. The heap holds dynamically allocated storage, such as the array and struct storage in our interpreters.
    2. Concrete details of memory layout.
      1. All three memory areas can be laid out in one contiguous area, as shown in the book (Fig 5.1, pg. 120).
      2. In the Pascal and EJay interpreters, we have a static pool and stack as in the book, but keep the heap in a physically separate area of memory; this is because we're using Java's heap for array and struct storage, rather than managing one of our own.
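    A sketch of this layout, assuming the static pool and stack share one Object[] and that heap storage is simply Java objects reached through a reference held in one memory word. LayoutSketch and allocateOnJavaHeap are hypothetical names; the real interpreters may use Memory.allocate(int size) for the allocation itself.

```java
// Hypothetical sketch: static pool and stack share one contiguous Object[];
// the heap is Java's own heap, reached through a reference stored in one word.
class LayoutSketch {
    final Object[] memory;
    final int staticPoolSize;
    int stackTop;   // first unused word; the stack starts right after the static pool

    LayoutSketch(int size, int staticPoolSize) {
        memory = new Object[size];
        this.staticPoolSize = staticPoolSize;
        stackTop = staticPoolSize;
    }

    boolean inStaticPool(int loc) { return loc >= 0 && loc < staticPoolSize; }

    boolean inStack(int loc) { return loc >= staticPoolSize && loc < stackTop; }

    // Array or struct storage: allocated with Java's new (so Java's garbage
    // collector manages it), with only the reference kept in interpreter memory.
    Object[] allocateOnJavaHeap(int loc, int nWords) {
        Object[] block = new Object[nWords];
        memory[loc] = block;
        return block;
    }

    public static void main(String[] args) {
        LayoutSketch layout = new LayoutSketch(100, 2);   // two globals
        layout.stackTop += 3;                             // push a 3-word activation record
        Object[] a = layout.allocateOnJavaHeap(1, 10);    // global 1 is an int[10]
        System.out.println(layout.inStack(4) + " " + (layout.memory[1] == a));
    }
}
```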

  6. Functions, locals, parameters, and the runtime stack (Section 5.2).
    1. As noted above, the runtime stack is used as the storage for function parameters and local variables.
      1. Each time a function is called, an activation record is pushed onto the stack.
      2. The size of the activation record is determined when the parser builds the symbol table, by summing the size of memory necessary to store all parameters and locals.
      3. When the function exits, its activation record is popped off of the stack, thereby freeing its storage, and terminating the current lifetime of the storage for its parameters and locals.
      4. We'll discuss the details of pushing and popping activation records shortly.
    2. Concrete details of stack management.
      1. The book describes one model of stack management based on what are called static and dynamic links.
        1. The static link refers to the static scoping context of the stack.
        2. The dynamic link refers to the dynamic execution context.
      2. In the concrete operational semantics of our Pascal and EJay interpreters, we use a different model of stack management, that is easier to implement than the book's.
      3. Specifically, in our interpreters, we use the following implementation models for the static and dynamic links illustrated in Figures 5.2-5.4, on pages 123-124:
        1. static links are represented by the parent link structure in the passed-around symbol table
        2. dynamic links are represented by the dynamically adjusting stack indices, in particular the tos and nextTos interpreter data fields.
      4. To use the book's approach, every activation record (aka stack frame) needs to have two additional stack pointers (indices).
        1. In our implementation approach, we use the symbol table parent links to link lexically enclosing scope contexts, which is exactly what the static link is used for.
        2. We also use the symbol table to tell us how much to decrement the stack pointer when we leave a function, which is exactly what the dynamic link is used for.
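    A sketch of how the symbol table can stand in for both links, under stated assumptions: only two scope levels (globals and function scopes, so addressOf does not handle deeper nesting), one word per variable, and tos taken to be the index of the first word of the active activation record. Scope, FrameIndices, enter, exit, and addressOf are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical scope: parent plays the role of the book's static link, and
// memorySize is what lets the interpreter do without a dynamic link.
class Scope {
    final Scope parent;                          // null for the global scope
    final Map<String, Integer> offsets = new HashMap<>();
    int memorySize;                              // words for all params and locals

    Scope(Scope parent) { this.parent = parent; }

    int declare(String name) {
        offsets.put(name, memorySize);
        return memorySize++;
    }

    // Walk outward through lexically enclosing scopes, as a chain of static links would.
    Scope scopeDeclaring(String name) {
        for (Scope s = this; s != null; s = s.parent) {
            if (s.offsets.containsKey(name)) return s;
        }
        return null;
    }
}

class FrameIndices {
    int tos;   // assumed: index of the first word of the active activation record

    FrameIndices(int staticPoolSize) { tos = staticPoolSize; }   // main's record follows the static pool

    // Call: the callee's record starts just past the caller's record.
    void enter(Scope caller) { tos += caller.memorySize; }

    // Return: the caller's memorySize says how far to move back, which is
    // the job a saved dynamic link would otherwise do.
    void exit(Scope caller) { tos -= caller.memorySize; }

    // Globals have absolute addresses; params and locals are relative to tos.
    // Assumes only two scope levels, so a non-global hit is the current scope.
    int addressOf(Scope current, String name) {
        Scope s = current.scopeDeclaring(name);
        if (s == null) throw new IllegalArgumentException("undeclared: " + name);
        int offset = s.offsets.get(name);
        return s.parent == null ? offset : tos + offset;
    }

    public static void main(String[] args) {
        Scope globals = new Scope(null);
        globals.declare("i");                        // i: static location 0
        Scope mainScope = new Scope(globals);
        mainScope.declare("k");                      // k: offset 0 in main's record
        Scope f = new Scope(globals);
        f.declare("x");                              // x: offset 0 in f's record
        FrameIndices fi = new FrameIndices(globals.memorySize);
        fi.enter(mainScope);                         // main calls f
        System.out.println("x at " + fi.addressOf(f, "x") + ", i at " + fi.addressOf(f, "i"));
        fi.exit(mainScope);
    }
}
```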

  7. Tracing the execution of sample-test-files/func.ej, funcs.ej, and fact.ej.
    1. We'll now trace through the execution of some sample EJay programs that use functions, to see what's going on with memory and the runtime stack.
      1. The programs are in the "Additional Information" handout for Assignment 4.
      2. The programs are accessible online in the index page for Assignment 4, in the subdirectory sample-test-files.
      3. The output of running the programs is in the subdirectory sample-out-files.
    2. Here is the file sample-test-files/func.ej:
      /*
       * This is a very simple test program for a function call.  The result should
       * be a value of 3 in static memory location 0.  Test the program with
       * EJayInterpreterTest.java.
       */
      
      int i;
      
      void main() {
          f(3);
      }
      
      int f(int x) {
          i = x;
      }
      
    3. The incoming symtab and memory structures:
      StaticPool:                     Level 0 symtab:
        0 (i's address):                i, int, mem loc: 0
      Stack: empty (main?)              f, int,
                                          Formals: x
                                          Level 1 symtab:
                                            x, int, mem loc: 0
                                          Body: "i = x;"
      
    4. The five specific steps to call, execute, and exit a function (as noted in the method comment for evalProcCall in the PascalInterpreter example):
      1. look up the function name in the symbol table
      2. push an activation record onto the stack
      3. bind actual parameters to formal parameters, in the callING environment
      4. evaluate the function body, in the callED environment
      5. pop the activation record and return the function value, if any
    5. We'll trace through these steps in detail during class.
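    A hypothetical hand-simulation of the five steps for the call f(3) in func.ej, with the interpreter machinery stripped away so that only the memory and stack-index arithmetic remain. It assumes the layout in the diagram above (i at static location 0, x at offset 0 in f's record), that main's record has size 0, and that tos is the index of the first word of the active record. FuncEjTrace, callF, and MAIN_SIZE are illustrative names.

```java
// Hypothetical hand-simulation of the call f(3) in func.ej. Assumed layout:
// the static pool holds only i (location 0), main has no parameters or locals
// (memorySize 0), and f's symtab puts its formal x at offset 0.
class FuncEjTrace {
    static final int STATIC_POOL_SIZE = 1;   // just i
    static final int MAIN_SIZE = 0;          // main's memorySize
    final Object[] memory = new Object[8];
    int tos = STATIC_POOL_SIZE;              // main's record starts right after the static pool

    void callF(Object actual) {
        // Step 1: look up f; here its formals ([x]) and body ("i = x;") are hard-wired.
        // Step 2: push f's activation record past main's; tos is not moved yet.
        int nextTos = tos + MAIN_SIZE;
        // Step 3: bind. The actual (already an r-value, evaluated in main's
        // environment) is stored at x's l-value, computed from nextTos.
        memory[nextTos + 0] = actual;
        tos = nextTos;                       // now officially inside f's record
        // Step 4: evaluate the body "i = x;". i is global (absolute
        // location 0); x is relative to f's record.
        memory[0] = memory[tos + 0];
        // Step 5: pop f's record; f returns no value here.
        tos -= MAIN_SIZE;
    }

    public static void main(String[] args) {
        FuncEjTrace t = new FuncEjTrace();
        t.callF(3);
        System.out.println("location 0 (i): " + t.memory[0] + ", tos: " + t.tos);
    }
}
```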

  8. Design of EJayInterpreter.evalProcCall (aka,
    1. The PascalInterpreter has just a skeleton for the evalProcCall method, which is the analog of an evalFuncCall method in the EJay interpreter.
    2. The comment in the body of evalProcCall says that the implementation details are up to you.
    3. To give you a more concrete idea of these details, here is a design sketch for the implementation of evalFuncCall and evalReturn methods in the EJayInterpreter.
      1. The indentation of method names indicates the method calling hierarchy.
      2. E.g., evalFuncCall calls evalFuncIdent, bind, and evalFuncBody, in that order.
      3. evalFuncIdent in turn calls lookup and error.
      4. To the right of each method is an indication of how big each method is, in terms of the number of lines of code, excluding comments.
      evalFuncCall            13 lines of code, excluding comments
        evalFuncIdent         13 lines, including 11 lines of error handling
          lookup
          error
        bind                  21 lines, including 8 of error handling
          bind1               11 lines, including 4 of error handling
            evalFormalLValue  4 lines, very similar to evalIdentLValue
              lookup
              LValue
            evalExpr
        evalFuncBody          8 excruciatingly elegant lines of code
          eval                the same 'ol thing
      
      evalReturn              2 stunningly simple lines of code
        eval
        ReturnValue
      
      class ReturnValue ------> RuntimeException
        Value
        ReturnValue(Value)
      
    4. Here are some observations about what each of these methods does, vis-à-vis the five steps for executing a function:
      1. Step 1: Lookup function being called.
        1. This is done in evalFuncIdent.
        2. There is no real challenge here, just extract the called function's name from the parse tree and look it up.
        3. As the line count indicates, the bulk of the work is in error checking, which makes sure the called function exists, and that the identifier in the call is in fact a function, i.e., not a variable.
      2. Step 2: Push the activation record.
        1. Care must be taken here.
        2. There's not a lot of code, but you need to understand exactly what's going on.
        3. In the memory layout scheme we're using, with the initial top of stack immediately below the static pool, the tos index must always point at the top of the currently active function's activation record.
        4. So, when we initially push an activation record, what we do is push some space below the current function's activation record.
        5. Implementation-wise, this is simply arithmetic: the callED function's record begins at the tos index incremented by the size of the callING function's activation record; this size is the value of the memorySize data field in the callING function's symtab.
        6. The conceptually important point is that before and during parameter binding, we need two stack indices -- one for callING function's activation record, the other for the callED function's activation record.
        7. Suggested interpreter variable names for these indices are tos and nextTos.
        8. The initial push of an activation record advances nextTos past the callING function's record, but leaves tos where it is until binding is done.
        9. Since this is only a line or two of code, there is no separate method for it in the design sketch above; i.e., the code is directly in the body of evalFuncCall.
      3. Step 3: Bind actuals to formals.
        1. Actuals are evaluated (as r-values) in the callING environment (tos, and the callING function's symbol table).
        2. Formals are evaluated (as l-values) in the callED environment (nextTos, and the callED function's symbol table).
        3. Implementation-wise, this involves a for loop in bind; the loop traverses the lists of actual and formal parameters in parallel.
        4. If at the end of the traversal there are any leftover items in either list, it's an error.
        5. The bind1 method called by bind performs the binding for one formal/actual pair
          1. Semantically, this is the same as a regular assignment statement.
          2. I.e., the formal parameter variable is assigned the actual parameter expression value.
        6. The evalFormalLValue method is a conceptually important piece of the puzzle; it has the same semantics as evalIdentLValue (implemented in the Pascal example), but here nextTos is used as the stack index instead of tos. Also, there is no need to do error processing in evalFormalLValue, since we know the formal is defined.
        7. The last conceptually important step of bind is to set tos to nextTos.
          1. This completes the official entry into the callED function's stack frame.
          2. This part is not shown as a method call in the design sketch, since it's just one line of code in bind.
      4. Step 4: Evaluate the function body
        1. Conceptually, and in the implementation, this is where things are simple and elegant.
        2. We grab the function body from the symtab entry (FunctionEntry.body), which was put there during parsing.
        3. A particularly important thing to do is descend into the callED function's symbol table when we evaluate its body.
          1. I.e., in the specific example of calling function f from main, the symtab passed into evalFuncBody must be f's, not main's.
          2. When the body is finished being executed, symtab moves back to main's.
          3. If this is implemented properly, no explicit call to SymbolTable.ascend is necessary, since we use parameter passing to establish the symtab context for any called evalX method.
      5. Step 5: Pop the called function's activation record
        1. This is accomplished by a simple arithmetic decrement of tos.
        2. If things have been done properly at this point, we return the return value, if any, that comes back from
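    A compressed sketch of the evalFuncCall design above. It keeps the method names evalFuncCall, evalFuncIdent, bind, and evalFuncBody but simplifies everything else: function bodies are Java lambdas, the symtab is reduced to a hypothetical FuncEntry holding the formal count and memorySize, formal i is assumed to live at offset i of the callee's record, the callING function's record size is passed in explicitly as callerSize, and the body returns its value directly rather than through the ReturnValue throw described in the next item.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// An actual-parameter expression, evaluated against the interpreter state.
interface ActualExpr { Object eval(CallSketch in); }

// Hypothetical stand-in for a function's symtab entry: formal i lives at
// offset i of the callee's activation record, and the body is a Java lambda.
class FuncEntry {
    final int formalCount;
    final int memorySize;                       // parameters + locals
    final Function<CallSketch, Object> body;
    FuncEntry(int formalCount, int memorySize, Function<CallSketch, Object> body) {
        this.formalCount = formalCount; this.memorySize = memorySize; this.body = body;
    }
}

class CallSketch {
    final Object[] memory = new Object[64];
    final Map<String, FuncEntry> functions = new HashMap<>();
    int tos;   // first word of the active activation record

    CallSketch(int staticPoolSize) { tos = staticPoolSize; }

    Object evalFuncCall(String name, int callerSize, ActualExpr[] actuals) {
        FuncEntry f = evalFuncIdent(name);          // step 1
        int nextTos = tos + callerSize;             // step 2: push; tos stays put
        bind(f, actuals, nextTos);                  // step 3: ends with tos = nextTos
        try {
            return evalFuncBody(f);                 // step 4
        } finally {
            tos -= callerSize;                      // step 5: pop
        }
    }

    FuncEntry evalFuncIdent(String name) {
        FuncEntry f = functions.get(name);
        if (f == null) throw new RuntimeException("undefined function: " + name);
        return f;
    }

    void bind(FuncEntry f, ActualExpr[] actuals, int nextTos) {
        if (actuals.length != f.formalCount) {
            throw new RuntimeException("expected " + f.formalCount + " arguments, got " + actuals.length);
        }
        for (int i = 0; i < actuals.length; i++) {
            // bind1: the actual is an r-value in the caller's environment
            // (tos still points at the caller's record); the formal's
            // l-value is computed from nextTos.
            memory[nextTos + i] = actuals[i].eval(this);
        }
        tos = nextTos;                              // officially enter the callee's record
    }

    Object evalFuncBody(FuncEntry f) { return f.body.apply(this); }

    public static void main(String[] args) {
        CallSketch in = new CallSketch(1);          // one global at location 0
        // f(x) returns x + 1; one formal, no other locals
        in.functions.put("f", new FuncEntry(1, 1, c -> (Integer) c.memory[c.tos] + 1));
        in.memory[in.tos] = 41;                     // main's local k (offset 0) = 41
        Object r = in.evalFuncCall("f", 1, new ActualExpr[] { c -> c.memory[c.tos] });   // f(k)
        System.out.println(r);
    }
}
```

    Note that each actual is evaluated inside bind while tos still points at the caller's record, which is the callING-environment rule from Step 3; moving tos only after the loop keeps the two environments separate.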

  9. Tracing the execution of sample-test-files/func-return.ej.
    1. The trick to implementing evalReturn is to use a Java throw.
    2. Conceptually what we need to do is return from the invocation of evalReturn directly back to the invocation of evalFuncBody.
    3. The problem is, there are potentially many function invocations still pending on the Java call stack.
    4. E.g., here's a picture of Java's call stack from inside the evalReturn method (courtesy of Java's debugger jdb):
      
      waldorf% jdb EJayInterpreterTest func-return.ej
      
      > stop in EJayInterpreter.evalReturn
      
      > run EJayInterpreterTest sample-test-files/func-return.ej
      
        Breakpoint hit: EJayInterpreter.evalReturn(), line=452
      
      > where
        [1] EJayInterpreter.evalReturn (EJayInterpreter.java:452)
        [2] EJayInterpreter.evalStmt (EJayInterpreter.java:129)
        [3] EJayInterpreter.eval (EJayInterpreter.java:72)
        [4] EJayInterpreter.evalStmtList (EJayInterpreter.java:96)
        [5] EJayInterpreter.eval (EJayInterpreter.java:68)
        [6] EJayInterpreter.evalStmt (EJayInterpreter.java:135)
        [7] EJayInterpreter.eval (EJayInterpreter.java:72)
        [8] EJayInterpreter.evalFuncBody (EJayInterpreter.java:374)
        [9] EJayInterpreter.evalFuncCall (EJayInterpreter.java:316)
        [10] EJayInterpreter.evalExpr (EJayInterpreter.java:511)
        [11] EJayInterpreter.evalAssmnt (EJayInterpreter.java:150)
        [12] EJayInterpreter.evalStmt (EJayInterpreter.java:117)
        [13] EJayInterpreter.eval (EJayInterpreter.java:72)
        [14] EJayInterpreter.evalStmtList (EJayInterpreter.java:96)
        [15] EJayInterpreter.eval (EJayInterpreter.java:68)
        [16] EJayInterpreter.evalStmt (EJayInterpreter.java:135)
        [17] EJayInterpreter.eval (EJayInterpreter.java:72)
        [18] EJayInterpreter.evalFuncBody (EJayInterpreter.java:374)
        [19] EJayInterpreter.evalFuncCall (EJayInterpreter.java:316)
        [20] EJayInterpreter.evalStmt (EJayInterpreter.java:126)
        [21] EJayInterpreter.eval (EJayInterpreter.java:72)
        [22] EJayInterpreterTest.main (EJayInterpreterTest.java:35)
      
    5. As seen here, the active evalFuncBody method is sitting back up at stack frame 8.
    6. The way to get there from evalReturn, at stack frame 1, is to use a Java throw.
    7. The matching catch is in evalFuncCall.
    8. An implementation detail is to define our own class that extends Java's RuntimeException, named, say, ReturnValue.
    9. This class contains one Value data field, which is the evaluated value of the return expression.
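    A minimal sketch of the throw-based return, assuming Object stands in for the interpreter's Value class. ReturnSketch and evalNestedStmts are hypothetical; the recursion only simulates the pending eval/evalStmt/evalStmtList frames seen in the jdb trace, and the counter shows that code after the return point never runs.

```java
// Hypothetical sketch of the throw-based return. ReturnValue carries the
// already-evaluated return expression (an Object here, standing in for the
// interpreter's Value).
class ReturnValue extends RuntimeException {
    final Object value;

    ReturnValue(Object value) {
        // RuntimeException's protected 4-argument constructor (Java 7+);
        // passing false for writableStackTrace skips stack-trace capture,
        // since this exception is control flow, not an error.
        super(null, null, false, false);
        this.value = value;
    }
}

class ReturnSketch {
    int statementsRunAfterReturn = 0;

    // evalReturn: evaluate the expression (already done here), then throw.
    void evalReturn(Object evaluated) {
        throw new ReturnValue(evaluated);
    }

    // Stands in for the pending statement-evaluation frames: each level
    // recurses, and the return fires at the bottom.
    void evalNestedStmts(int levels, Object result) {
        if (levels == 0) {
            evalReturn(result);
        } else {
            evalNestedStmts(levels - 1, result);
            statementsRunAfterReturn++;   // skipped: the throw unwinds past it
        }
    }

    // The matching catch, in evalFuncCall.
    Object evalFuncCall(int nesting, Object result) {
        try {
            evalNestedStmts(nesting, result);   // stands in for evalFuncBody
            return null;                        // reached only if the body ends without a return
        } catch (ReturnValue rv) {
            return rv.value;
        }
    }

    public static void main(String[] args) {
        ReturnSketch s = new ReturnSketch();
        System.out.println(s.evalFuncCall(6, 42) + " " + s.statementsRunAfterReturn);
    }
}
```

    Disabling the stack trace is an optional refinement: since ReturnValue signals ordinary control flow, there is no need to pay for capturing a trace on every return.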

  10. Array and struct data (Sections 5.4 through 5.6).
    1. Our interpreter memory is organized into "word-size" chunks.
      1. Since we're implementing in Java, a "word" is the size of a reference in Java, which is typically 32 or 64 bits, depending on the underlying computer architecture and JVM configuration.
      2. All primitive EJay types will take one "word-size" chunk.
      3. Arrays and structs will take one word-size chunk for a reference, then enough heap storage to hold all of their elements.

  11. An array example -- array.ej.
    1. Consider this EJay program:
      int i;
      int[10] a;
      
      void main() {
          a[0] = 0;
          a[3] = 3;
          i = 3;
          print "a[i] = ", a[i];
      }
      
    2. Here's a memory dump (at end of program execution):
      i   0:  3
      a   1:  -----> 0: undef
                     1: undef
                     2: undef
                     3: 3
                     4: undef
                       ...
      
      1. Array storage is in Java's heap; it's allocated with "new Object[size-of array]", using, if you like, Memory.allocate(int size).
      2. An important question is when the piece of array storage gets allocated.
        1. On demand -- the first time the array is referenced as an l-value in an assignment.
        2. Before execution -- by traversing the decls section looking for array variable declarations.
        3. You can implement it either way in your EJayInterpreter, but I recommend on-demand.
    3. What about array type checking?
      1. Since array Values include a type, you can use it to perform the necessary checking at runtime.
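    A sketch of on-demand allocation plus runtime element type checking for array.ej. ArraySketch, assignElement, element, and hasType are hypothetical, and the mapping from EJay type names to Java classes (int to Integer, float to Double, and so on) is an assumption, not the assignment's specified representation.

```java
// Hypothetical sketch of on-demand allocation and element type checking
// for array.ej. Each array variable's word holds null until the first
// assignment to one of its elements allocates the storage on Java's heap.
class ArraySketch {
    final Object[] memory = new Object[2];   // i at location 0, a at location 1

    // a[index] = value, where a is declared as elemType[declaredSize] at loc.
    void assignElement(int loc, int declaredSize, String elemType, int index, Object value) {
        if (memory[loc] == null) {
            memory[loc] = new Object[declaredSize];     // on-demand allocation
        }
        Object[] elems = (Object[]) memory[loc];
        if (index < 0 || index >= elems.length) {
            throw new RuntimeException("array index out of bounds: " + index);
        }
        if (!hasType(value, elemType)) {
            throw new RuntimeException("cannot store " + value + " in a " + elemType + " array");
        }
        elems[index] = value;
    }

    // r-value of a[index]: unallocated storage or an unassigned element is an error.
    Object element(int loc, int index) {
        Object[] elems = (Object[]) memory[loc];
        if (elems == null || elems[index] == null) {
            throw new RuntimeException("undefined array element: " + index);
        }
        return elems[index];
    }

    // Assumed mapping from EJay type names to the Java classes used for values.
    static boolean hasType(Object value, String type) {
        switch (type) {
            case "int":     return value instanceof Integer;
            case "float":   return value instanceof Double;
            case "string":  return value instanceof String;
            case "boolean": return value instanceof Boolean;
            default:        return false;
        }
    }

    public static void main(String[] args) {
        ArraySketch s = new ArraySketch();
        s.assignElement(1, 10, "int", 0, 0);   // a[0] = 0;
        s.assignElement(1, 10, "int", 3, 3);   // a[3] = 3;
        s.memory[0] = 3;                       // i = 3;
        System.out.println("a[i] = " + s.element(1, (Integer) s.memory[0]));
    }
}
```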

  12. Another array example -- arrays.ej
    1. Here's the EJay program
      int[10] a;
      int[10][10][10] a3;
      
      void main() {
          a[3] = 3;
          a3[3][3][3] = a[3];
      }
      
    2. Here's the on-demand memory layout:
      a   0:  ----------------------------------------------------> 0: undef
      a3  1:  -----> 0: undef                                       1: undef
                     1: undef                                       2: undef
                     2: undef                                       3: 3
                     3: -----------> 0: undef                          ...
                     4: undef        1: undef
                       ...           2: undef
                                     3: -----------> 0: undef
                                                     1: undef
                                                     2: undef
                                                     3: 3
                                                        ...
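    The on-demand picture generalizes to multidimensional arrays by allocating each level only when an assignment first reaches it. A hypothetical sketch, assuming the declared dimensions are available as an int[] (for example, from the array's type); MultiDimSketch and assign are illustrative names.

```java
// Hypothetical sketch of on-demand allocation for arrays.ej: each level of
// a multidimensional array is allocated only when an assignment first
// reaches it, which gives the sparse layout pictured above.
class MultiDimSketch {
    // Performs a[indices[0]]...[indices[k-1]] = value for the array whose
    // reference word is memory[loc] and whose declared dimensions are dims.
    static void assign(Object[] memory, int loc, int[] dims, int[] indices, Object value) {
        if (memory[loc] == null) memory[loc] = new Object[dims[0]];
        Object[] current = (Object[]) memory[loc];
        for (int d = 0; d < indices.length - 1; d++) {
            if (current[indices[d]] == null) {
                current[indices[d]] = new Object[dims[d + 1]];   // allocate this level on first use
            }
            current = (Object[]) current[indices[d]];
        }
        current[indices[indices.length - 1]] = value;
    }

    public static void main(String[] args) {
        Object[] memory = new Object[2];   // a at 0, a3 at 1
        assign(memory, 0, new int[] {10}, new int[] {3}, 3);                     // a[3] = 3;
        Object aOf3 = ((Object[]) memory[0])[3];
        assign(memory, 1, new int[] {10, 10, 10}, new int[] {3, 3, 3}, aOf3);   // a3[3][3][3] = a[3];
        Object[] level1 = (Object[]) memory[1];
        System.out.println(level1[0] + " " + ((Object[]) ((Object[]) level1[3])[3])[3]);
    }
}
```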
      

  13. Struct example -- struct.ej
    1. The EJay program:
      struct {int f1; float f2; string f3;} s;
      
      void main() {
          s.f1 = 1;
          s.f2 = 1.5;
          s.f3 = "abc";
      }
      
    2. The memory layout is effectively the same as an array:
      s  0: ----------> 0: 1
                        1: 1.5
                        2: "abc"
      
    3. The time of memory allocation has the same alternatives as for arrays.
    4. The interpretation of a dot operator uses the type tag in a struct Value as the "roadmap" of the struct.
      1. Here's a Value
        val  ------------> 0: 1
                           1: 1.5
                           2: "abc"
        
        type -------------> TypeNode.child1 -----> field list
                            TypeNode.symtab -----> symtab, field offsets
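    A sketch of dot-operator interpretation using the type as a roadmap. StructType and StructValue are hypothetical stand-ins: the field-name-to-offset map plays the role of the TypeNode's symtab, storage is allocated on demand as for arrays, and field type checking is omitted for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a struct's type: fields in declaration order,
// each at the next word offset (the "symtab, field offsets" part of the roadmap).
class StructType {
    final Map<String, Integer> offsets = new LinkedHashMap<>();

    StructType field(String name) {
        if (offsets.containsKey(name)) throw new IllegalStateException("duplicate field: " + name);
        offsets.put(name, offsets.size());   // next free word offset
        return this;
    }

    int size() { return offsets.size(); }
}

// A struct value: heap storage plus the type that tells us where each field is.
class StructValue {
    final StructType type;
    Object[] val;   // allocated on demand, like array storage

    StructValue(StructType type) { this.type = type; }

    private int offsetOf(String field) {
        Integer off = type.offsets.get(field);
        if (off == null) throw new RuntimeException("no field named " + field);
        return off;
    }

    // s.field = v
    void set(String field, Object v) {
        int off = offsetOf(field);
        if (val == null) val = new Object[type.size()];
        val[off] = v;
    }

    // r-value of s.field
    Object get(String field) {
        int off = offsetOf(field);
        if (val == null || val[off] == null) throw new RuntimeException("undefined field: " + field);
        return val[off];
    }

    public static void main(String[] args) {
        StructType t = new StructType().field("f1").field("f2").field("f3");
        StructValue s = new StructValue(t);
        s.set("f1", 1);
        s.set("f2", 1.5);
        s.set("f3", "abc");
        System.out.println(s.get("f1") + " " + s.get("f2") + " " + s.get("f3"));
    }
}
```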
        

  14. Call-by-ref parameters.
    1. Consider the following EJay program, which could just as well be written in Java:
      int i,j;
      
      void main() {
          i = 10; j = 20;
          swap(i,j);
      }
      
      void swap(int x, int y) {
          int temp;
          temp = x;
          x = y;
          y = temp;
      }
      
    2. Despite the apparent intention of the swap method, it does not have the desired effect.
      1. At the end of the swap method's execution, global variables i and j still contain 10 and 20, respectively.
    3. Unlike many other languages, EJay has no reference (or pointer) type.
      1. In those languages, the desired effect of the swap function could be implemented using pointers.
      2. E.g., here's a working C implementation of swap
        int i,j;
        
        void swap(int* x, int* y) {
            int temp;
            temp = *x;
            *x = *y;
            *y = temp;
        }
        
        int main() {
            i = 10;  j = 20;
            swap(&i, &j);
        }
        
    4. EJay does not have pointers like C.
    5. What it does have, which is also typical in a number of common programming languages (notably C++), is a call-by-reference parameter passing feature.
    6. Here's a working EJay implementation of swap
      int i,j;
      
      void main() {
          i = 10; j = 20;
          swap(i,j);
      }
      
      void swap(ref int x, ref int y) {
          int temp;
          temp = x;
          x = y;
          y = temp;
      }
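    One possible way an interpreter could implement ref parameters, offered as an assumption rather than the assignment's required design: for a ref formal, bind stores the actual's address (its l-value) in the formal's stack word instead of a copy of its value, and every read or write of the formal goes through that address. SwapSketch is hypothetical; it hard-wires swap's frame (x at offset 0, y at 1, temp at 2) and assumes globals i and j at static locations 0 and 1 and an empty main record.

```java
// Hypothetical sketch: swap's activation record is x (offset 0), y (1), temp (2).
// Globals i and j sit at static locations 0 and 1; main's record is empty.
class SwapSketch {
    final Object[] memory = new Object[16];
    final boolean[] isRef;   // which frame slots hold ref formals (from the callee's symtab)
    int tos = 2;             // first word of the active record; the stack starts after i and j

    SwapSketch(boolean refParams) {
        isRef = new boolean[] { refParams, refParams, false };
    }

    // l-value of a name in swap's frame: a ref formal's slot holds an address to follow.
    int location(int offset) {
        return isRef[offset] ? (Integer) memory[tos + offset] : tos + offset;
    }

    Object read(int offset) { return memory[location(offset)]; }

    void write(int offset, Object v) { memory[location(offset)] = v; }

    // swap(i, j) called from main, whose record size is callerSize.
    void callSwap(int locOfI, int locOfJ, int callerSize) {
        int nextTos = tos + callerSize;
        // bind: a ref formal gets the actual's address (l-value);
        // a value formal gets a copy of the actual's value (r-value).
        memory[nextTos]     = isRef[0] ? (Object) locOfI : memory[locOfI];
        memory[nextTos + 1] = isRef[1] ? (Object) locOfJ : memory[locOfJ];
        tos = nextTos;
        write(2, read(0));   // temp = x;
        write(0, read(1));   // x = y;
        write(1, read(2));   // y = temp;
        tos -= callerSize;   // pop
    }

    public static void main(String[] args) {
        for (boolean byRef : new boolean[] { false, true }) {
            SwapSketch s = new SwapSketch(byRef);
            s.memory[0] = 10;   // i = 10;
            s.memory[1] = 20;   // j = 20;
            s.callSwap(0, 1, 0);
            System.out.println((byRef ? "ref:   " : "value: ") + "i = " + s.memory[0] + ", j = " + s.memory[1]);
        }
    }
}
```

    With value parameters, the swap happens only in swap's own stack words, so i and j keep 10 and 20; with ref parameters, the writes go through the stored addresses and reach the globals.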
      

  15. Pointers, memory leaks, and garbage collection (Sections 5.3 and 5.6).
    1. Since EJay has no explicit features for pointers, we won't be covering these topics for Assignment 4, nor will they be covered on the midterm.
    2. We'll discuss these issues further when we cover Lisp, starting in Week 8.