February 16, 2006
In this week's assignment, you will construct a parser for the MALGOL language, using the LR parser generator built into drscheme. Then, you will tie together your lexer, your parser, and your solution to assignment 4 to produce a working evaluator that reads a file from disk and evaluates it.
In order to use the built-in parser generator, you will need to make some
superficial changes to the lexer. In particular, the parser generator only
works for tokens declared with its define-tokens and
define-empty-tokens forms. The first defines tokens with associated
information, like our string literals and identifiers. The second defines
tokens like left-paren and begin (that is, keywords) that have no associated
information.
After defining a group of tokens with, say, define-tokens, you can use
the procedure token-<name-of-token> to construct the given token.
Here's an example:
;; import the parser library:
(require (lib "lex.ss" "parser-tools")
(lib "yacc.ss" "parser-tools"))
;; Let's declare some tokens
(define-tokens my-tokens (ID STRING))
;; let's make an ID token:
(token-ID "my-identifier")
;; let's compare two of them:
(equal? (token-ID "my-id") (token-ID "my-id"))
Note that the choice of all-caps letters for token names makes the grammar rules (you'll write them later in this assignment) easier to read. In essence, we're capitalizing the terminals of the grammar.
Mercifully, the resulting structures may be compared with equal?, as the
example shows. This means that you can amend your test cases to work with the
new tokens. Please do so.
Also, you will have to add one new identifier-like keywords to your lexer,
the keyword "in".
If you have any questions about the meaning of define-tokens or
define-empty-tokens, please read the documentation.
Make sure this part is working before going on to the next part.
Next, formulate a parser for the MALGOL language, using the parser-tools library. The grammar for the parser is defined as follows:
program ::= procedure ... expression
procedure ::= PROCEDURE name params ; funtype BEGIN local-vardef ... IN expression END
params ::= name
| < params , params , ... >
funtype ::= < type -- type >
type ::= name
| < type , type , ... >
| funtype
expression ::= expression expression ...
| name
| < expression , expression , ... >
| NUMBER
| STRING
| TRUE
| FALSE
| IF expression THEN expression ELSE expression ;
| (expression)
| BEGIN local-vardef ... IN expression END
local-vardef ::= type params = expression ;
| procedure ;
name ::= ID
There are several things to be aware of when reading this grammar.
In this assignment, I use the words `procedure' and `function' interchangeably. The language keyword used is `procedure', but the result of evaluating a `procedure' form may be called either a `procedure' or a `function.'
All of the capitalized names refer to literal tokens, and all of the lower-case names refer to nonterminals defined in the grammar.
The symbols (such as the semicolon, the equals sign, the comma, the paren, etc.) are literal tokens.
Ellipses (that is, ``...'') are used to mean ``zero or more of the preceding item.'' This meaning is stretched somewhat by their use in comma-separated lists, where they are taken to mean not ``zero or more commas'', but rather ``zero or more items in a comma-separated list.''
The first expression rule may be confusing. This rule describes the
language form for application. That is, a function may be applied to an
argument just by placing them next to each other. For instance, the
expression "add1 3" would evaluate to 4, if the add1 function
were defined as below.
Application must be left-associative. That is, the expression "f x y
z" is parsed in the same way as "(((f x) y) z)". To put it another
way, the result of evaluating this function is achieved by applying the
function f to the argument x. The result of this must be a
function, which is applied to the argument y. The result of this must
be a function, which is applied to z.
The begin form includes a set of local definitions, including
both simple variables and procedure declarations. These should be parsed as
nested where clauses. So, for instance, an expression like
begin int x = 3; int y = 4; in + <x,y> end would be parsed as a where
clause with a binding for x whose body is a where clause with a
binding for y whose body is an application of plus.
Parenthesess are used for simple grouping, not (as in Scheme) to signal
an application. If you wanted to apply the function f to the result
of applying the function g to 3, for instance, you would have to write
f (g 3), because application is left-associative. Note that you are
free to sprinkle parens around any expression. In particular, you could also
write this expression as (((f ((g) (3))))), and it would mean the same
thing.
You must produce the function malgol-parser, which has the following
contract, header, and body:
;; malgol-parser : ( -> token) -> (cons (listof fundef) (cons Exp null))
(define malgol-parser
(parser
...
))
Your job is to fill in the (parser ...) part. That is, use the
parser form in the parser-tools library to define the parser. The output
of the parser must be a list containing two items: first, a list of
fundefs, and second, a single Exp. These refer to the data
definitions we used in Assignment 4. I will repeat those data definitions
here:
;; An Exp is one of : ;; (simple literals:) ;; - a number ;; - a boolean ;; - a string ;; (variable references: ) ;; - a symbol ;; (primitives: ) ;; - (make-prim Schemefun) (define-struct prim (impl)) ;; (tuples: ) ;; - (make-tuple (listof Exp)) (define-struct tuple (exps)) ;; (variable binding: ) ;; - (make-where exp Pattern Exp) (define-struct where (body pat rhs)) ;; (function definition: ) ;; - (make-fundef (symbol Pattern Exp)) (define-struct fundef (name params body)) ;; (function application: ) ;; - (make-funcall (Exp Exp)) (define-struct funcall (fun args)) ;; - (make-if-s exp exp exp) (define-struct if-s (test then else)) ;; a Pattern is one of ;; - a symbol ;; - (listof Pattern)
You may notice that this data definition makes no mention of types. For this
assignment, the parser should parse the types ...and then throw them away.
That is, the types don't make it into the parsed AST. Of course, they must
still be well-formed for your test cases to work. I suggest the use of the
identifier any to indicate the most general possible type.
Also, no parsed program will include instances of the prim expression.
That's all right, your programs can use the primitives defined in the
top-level-env. In order to work with our new language, you'll need to
make a few minor changes to the top-level-env. Section 4 describes these changes in more detail.
You will need to refer to the documentation for the parser collection. The most direct way to access this documentation is to choose Help>Help Desk, click on the link labeled ``Manuals'' (it's there, keep looking), and then on the link labeled ``Parser-tools collection''.
In order to import the parser-tools libraries, include this near the beginning of your file:
(require (lib "lex.ss" "parser-tools")
(lib "yacc.ss" "parser-tools"))
Also, you will likely be lost until you take a look at the examples provided
with the collection. You can find them inside the PLT folder, at
collects/parser-tools/examples. I suggest taking a look at the
``calc.ss'' file and running it to see how it works.
Much of the grammar provided above will translate line-by-line into rules of
the parser you write. The challenge will come in the rules that include
ellipses, since the parser does not provide a ... feature. Instead, you
will need to define rules (much like you did on assignment 5) to define exactly
how these sequences are to be parsed. The other tricky rule has to do with
parsing the expression expression ... form. You will almost certainly
need to add an extra non-terminal in order to get things to parse cleanly.
Also, keep in mind that application is to be parsed in a left-associative
manner.
I recommend using the (debug "filename") feature of the parser-generator
to understand what's going on, particularly when you run into conflicts. The
tables generated by the debug form are like the ones we used in class. They
define a set of states, and indicate what transition to make when a given token
appears.
Start with simple grammars, and build up to more complex ones. As always, write your test cases first. If you can't write the test case, you can't write the program.
All right, here are some examples.
procedure factorial x ; <any -- any>
begin in
if equal <x,0> then
1
else
* <x,factorial (- <x,1>)>
;
end
factorial 6
The result of parsing this should be:
(list (list (make-fundef 'factorial 'x
(make-if-s (make-funcall 'equal
(make-tuple (list 'x 0)))
1
(make-funcall '*
(make-tuple
(list 'x
(make-funcall
'factorial
(make-funcall
'-
(make-tuple (list 'x 1))))))))))
(make-funcall 'factorial 6))
Here's an example featuring local variable bindings:
procedure fordax <y,z> ; <<any,<int,any>> -- <any,any>>
begin
<int,int> z = <3,4>;
procedure inner z ; <<int,any> -- <any,any>> begin in + <z,3> end;
any q = 14;
in
z q
end
fordax <brug,zorb>
The result of parsing this should be
(list
(list
(make-fundef
'fordax
'(y z)
(make-where
(make-where (make-where (make-funcall 'z 'q) 'q 14)
'inner
(make-fundef 'inner 'z (make-funcall '+ (make-tuple (list 'z 3)))))
'z
(make-tuple (list 3 4)))))
(make-funcall 'fordax (make-tuple (list 'brug 'zorb))))
In order to link the parser with the evaluator, you will need to make two small changes. First, you will need to alter the set of primitives in the top-level-environment slightly. Second, you will need to add a new whole-program evaluator that accepts a list containing a list of procedures and a single expression, and returns a Value.
Only two changes are necessary. First, the name of of the equality-testing
primitive must be changed from equal? to equal, because our lexer
does not support the use of question marks in identifiers. Second, you should
add bindings for -, lt (less-than), and gt (greater-than).
The - operator should subtract when given a two-tuple of numbers, and
signal an error otherwise. The lt and gt primitives should produce
a boolean when given two numbers, and signal an error otherwise.
Since our language is intended to allow mutually recursive procedures at the top level, we need some way to construct a top-level environment that contains a closure for each of the procedures. Moreover, each of these closures must contain a reference to this same top-level environment. The trick we used before, of adding the recursive binding to the environment at the last minute (that is, when the procedure is called) is not convenient here. This means that we must construct a circular environment.
The easiest way to do this is to build a list of closures with bogus environments, use these to build the top-level environment, and then go and mutate each of the structures to refer to the environment we just constructed. Here's how I did it:
;; prog-eval : (list/c (listof fundef) Exp) -> Value
(define (prog-eval input)
(let* ([procs (car input)]
[body (cadr input)]
[will-be-closures (map (lambda (x) (make-closure x null)) procs)]
[new-env (fold env-extend top-level-env (map fundef-name procs) will-be-closures)])
(for-each (lambda (clo) (set-closure-env! clo new-env)) will-be-closures)
(p-eval body new-env)))
You don't need to do it in exactly this way -- you'll need to figure out how to
define or import fold -- but you will probably want to use the
set-closure-env! procedure, which changes the environment associated
with a closure. This procedure is another of the ones that is automatically
generated by the define-struct. So, for instance, if you write
(define-struct foo (bar baz))
... you can now use the procedures make-foo, foo-bar,
foo-baz, foo? ...and also the functions
set-foo-bar! and set-foo-baz!, which change the bar and
baz associated with a given foo. It's true, I didn't tell you
about these functions before. Mutating structures will get you into a lot of
trouble, and your life is simpler when you don't do it. Nevertheless, there
are times (like this) when it's the simplest way to get the job done.
Finally, for those of you with non-working assignment 4 evaluators, you may use the one that I've provided as a solution. To find it, follow the link from the Assignments page that allows you to check on the status of your submissions.
As always, you must provide test cases for all of your functions.
You must write the function malgol-file-evaluator, with the following
contract and header:
;; malgol-file-evaluator : string -> Value (define (malgol-file-evaluator filename) ...)
This function takes its argument (a filename, represented as a string), creates a lexer for it, parses it, then passes it to the evaluator. It's a very simple function. Don't write or test it until the other pieces are all working.
This assignment is due at 12:00 PM (noon) on Monday, February 27th. It will be worth 9 points, rather than 6. Get started soon.