CSC 330 Assignment 2

Assignment 2: Lexical Analysis with JFlex



ISSUED: Monday, 4 April 2005
DUE: 5PM Wednesday 13 April, via turnin on falcon/hornet
POINTS POSSIBLE: 100
WEIGHT: 9% of total class grade
READING: Lecture Notes Week 2, Textbook Section 2.1, JFlex manual and supplemental readings

Specification

The deliverable for this assignment is a stand-alone lexical analyzer for an extension of the Jay language defined in the 330 textbook. We'll call it "EJay". The lexical definition of the basic Jay language is given in Appendix B.1 of the book. The lexical extensions for EJay are the following:

Your EJay lexical analyzer must recognize all of the tokens in the Jay language, plus these EJay extensions. A link to the Java Language Specification is provided on the 330 website, in the doc directory. Also, in the online version of this writeup, the above references to sections 3.10 of the language spec are linked to the cited sections.

"Stand-alone" means that the lexer recognizes tokens of the EJay language, but does nothing more with them. A test driver program calls the lexer and simply prints out all of the recognized tokens. The test driver for this assignment is supplied for you, i.e., you do not write it yourself. Details follow.

Helpful Examples

To help you get started, there is an example JFlex lexer for a subset of the Pascal programming language. It, and its supporting files, are in the directory www.csc.calpoly.edu/~gfisher/classes/330/examples/jflex. The example follows the format required for this assignment, and illustrates the implementation details described below.

There are also useful examples supplied in the JFlex download bundle, and in the other lex-related supplemental reading material pointed to in the 330 doc directory.

Implementation Details

The analyzer must be written using the JFlex lexical analyzer generator. To provide a standard framework for testing your analyzer, it must be built using the following JFlex template:

/*-***
 *
 * This file defines a lexical analyzer for the EJay language, which is an
 * extension of the Jay language defined in Section B.1 of the CSC 330
 * textbook.  The extensions that comprise EJay are defined in the writeup for
 * 330 Assignment 2.
 *
 */

import java_cup.runtime.*;

%%
/*-*
 * LEXICAL FUNCTIONS
 */

%cup
%line
%column
%class EJayLexer

%{

/**
 * Return a new Symbol with the given token id, and with the current line and
 * column numbers.
 */
Symbol newSym(int tokenId) {
    return new Symbol(tokenId, yyline, yycolumn);
}

/**
 * Return a new Symbol with the given token id, the current line and column
 * numbers, and the given token value.  The value is used for tokens such as
 * identifiers and numbers.
 */
Symbol newSym(int tokenId, Object value) {
    return new Symbol(tokenId, yyline, yycolumn, value);
}

%}


/*-*
 * PATTERN DEFINITIONS
 */

/******** PUT YOUR MACRO PATTERN DEFINITIONS HERE ********/


%%
/**
 * LEXICAL RULES
 */

/******** PUT YOUR RULE/ACTION DEFINITIONS HERE ********/

This template is followed in the example JFlex analyzer for Pascal.

Your analyzer must be defined in a file named "ejay.jflex". When this file is run through the JFlex generator, JFlex will produce the Java file named "EJayLexer.java". This file is then compiled to produce the executable lexer program, i.e., "EJayLexer.class".

In addition to using the preceding template base, the analyzer must also follow the rule implementation convention illustrated in the Pascal example file, "pascal.jflex". Specifically, each rule that returns a token must build a new Symbol object using the newSym method. The Symbol contains the numeric value of a token, plus additional lexical information. See the jflex-template and pascal.jflex for details.
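To make the role of newSym concrete, here is a minimal, self-contained sketch of what the two-argument form hands back. It does not use the real java_cup.runtime.Symbol class; SymbolSketch is a hypothetical stand-in that keeps only the four fields the template fills in, and the token id IDENTIFIER and the line/column values are made up for illustration. In a generated lexer, yyline and yycolumn are maintained by JFlex (enabled by %line and %column) and both count from 0.

```java
// Hypothetical stand-in for java_cup.runtime.Symbol, reduced to the four
// fields that the template's newSym methods fill in. The real class ships
// with CUP; this sketch only mirrors its shape.
class SymbolSketch {
    final int sym;       // numeric token id
    final int left;      // the template passes yyline here
    final int right;     // the template passes yycolumn here
    final Object value;  // token text or converted value; null if none

    SymbolSketch(int sym, int left, int right, Object value) {
        this.sym = sym;
        this.left = left;
        this.right = right;
        this.value = value;
    }
}

class NewSymDemo {
    // Hypothetical token id; real ids come from sym.java.
    static final int IDENTIFIER = 2;

    // Stand-ins for the counters JFlex maintains; the values are made up.
    static int yyline = 3;
    static int yycolumn = 7;

    // Same shape as the template's two-argument newSym.
    static SymbolSketch newSym(int tokenId, Object value) {
        return new SymbolSketch(tokenId, yyline, yycolumn, value);
    }

    public static void main(String[] args) {
        SymbolSketch s = newSym(IDENTIFIER, "count");
        System.out.println(s.sym + " " + s.left + " " + s.right + " " + s.value);
    }
}
```

Because the counters start at 0, a driver that wants conventional 1-based positions would have to add 1 when printing; whether the supplied test driver does so is not stated here.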

As is discussed in the lecture notes and JFlex manual, a lexer needs to define numeric values for the tokens it recognizes. For this assignment, the definition must be done in a file named sym.java. The sample sym.java in the Pascal example illustrates the format. For assignment 2, this file can be built by hand, or using the CUP parser generator. We'll discuss details of this in class.
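As a rough illustration of the hand-built option, a token definition file can be as simple as a class of integer constants. The sketch below is an assumption about the general shape, not the required content: EOF = 0 and error = 1 mirror the two entries CUP always generates, while IDENTIFIER, INTEGER_LITERAL, PLUS, and their numbers are hypothetical placeholders. The sample sym.java in the Pascal example remains the authoritative format.

```java
// Sketch of a hand-built token definition class. CUP generates this class
// as "public class sym"; it is left package-private here so the sketch can
// sit in one file alongside the small demo class below.
class sym {
    // These two values match what CUP generates.
    public static final int EOF = 0;
    public static final int error = 1;

    // Hypothetical placeholders; the real list covers every EJay token.
    public static final int IDENTIFIER = 2;
    public static final int INTEGER_LITERAL = 3;
    public static final int PLUS = 4;
}

class SymDemo {
    public static void main(String[] args) {
        // A rule action would refer to a token by name, e.g. sym.PLUS.
        System.out.println("PLUS = " + sym.PLUS);
    }
}
```

The only hard requirement on the numbers is that each token gets a distinct value, so that the lexer and whatever consumes its Symbols agree on what each id means.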

Special Note on Comments

Normally, the lexical analyzer in a compiler recognizes comments, but then just discards them as a form of whitespace. For the stand-alone lexer in this assignment, we need to confirm that comments are being properly recognized. So, in the rule actions for EJay comments (both the "//" and "/* ... */" forms), you must print out the message "Recognized comment: ", followed by the text of the comment that was recognized.

The pascal.jflex example illustrates how to do this. The example does it for Pascal-style comments; you need to do it for the extended EJay-style comments.
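One way to sanity-check comment patterns before writing them as JFlex rules is to prototype them with java.util.regex. The sketch below assumes that a "//" comment runs to the end of the line and that "/* ... */" comments do not nest; the class and method names are hypothetical. Java regex syntax differs from JFlex pattern syntax, so the patterns show the shape of the idea and cannot be pasted into ejay.jflex unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentSketch {
    // "//" up to, but not including, the end of the line.
    static final String LINE = "//[^\\r\\n]*";

    // "/*" ... "*/", non-nesting: the body is a run of non-star characters
    // or stars followed by something other than '*' or '/', so it can
    // never contain "*/".
    static final String BLOCK = "/\\*([^*]|\\*+[^*/])*\\*+/";

    static final Pattern COMMENT = Pattern.compile(LINE + "|" + BLOCK);

    // Returns the message the assignment asks for if a comment starts at
    // the beginning of the input, or null otherwise.
    static String report(String input) {
        Matcher m = COMMENT.matcher(input);
        return m.lookingAt() ? "Recognized comment: " + m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(report("// to end of line\nint x;"));
        System.out.println(report("/* spans * stars **/ int y;"));
    }
}
```

In the generated lexer, the matched text is available in a rule action through yytext(), which plays the role of m.group() in this sketch.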

JFlex Details

The website for JFlex is jflex.de. The site has all of the relevant material for JFlex, including a manual and download. JFlex is written in and for Java. It requires a Java environment on whatever machine it is run on.

A command-line executable version of JFlex runs on falcon/hornet. It is located in ~gfisher/classes/bin/jflex. This file is a UNIX shell script that runs the compiled JFlex Java program. If you download JFlex to a Windows machine, there is a jflex.bat batch file that provides a command-line executable on Windows. There is also a JFlex.jar file in the lib directory of the download. JFlex.jar provides a simple GUI for the JFlex generator.

Turn-in and Evaluation Details

Turn your program in using the turnin program on falcon/hornet. Be sure to use the version of turnin in ~gfisher/classes/330/bin/turnin, since there may be other versions of the program elsewhere on falcon/hornet.

The turnin program is a command-line application. It prompts you for the name of the assignment being turned in, which is "a2". It will also prompt for the names of the files to turn in. For this assignment, there are two files to be turned in, and they must have these names:

  1. ejay.jflex -- the JFlex lexical analyzer
  2. sym.java -- the token definition file

For this assignment, successful compilation means that ejay.jflex runs through JFlex without error and EJayLexer.java compiles with javac without error.

A sample test file is in 330/assignments/2/sample-test-files/raw-ejay-tokens.txt. The actual test data will have additional test cases. You need to test your program on a sufficient number of test cases before turning it in; that is, test it on all EJay keywords, operators, and the various forms of literals.