CSC 468 Project.
MiniNXBase Tokenizer Specification.
-----------------------------------

1. Outline

The MiniNXBase tokenizer has to implement the following functionality:

  - input a string;
  - traverse the string and break it into individual tokens;
  - construct a list of tokens corresponding to the string;
  - output the list of tokens;
  - provide functionality to manipulate the list of tokens.

As input, the tokenizer takes strings specifying MiniXBase commands.
MiniXBase commands are described in the Project Description document.

2. Implementation issues.

The tokenizer is to be implemented in Java. While its final architecture
may be more extensive, the following classes must be implemented.

  * public class MXBtokenizer
      The main tokenizer class. It contains the actual
      tokenizer/lexical analyzer code.

  * public class Token
      Instances of class Token represent individual tokens constructed
      by the methods of class MXBtokenizer.

  * public class TokenList
      Represents a list of Token elements. The output of the tokenizer
      method is an instance of this class.

3. Class details

Here, we list a number of functions that MUST be implemented for each of
the three classes. These functions are designed to become the interface
between the MiniXBase code and the tokenizer and its output. The final
implementation may contain other functionality.

  public class Token

      private <...> Type;
      private String Value;

      public <...> GetType();
      public String GetValue();
      public int SetType(<...> T);
      public int SetValue(String V);

  Comments:
    - <...> means that the type of the attribute Type of class Token is
      to be determined by the developer. It can be an integer type, or
      it can be an enumeration of the types (the list and description of
      the types are given below).
    - the return values of the SetType() and SetValue() methods are
      "error codes".
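
  The Token interface above could be sketched as follows. This is only one
  possible reading: it assumes the enumeration option for <...> (the name
  TokenType is hypothetical), and it assumes that an error code of 0 means
  success and -1 means the argument was rejected; the specification does
  not fix these values. The classes are declared without "public" only so
  that the sketch compiles as a single file.

```java
// Sketch only: TokenType and the error-code values (0 and -1) are
// assumptions, not part of the specification.
enum TokenType { SEPARATOR, KEYWORD, AXIS, OPERATOR, FUNCTION, OTHER }

class Token {
    private TokenType Type;   // one of the six token types
    private String Value;     // the part of the command string forming the token

    public TokenType GetType() { return Type; }

    public String GetValue() { return Value; }

    // Returns 0 on success, -1 if T is null (assumed convention).
    public int SetType(TokenType T) {
        if (T == null) {
            return -1;
        }
        Type = T;
        return 0;
    }

    // Returns 0 on success, -1 if V is null (assumed convention).
    public int SetValue(String V) {
        if (V == null) {
            return -1;
        }
        Value = V;
        return 0;
    }
}
```

  With this sketch, a rejected call such as SetValue(null) leaves the
  previously stored value untouched, so a caller can check the error code
  and keep working with the token.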

  public class TokenList

      public Token Pop();
      public Token Head();
      public int InsertToken(Token T);
      public boolean IsEmpty();

  Comments:
    - Pop() removes the head token from the TokenList object;
    - Head() returns the head token without removing it from the list;
    - overall, TokenList behaves like a stack.

  public class MXBtokenizer

      public TokenList Tokenize(String Command);

  Comments:
    Tokenize() takes as input a string that contains a MiniXBase command.
    It must break the string into individual tokens and return them as a
    list. For each token, its type (see below) and its value, i.e., the
    part of the string that forms the token, must be determined and
    stored in a Token instance.

4. Token types.

The following token types must be supported.

(1) Token Type: Separator
    Description: Tokens of type Separator serve to separate parts of an
                 XPLite expression.
    Possible Values:
        "::"  "/"  "["  "]"  ")"

(2) Token Type: Keyword
    Description: MiniXBase command keywords.
    Possible Values:
        "CREATE"  "Create"  "create"
        "INSERT"  "Insert"  "insert"
        "DROP"    "Drop"    "drop"
        "LIST"    "List"    "list"
        "STATS"   "Stats"   "stats"
        "FILE"    "File"    "file"
        "XML"     "Xml"     "xml"
        "CLEAR"   "Clear"   "clear"
        "DELETE"  "Delete"  "delete"
    Note: for simplicity, convert all keyword values stored in Token
          instances to ALLCAPS. That is, if the actual string contains
          "Create", Token.Value for it has to be "CREATE", etc.

(3) Token Type: Axis
    Description: keywords for XPLite axes.
    Possible Values:
        "self"  "child"  "parent"
        "following-sibling"  "preceding-sibling"
        "following"  "preceding"
        "ancestor"  "descendant"  "attribute"
    Comments: Axis values are case-sensitive, that is, they must be all
              lowercase.

(4) Token Type: Operator
    Description: comparison operators.
    Possible Values:
        "="  "<>"  "<"  ">"  ">="  "<="

(5) Token Type: Function
    Description: an id that ends with "(". Used in the predicate and
                 nodetest parts of XPLite expressions.
    Possible Values: all nodetests and standard functions:
        "node("  "attribute("  "text("
        "position("  "last("  "count("
        "not("  "true("  "false("

(6) Token Type: Other
    Description: identifiers, values, and more.
    Possible values: any combination of ASCII characters between two
                 tokens of types (1)-(5), and/or whitespace. Used for
                 Repository Names, Element and Attribute names and
                 function values.
                 E.g., "MyRepository", "x", "root", ""Hello!"", "2"...
    Comment: tokens of type Other cannot match keywords and axis names.
             That is, repository names such as "Create" or "parent" are
             not allowed. Note that "Create" may be a value of a
             function: the token value would be ""Create"" (that is, one
             pair of double quotes is a part of the token value itself).
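
The type rules in this section can be illustrated with a small lookup
sketch. It assumes the string has already been split into lexemes, and it
only decides the type of one lexeme and the value to store for it. The
names TokenClassifier, classify() and normalize() are hypothetical, and two
choices are assumptions rather than requirements of this document: only
the function names listed under (5) are recognized, and a mixed-case
spelling that is not listed under (2), such as "cREATE", falls through to
type Other.

```java
import java.util.Locale;
import java.util.Set;

// Sketch only: class and method names are hypothetical.
class TokenClassifier {
    enum Type { SEPARATOR, KEYWORD, AXIS, OPERATOR, FUNCTION, OTHER }

    static final Set<String> SEPARATORS = Set.of("::", "/", "[", "]", ")");
    static final Set<String> KEYWORDS = Set.of(
        "CREATE", "INSERT", "DROP", "LIST", "STATS",
        "FILE", "XML", "CLEAR", "DELETE");
    static final Set<String> AXES = Set.of(
        "self", "child", "parent", "following-sibling", "preceding-sibling",
        "following", "preceding", "ancestor", "descendant", "attribute");
    static final Set<String> OPERATORS = Set.of("=", "<>", "<", ">", ">=", "<=");
    static final Set<String> FUNCTIONS = Set.of(
        "node(", "attribute(", "text(", "position(", "last(",
        "count(", "not(", "true(", "false(");

    // True only for the three listed spellings: CREATE, Create, create.
    static boolean isKeyword(String s) {
        String upper = s.toUpperCase(Locale.ROOT);
        if (!KEYWORDS.contains(upper)) {
            return false;
        }
        String lower = upper.toLowerCase(Locale.ROOT);
        String capitalized = upper.charAt(0) + lower.substring(1);
        return s.equals(upper) || s.equals(lower) || s.equals(capitalized);
    }

    // Decides the token type of a single, non-null lexeme.
    static Type classify(String lexeme) {
        if (SEPARATORS.contains(lexeme)) return Type.SEPARATOR;
        if (OPERATORS.contains(lexeme)) return Type.OPERATOR;
        if (FUNCTIONS.contains(lexeme)) return Type.FUNCTION;
        if (isKeyword(lexeme)) return Type.KEYWORD;
        if (AXES.contains(lexeme)) return Type.AXIS;   // case-sensitive match
        return Type.OTHER;
    }

    // Keyword values are stored in ALLCAPS; all other values are kept as-is.
    static String normalize(Type type, String lexeme) {
        return type == Type.KEYWORD ? lexeme.toUpperCase(Locale.ROOT) : lexeme;
    }
}
```

Under these assumptions, classify("Create") yields KEYWORD and the stored
value becomes "CREATE", classify("parent") yields AXIS, and the quoted
lexeme ""Create"" yields OTHER with its quotes kept. A tokenizer would
still have to reject an unquoted "Create" or "parent" where a repository
name is expected; that check depends on the command grammar and is not
shown here.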