Generating Unit Tests from Specs --
The Next Next Generation.

1. Introduction and Motivation

The 509 lecture notes from Monday, April 21st listed the following points as very common refrains about manual test case generation, as performed by humans:

It's tedious
It's boring
It's error prone
It may leave important things untested
There's got to be a better way

Using formal method specifications has often been cited as one such "better way". One particular approach among many uses the JML specification language. The most recent tool to support semi-automated test generation for JML is JMLUnitNG. It is a prototype, with the following specific shortcomings:

a combinatorially explosive number of test cases
incomplete executability of postconditions
the generated test programs are overly complicated

Another inherent problem with JML and JMLUnitNG is that they only work for programs written in Java. For many environments, including software engineering courses taught at Cal Poly, this is a significant drawback.

An alternative approach to JML is to define a simple, language-independent notation for model-based specification. This simple notation, suitably syntactically sugared, can be embedded in comments above methods in a number of programming languages, Java included.

Instead of using Java-specific tools to parse the specs and generate testing code, the following implementation approach is taken:

Extract the specs lexically from the program source code and parse the specs with a dedicated parser. This parser will be much simpler than the compiler for a full language, since it only needs to parse quantified boolean expressions.
Generate clean source code for the TestNG tests that can be edited, compiled, and executed as a stand-alone test suite. This is somewhat different from the JMLUnitNG approach, where the generated tests depend on additional supporting source code that cleanly stand-alone tests would ideally not need.
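
The first step above, lexical extraction, can be sketched in a few lines of Python. This is only a minimal sketch under assumptions that are not settled in these notes: that each spec clause sits on its own comment line, that the keywords are "pre:" and "post:", and that the comment markers of interest are "//", "#", and "*". The names SPEC_LINE and extract_specs are made up for illustration.

```python
import re

# Assumed convention: a spec clause is a single comment line of the form
#   <comment marker> pre: <expr>;   or   <comment marker> post: <expr>;
SPEC_LINE = re.compile(r"^\s*(?://|#|\*)\s*(pre|post)\s*:\s*(.+?)\s*;?\s*$")

def extract_specs(source: str) -> list[tuple[str, str]]:
    """Return (keyword, expression) pairs found in comment lines of source."""
    specs = []
    for line in source.splitlines():
        match = SPEC_LINE.match(line)
        if match:
            specs.append((match.group(1), match.group(2)))
    return specs

JAVA_SOURCE = """\
// pre: x >= 0;
// post: x' = x + 1;
void increment() { x = x + 1; }
"""

print(extract_specs(JAVA_SOURCE))
# [('pre', 'x >= 0'), ('post', "x' = x + 1")]
```

Note that nothing here parses Java; the method body is simply skipped because it does not look like a spec comment. The extracted expression strings would then go to the dedicated spec parser.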

The sections that follow have further details of the mixed-language notation and the tools that support it.

Projects for CSC 509 will be working on a tool to support this new approach.

2. Two Firm and Overarching Goals

Keep it as simple as possible
Provide multi-language support

The practical and concrete purpose of these goals is to provide a tool that can be used by students in CSC 307 and 309 in the Cal Poly computer science department.

2.1. About Goal 1

Look at JML and other spec languages and do these things:

Use the most widely-used terminology for keywords.
If a feature absolutely does not need to be used to generate (basic) tests, then don't have it.

A particular note about the second point is that we will eliminate specification language features that are necessary for formal verification but not (necessarily) needed for black-box test generation.

2.2. About Goal 2

Provide support for the following languages:

Java
Python
C and closely-related dialects, e.g., C++ and C#
Ruby (maybe)

The idea is that we want the same generic spec language syntax to appear as comments across these languages, with only superficial lexical differences. The syntax should be as natural as possible, with no superficial adornments.

To promote ease of use, the lexical analyzer and parser can have a modicum of intelligence to understand language differences. E.g., Java and C use "&&" for logical 'and' whereas Python and Ruby use the keyword "and". This kind of difference can be handled lexically in a number of ways, including a requirement (at least for version 1 of the tool) that the extension of a program file be an accurate indication of the programming language that the file contains.
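
One way the extension-based approach might look is sketched below in Python. The extension table, the choice of keyword spellings as the canonical form, and the names SPELLINGS, TOKEN, and normalize are all assumptions made for this sketch, not decisions recorded in these notes.

```python
import re
from pathlib import Path

# Assumed: C-style operator spellings are rewritten to keyword spellings,
# so the parser proper only ever sees one form.
C_STYLE = {"&&": "and", "||": "or", "!": "not"}
SPELLINGS = {
    ".java": C_STYLE, ".c": C_STYLE, ".cpp": C_STYLE, ".cs": C_STYLE,
    ".py": {}, ".rb": {},  # already use the keyword spellings
}

# Two-character operators are listed before the single-character fallback (\S)
# so that "!=" is not split into "!" and "=".
TOKEN = re.compile(r"&&|\|\||[<>=!]=|[A-Za-z_]\w*'?|\d+|\S")

def normalize(expression: str, filename: str) -> str:
    """Rewrite logical operators to one canonical spelling, keyed on file extension."""
    extension = Path(filename).suffix
    if extension not in SPELLINGS:
        raise ValueError(f"unsupported file extension: {extension!r}")
    mapping = SPELLINGS[extension]
    return " ".join(mapping.get(tok, tok) for tok in TOKEN.findall(expression))

print(normalize("i >= 0 && i < n", "Stack.java"))   # i >= 0 and i < n
print(normalize("i >= 0 and i < n", "stack.py"))    # i >= 0 and i < n
```

With this kind of normalization done up front, the same grammar can serve every supported language.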

3. What to Throw Out from JML, and Why

The short immediate answer to "Why" is to keep things simple. More specific answers below address why it's OK to eliminate some JML feature by discussing more specifically what that feature does and why it's not necessary to the goal of using specs for light-weight test generation.

4. Syntax Details

4.1. Keywords for "precondition" and "postcondition"

Keywords	Pros	Cons
pre, post	short and sweet, most typically used in literature and other discussions, language independent	possibly more difficult to recognize lexically, but requiring a colon token probably takes care of this
precondition, postcondition	very clear, possibly easier to recognize lexically	verbose
requires, ensures	used a good deal in extant implementations, consistent with JML	seemingly not the most widely used in the literature and therefore gratuitously lacking in mnemonic value, longer than "pre:" and "post:"
@pre, @post	consistent with JML	annoying otherwise

4.2. Notation for Starting and Ending Values of an Input/Output Variable

A widely used notation in the literature is the "prime" notation, represented as a postfix single quote operator. For an input/output variable x, the variable name by itself refers to the input value, while the variable name suffixed with an apostrophe refers to the output value. I.e., x by itself refers to the input value, and x' refers to the output value.

Using this notation, a postcondition that specifies incrementing the input/output variable x looks like this:

post: x' = x + 1;
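
As a minimal sketch of how this postcondition could serve as a test oracle, the Python below snapshots the object on entry, runs the method, and compares the two states. The Counter class and the check_increment_post helper are hypothetical names made up for illustration, and the sketch assumes that "=" in the spec means equality rather than assignment.

```python
import copy

class Counter:
    """Hypothetical class with one input/output variable, x."""
    def __init__(self, x):
        self.x = x

    def increment(self):
        self.x += 1

def check_increment_post(counter):
    """Oracle for:  post: x' = x + 1;"""
    before = copy.deepcopy(counter)     # x  : value on entry
    counter.increment()
    return counter.x == before.x + 1    # x' : value on exit

print(check_increment_post(Counter(41)))  # True
```

The deep copy is what gives the unprimed name its meaning: without a saved pre-state there is nothing for x to refer to once the method has run.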

4.3. Notation for Denoting Method/Function Return Values

The keyword "return" is used in Java and other languages as a statement, making it syntactically unusable as a term in a Boolean expression. That's why in JML, for example, the keyword "\result" is used instead.

Since the notation we're devising here need not run through the compiler for any supported language, we're free to use return as we wish. Hence, it can in fact be used as-is in a postcondition. Its type is whatever type is returned by the method/function being specified.
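
One consequence is that generated test code, which does go through a compiler or interpreter, has to bind the returned value to some ordinary name. The Python sketch below shows the idea; the spec in the comment, the absolute function, and the check_post helper are all made up for illustration, and "ret" is an assumed stand-in for the spec keyword return.

```python
def check_post(func, post, *args):
    """Call func on args, then evaluate the postcondition on the result."""
    result = func(*args)
    return post(result, *args)

def absolute(n):
    return n if n >= 0 else -n

# Hypothetical spec:  post: return >= 0 && (return = n || return = -n);
# "return" is reserved in Python, so the generated oracle calls it "ret".
def absolute_post(ret, n):
    return ret >= 0 and (ret == n or ret == -n)

print(check_post(absolute, absolute_post, -5))  # True
```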

4.4. Notation for Conditional Expressions

The spec language notation being considered here needs some form of conditional expression, as distinguished from the conditional statements that are used in programming languages. In Java and C, this is done with the "... ? ... : ..." operator. A separate syntax is used there to distinguish it clearly from the "if ... else ..." statement syntax.

Since again we're not going to use the native compiler for any target language, we do not need to worry about lexical or syntactic conflicts with the if and else keywords as used in statements. Furthermore, the particular syntax chosen by C and Java for conditional expressions is not universal, and is pretty opaque to look at for those unfamiliar with it (or even for those who are familiar with it).

Taking these facts into consideration, we'll use the keywords "if", "then", and "else" for conditional expressions. It's clear that the keyword "then" has fallen out of favor with the advent of C-flavored programming language syntax. However, there is a long programming language history of "then" as a keyword, and it's not a complete anachronism given, for example, that Ruby allows it as an option. Overall, for clarity of purpose in our context, we believe that "if-then-else" is a viable programming-language-neutral syntax.
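
To illustrate why the neutral syntax costs little, here is a small Python sketch of how a parsed "if ... then ... else ..." expression might be emitted in the native form of a target language when test code is generated. The IfThenElse node and the emit function are hypothetical; the sub-expressions are kept as plain strings to keep the sketch short.

```python
from dataclasses import dataclass

@dataclass
class IfThenElse:
    """Hypothetical node for:  if <condition> then <then_value> else <else_value>"""
    condition: str
    then_value: str
    else_value: str

def emit(node: IfThenElse, language: str) -> str:
    """Render the conditional expression in the target language's own syntax."""
    if language == "java":
        return f"({node.condition} ? {node.then_value} : {node.else_value})"
    if language == "python":
        return f"({node.then_value} if {node.condition} else {node.else_value})"
    raise ValueError(f"unsupported language: {language}")

# Spec text:  if x >= 0 then x else -x
node = IfThenElse("x >= 0", "x", "-x")
print(emit(node, "java"))    # (x >= 0 ? x : -x)
print(emit(node, "python"))  # (x if x >= 0 else -x)
```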

4.5. Other Logic Operator Conveniences

At the moment, it's an open question if we need or want to include other logic operators in the proposed notation. These include in particular logical implication and equivalence, denoted in JML as "==>" and "<==>" respectively. These are pretty decent to look at and may be reasonable to include. We'll make a decision soon.
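
For what it's worth, both operators are cheap to support at execution time, since each reduces to operators the notation already has. A minimal Python sketch, with implies and iff as made-up helper names:

```python
def implies(p, q):
    """p ==> q  is false only when p holds and q does not."""
    return (not p) or bool(q)

def iff(p, q):
    """p <==> q  holds when both sides have the same truth value."""
    return bool(p) == bool(q)

# Truth tables, for reference.
for p in (False, True):
    for q in (False, True):
        print(p, q, implies(p, q), iff(p, q))
```

So the decision is mostly about readability of the specs, not about implementation effort.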

5. Dealing with the Q Word

There's no question that the biggest notational addition, as well as conceptual hurdle, is quantification. The simple words "forall" and "exists" have a long history in formal logic and specification notations. Here's a quick example that's like JML, but a bit more textbook-like, with the use of '|' for "such that".

forall (int i | i >= 0 && i < n) l.get(i) >= 0

This reads "for all integers i, such that i is at least 0 and less than n, the ith element of list l is greater than or equal to 0."
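
When the "such that" part bounds the variable to a finite range, as it does here, the quantifier can be executed by plain enumeration. The Python below is a minimal sketch of that bounded case only; the forall and exists helpers are hypothetical names, and the range has been worked out by hand from the constraint.

```python
def forall(domain, predicate):
    """True if predicate holds for every value in the finite domain."""
    return all(predicate(value) for value in domain)

def exists(domain, predicate):
    """True if predicate holds for at least one value in the finite domain."""
    return any(predicate(value) for value in domain)

# forall (int i | i >= 0 && i < n) l.get(i) >= 0
l = [3, 0, 7]
n = len(l)
print(forall(range(0, n), lambda i: l[i] >= 0))        # True

# The same check on a list with a negative element fails.
bad = [3, -1, 7]
print(forall(range(0, len(bad)), lambda i: bad[i] >= 0))  # False
```

Here range(0, n) covers exactly the values with i >= 0 and i < n. Deriving such a range automatically from the constraint, and deciding what to do when none can be derived, is the hard part and is not addressed by this sketch.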

Syntactically, forall and exists are the obvious choices. The much more difficult implementation issue with quantifiers is executability. In the testing tool we're considering here, we must provide some form of quantifier execution, since we want to use postconditions directly as test oracles, and postconditions may often contain quantification.

For any specification language that provides quantifier execution, there must be some restriction on unbounded quantification. The vast majority of executable spec languages simply disallow it. An interesting approach to quantifier execution is presented in the Cal Poly MS thesis by Paul Corwin. The approach presented in this thesis is very much applicable to the form of test tool we're considering here. Further discussion is most definitely coming here ... .

6. Specific Things that Corrigan Can Do to Get Started

Work some more on the syntax of the notation and write a CFG for it. You can use the parser generator tool from 430, or something else.
As the notation firms up, define the syntax tree data structure that the notation parses into.
Hand-write some unit test cases for the Stack.java example, using the simple black-box test generation rules from CSC 309, which are the same rules outlined in the two research papers by Richardson and Weyuker.
Look at the lecture notes from a previous 509 class that discuss some algorithm details for generating test cases from specifications; these notes refer in particular to two research papers:
1. "Structural Specification-based Testing with ADL" by Juei Chang, Debra Richardson, and Sriram Sankar, ISSTA 1996.
2. "Automatically Generating Test Data from a Boolean Specification" by Elaine Weyuker, IEEE Transactions on Software Engineering, Vol. 20, No. 5, May 1994.
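
As a starting point for the syntax tree item above, here is one possible shape for the data structure, sketched in Python. Every class and field name is a placeholder made up for this sketch; the real structure should follow from the CFG once the notation firms up.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Name:
    identifier: str
    primed: bool = False    # True for x', the output value

@dataclass(frozen=True)
class Number:
    value: int

@dataclass(frozen=True)
class BinaryOp:
    operator: str           # e.g. "=", "+", "&&", ">="
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class Quantifier:
    kind: str               # "forall" or "exists"
    variable: str
    constraint: "Expr"      # the "such that" part
    body: "Expr"

Expr = Union[Name, Number, BinaryOp, Quantifier]

@dataclass(frozen=True)
class Clause:
    keyword: str            # "pre" or "post"
    expression: Expr

# Tree for:  post: x' = x + 1;
increment_post = Clause(
    "post",
    BinaryOp("=", Name("x", primed=True), BinaryOp("+", Name("x"), Number(1))),
)
print(increment_post)
```

Frozen dataclasses give value equality for free, which makes it easy to write parser tests that compare an expected tree against the parsed one.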

I think the main focus of your work in 509 should be on parsing a spec written in the language-independent notation and generating some basic tests for it. A couple of things that are most likely beyond the scope of a 509 project are:

dealing with mixed-language details, in particular extracting specification comments from program files written in different languages
runtime support for unbounded quantifier execution, though some syntactic support for it is probably doable

7. CSC 509 Discussion of these Topics, 28 April 2014

The ideas presented above were discussed in CSC 509 on Monday 28 April, using a highly condensed set of slides. During the discussion of the slide material, a number of the students who have experience with JML commented on the ideas presented in the slides. There was general agreement on the slide points; however, most felt that the payoff of automatic test generation would have to be substantial before they would consider the unmandated use of JML to be worth their while. Here "substantial" means at least the following:

the tool would have to really work, not just be some half-baked academic proof-of-concept project
the tool would have to generate test cases that would not have readily been generated by hand, using human-powered black-box or white-box testing techniques

A particular comment of note was from Austin Wylie. He commented that "white box" definition of specs, i.e., writing them after the code was written, seemed in some cases to be redundant, in particular for methods with simple logic. For example, the postcondition for a simple set method is simply a different, and unfamiliar, notation for what's in the code body. This observation might lead one to consider including simple code generation "recommendations" for such simple method bodies.

It's not clear if this could work for anything but trivially simple methods. However, it's worth giving some thought to the idea of inducing programmers to use formal specs for testing purposes, and using that inducement as a back door to doing some simple spec-based code generation that may interest the programmers.