CSC 307 Lecture Notes Week 9
Use of Formal Method Specification in Testing
Introduction to System Testing Techniques
Testing Implementation in TestNG and JUnit



  1. Detailed summary of milestones 7-8 deliverables

    1. Design and Implementation

      1. Package design refinement.

      2. Javadoc.

      3. Java implementation of model classes.

      4. Necessary data structures.

      5. View classes refined to use real model data.

      6. Bottom line -- some stuff runs.

    2. Testing

      1. One integration test plan per team.

      2. One class test plan per team member.

      3. Three methods JML-spec'd per member (should already be done from Milestones 5-6).

      4. Three unit tests per member (for the methods that have JML).

      5. Code reviews.

    3. Administration

      1. Revised HOW-TO-RUN.html

      2. Separate m8-duties.html

      3. Updated work-breakdown.html

      4. Design reviews, Wed & Fri of week 9.

  2. Deriving and refining method specifications.

    1. The validation of user inputs requires that we know exactly what constitutes valid versus invalid input values.

      1. The purpose of operation pre- and postconditions is to answer just this question.

      2. In addition to input validation, pre- and postconditions are used for formal system testing to inform the development of unit tests.

    2. In the specifications we did in CSC 307, pre- and postconditions were associated with operations.

      1. When a Java method is derived from an operation, the pre- and postconditions are derived and refined for the method.

      2. Since the Java language does not support pre- and postconditions explicitly, they must appear in methods as comments rather than as part of the compilable code (see the sketch at the end of this section).

    3. Here is a recap from 307 of what pre- and postconditions mean:

      1. A precondition is a boolean expression that must be true before the operation or method executes.

      2. A postcondition is a boolean expression that must be true after the operation or method has completed execution.
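
      As a concrete illustration, here is a minimal sketch (a hypothetical
      Account class, not from the course examples) of how a pre- and
      postcondition appear as comments on a Java method, in JML style:

      public class Account {

          protected double balance;

          /**
           * Withdraw the given amount from this account.
           *
           * pre:  amount > 0 && amount <= balance;
           * post: balance == \old(balance) - amount;
           */
          public void withdraw(double amount) {
              balance = balance - amount;
          }
      }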


  3. How formal specification is used in testing.

    1. As we'll discuss below, a formal function test consists of the following elements:

      1. Test inputs within legal ranges, and expected output results.

      2. Test inputs outside of legal ranges, and expected output results.

      3. Test inputs on the boundaries of legal ranges, and expected output results.

    2. The formal preconditions are used to determine what the inputs should be.

    3. The formal postconditions are used to determine the expected results for the given inputs.
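
      For example, here is a minimal sketch (a hypothetical square-root
      wrapper; the names and tolerance are illustrative, not from the course
      examples) of tests derived directly from a spec with precondition
      "x >= 0" and postcondition "result * result is within epsilon of x":

      public class SpecDerivedTests {
          public static void main(String[] args) {
              // Inputs chosen from the precondition: boundary, within, large.
              double[] legalInputs = { 0.0, 4.0, 1.0e6 };
              for (double x : legalInputs) {
                  double result = Math.sqrt(x);
                  // Expected result determined by the postcondition.
                  assert Math.abs(result * result - x) <= 1.0e-9 * (x + 1);
              }
              // An input outside the precondition; Math.sqrt happens to be
              // defensive here, returning NaN rather than a numeric result.
              assert Double.isNaN(Math.sqrt(-1.0));
          }
      }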

  4. General concepts of system testing.

    1. Software components are independently testable.

    2. Testing is thorough and systematic.

    3. Testing is repeatable, with the same results each time.


  5. Overall system testing styles

    1. Top-down

      1. Top-level functions in the function calling hierarchy are tested first.

      2. Function "stubs" are written for lower-level functions that are called by functions above.

    2. Bottom-up

      1. Lower-level functions in a function calling hierarchy are tested first.

      2. Function "drivers" are written for upper-level functions that call functions below.

    3. Object-oriented

      1. All functions for a particular class are tested, independent of how they may be used in the overall system.

      2. Stubs and drivers are written as necessary.

    4. Hybrid

      1. A combination of top-down, bottom-up, and object-oriented testing is employed.

      2. This is a good practical approach.

    5. Big-bang

      1. All functions are compiled together in one huge executable (typically the night before it's due).

      2. We cross our fingers and run it.

      3. When the big-bang fizzles, we enter the debugger and keep hacking until things appear to work.
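
      Here is the promised minimal sketch of a stub and a driver (hypothetical
      Storage interface and names, not from the course examples):

      // The lower-level component, with its real implementation not yet written.
      interface Storage {
          void save(String record);
      }

      // A stub standing in for the lower-level component, so upper-level
      // logic can be tested top-down.
      class StorageStub implements Storage {
          public void save(String record) {
              System.out.println("StorageStub.save called with: " + record);
          }
      }

      // A driver standing in for the not-yet-written upper-level caller, so
      // a real Storage implementation can be tested bottom-up.
      class StorageDriver {
          public static void main(String[] args) {
              Storage storage = new StorageStub();  // swap in the real class later
              storage.save("test record 1");
              storage.save("test record 2");
          }
      }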


  6. Practical aspects of independently testable designs.

    1. For all modules to be separately designed and implemented, modular interfaces should be designed cleanly and thoroughly.

      1. Don't fudge on function signature details or pre/postcondition logic; i.e., think clearly about these details before the implementation begins.

      2. Be clear on what needs to be public and what protected.

    2. Be prepared to write stubs and drivers for other people's modules so that independent testing can be achieved.


  7. General approaches to testing and verification

    1. Black box testing

      1. Each function is viewed as a black box that can be given inputs to produce its outputs.

      2. The function is tested from the outside (specification) only, without looking at the code inside.

    2. White box testing

      1. Each function is viewed as a "white" (i.e., transparent) box containing code.

      2. The function is tested by supplying inputs that fully exercise the logic of its code.

      3. Specifically, each logical control path through the function is exercised at least once by some test.

      4. This is the kind of testing that is done informally during the course of system debugging.

    3. Runtime precondition enforcement

      1. Code can be added to functions to enforce preconditions at runtime.

      2. For example, if a precondition states that a certain input must be within a certain range, then code is added to the beginning of the function to check this condition.

      3. The function returns (or throws) an appropriate error if the condition is not met.
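
      Here is a minimal sketch (hypothetical method and range, not from the
      course examples) of such a runtime check:

      public class Discounter {
          /**
           * Return the price discounted by the given percentage.
           *
           * pre: 0 <= percent && percent <= 100;
           */
          public double applyDiscount(double price, int percent) {
              // Enforce the precondition at runtime.
              if (percent < 0 || percent > 100) {
                  throw new IllegalArgumentException(
                      "percent must be in [0,100], got " + percent);
              }
              return price * (100 - percent) / 100.0;
          }
      }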

    4. Formal verification

      1. The pre- and postconditions of each function are treated as formal mathematical theorems.

      2. The body of the function is treated as a form of mathematical formula, given certain formal rules of program interpretation for the language in which the function is written.

      3. Verification entails proving that the precondition theorem implies the postcondition theorem, with respect to the mathematical interpretation of the function body.


  8. Functional unit test details

    1. For each method, a list of test cases is produced.

    2. This list of cases constitutes the unit test plan for each method.

    3. A unit test plan is defined in the following general tabular form, as shown in Table 1.



      Case No.  Inputs              Expected Output     Remarks
      ==========================================================
      1         parm 1 = ...        ref parm 1 = ...
                ...                 ...
                parm m = ...        ref parm n = ...
                                    return = ...
                data field a = ...  data field a = ...
                ...                 ...
                data field z = ...  data field z = ...

      n         parm 1 = ...        ref parm 1 = ...
                ...                 ...
                parm m = ...        ref parm n = ...
                                    return = ...
                data field a = ...  data field a = ...
                ...                 ...
                data field z = ...  data field z = ...

      Table 1: Unit test plan.



    4. Note that

      1. The inputs for each case specify values for all input parameters as well as all referenced data fields for that case.

      2. The outputs for each case specify values for all reference parameters, return value, and modified data fields for that case.

      3. In any given case, data fields that are not explicitly mentioned in the inputs or outputs are assumed to be "don't care" -- i.e., not used as an input or not modified on output.

    5. One such test plan is written for each method in each class.

    6. In an object-oriented testing strategy, unit test plans are referenced in the class test plans.


  9. Module, i.e., class testing

    1. Write unit test plans for each class method.

    2. Write a class test plan that invokes the unit test plans in a well-planned order.

    3. General guidelines for class testing are the following:

      1. Start the class test plan by invoking the unit tests for the constructors, so that subsequent tests have field data values to work with.

      2. Next, unit test other constructive methods (i.e., methods that add and/or change field data) so that subsequent tests have data to work with.

      3. Unit test selector methods (i.e., methods that access but do not change data) on the results produced by constructive methods.

      4. Test certain method interleavings that might be expected to cause problems, such as interleaves of adds and deletes.

      5. Stress test the class by constructing an object an order of magnitude larger than ever expected to be used in production.
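
      The preceding guidelines are summarized in the following minimal sketch
      of a class test plan ordering (hypothetical StringListTest, not from the
      course examples):

      public class StringListTest {
          protected java.util.ArrayList<String> list;

          public static void main(String[] args) {
              StringListTest test = new StringListTest();
              test.testConstructor();            // 1. constructors first
              test.testAdd();                    // 2. constructive methods next
              test.testGet();                    // 3. selectors on constructed data
              test.testAddDeleteInterleaving();  // 4. risky interleavings
              test.testStress();                 // 5. order-of-magnitude stress test
          }

          public void testConstructor() { list = new java.util.ArrayList<String>(); }
          public void testAdd() { list.add("a"); assert list.size() == 1; }
          public void testGet() { assert list.get(0).equals("a"); }
          public void testAddDeleteInterleaving() {
              list.add("b"); list.remove("a"); assert list.size() == 1;
          }
          public void testStress() {
              for (int i = 0; i < 100000; i++) { list.add("x" + i); }
          }
      }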

    4. Once the plan is established, write a test driver for all methods of the class, where the driver:

      1. executes each method test plan,

      2. records the results,

      3. compares the results to the previous test run,

      4. reports the differences, if any.

    5. A couple of concrete examples of class test plans are in the Calendar Tool testing directory.

    6. In terms of Java details:

      1. Each class X in the system design has a companion testing class named XTest.

      2. A test class is a subclass of the class it tests.

      3. Each method X.f has a companion unit test method named XTest.testF.

      4. The comment at the top of each test class describes the test plan for that class.

      5. The comment for each unit test method describes the unit test plan for the method it tests.

      6. Each tested class provides a specialization of java.lang.Object.toString, which is used to dump the values of tested class objects.


  10. Integration testing

    1. Once class test plans are executed, classes are integrated.

    2. Specifically, stub methods used in a unit or class test are replaced with the actual methods.

    3. Subsequently, the test plan for the top-most method(s) in a collection of classes is rerun with the integrated classes.

    4. The integration continues in this manner until the entire system is integrated.

    5. A concrete example of an integration test plan is in the Calendar example testing directory:
      
      unix3:~gfisher/work/calendar/testing/implementation/source/java/caltool/integration-test-plan.html
      
      
      

  11. Black box testing heuristics

    1. Provide inputs where the precondition is true.

    2. Provide inputs where the precondition is false.

      1. These forms of input do not apply to by-contract methods that do not check their precondition.

      2. These forms of test input do apply to methods with a defensive implementation, where the method explicitly checks the precondition and throws an exception or otherwise returns an indication that the precondition is violated.

    3. For preconditions or postconditions that define data ranges:

      1. Provide inputs below, within, and above each precondition range.

      2. Provide inputs that produce outputs at the bottom, within, and at the top of each postcondition range.

    4. For preconditions and postconditions with and/or logic, provide test cases that fully exercise each clause of the logic.

      1. Provide an input value that makes each clause of the and/or logic both true and false.

      2. This means 2n test cases, where n is the number of logical terms.
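
      For example, here is a minimal sketch (hypothetical withdraw
      precondition, not from the course examples) of the 2n rule for the
      two-term precondition "amount > 0 && amount <= balance", with four cases
      that make each clause true and false at least once:

      public class TwoNExample {
          public static void main(String[] args) {
              // Each row is { amount, balance }.
              double[][] cases = {
                  {  50.0, 100.0 },   // case 1: both clauses true
                  {  -1.0, 100.0 },   // case 2: "amount > 0" false
                  { 150.0, 100.0 },   // case 3: "amount <= balance" false
                  {   0.0,  -1.0 }    // case 4: both clauses false
              };
              for (double[] c : cases) {
                  boolean valid = (c[0] > 0) && (c[0] <= c[1]);
                  System.out.println("amount=" + c[0] + ", balance=" + c[1]
                      + ", precondition holds: " + valid);
              }
          }
      }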

    5. For classes that define some form of collection:

      1. Test all operations with an empty collection.

      2. Test all operations with a collection containing exactly one element and exactly two elements.

      3. Add a substantial number of elements, confirming the state of the collection after each addition.

      4. Delete each element, confirming the state of the collection after each delete.

      5. Repeat addition/deletion sequence two more times.

      6. Stress test by adding and deleting from a collection of a size that is an order of magnitude greater than that ever expected to be used in production.


  12. Function paths

    1. A path is defined in terms of control flow through the logic of a method body.

    2. Each branching control construct defines a path separation point.

    3. By drawing the control-flow graph (i.e., flow chart) of a method, its paths are clearly exposed.

    4. To ensure full path coverage, each path is labeled with a number, so it can be referenced in white box tests.
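
      Here is a minimal sketch (hypothetical clamp method, not from the course
      examples) of labeling the paths through a method body:

      public class PathExample {
          // Three paths, one per branch outcome, numbered for reference in
          // white box test plans.
          public static int clamp(int x, int lo, int hi) {
              if (x < lo) {
                  return lo;    // path 1: x below the range
              }
              else if (x > hi) {
                  return hi;    // path 2: x above the range
              }
              return x;         // path 3: x within the range
          }

          public static void main(String[] args) {
              System.out.println(clamp(-5, 0, 10));   // exercises path 1
              System.out.println(clamp(15, 0, 10));   // exercises path 2
              System.out.println(clamp(5, 0, 10));    // exercises path 3
          }
      }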


  13. White box testing heuristics

    1. Provide inputs that exercise each method path at least once.

    2. For loops (see the sketch after this list):

      1. provide inputs that exercise the loop zero times (if appropriate),

      2. one time

      3. two times

      4. a substantial number of times

      5. the maximum number of times (if appropriate).
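
      Here is a minimal sketch (hypothetical sum method and inputs, not from
      the course examples) of these loop heuristics:

      public class LoopCoverageExample {
          static int sum(int[] a) {
              int total = 0;
              for (int x : a) { total += x; }   // the loop under test
              return total;
          }

          public static void main(String[] args) {
              assert sum(new int[] {}) == 0;        // loop runs zero times
              assert sum(new int[] {5}) == 5;       // one time
              assert sum(new int[] {5, 7}) == 12;   // two times
              assert sum(new int[1000]) == 0;       // a substantial number of times
          }
      }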

    3. Provide inputs that can reveal flaws in the implementation of a particular algorithm, such as:

      1. particular operation sequences

      2. inputs of a particular size or range

      3. inputs that may cause overflow, underflow, or other abnormal behavior

      4. inputs that test well-known problem areas in a particular algorithm


  14. Testing Implementation -- the anatomy of a unit test method.

    1. Class and method under test:
      class X {
      
        // Method under test
        public Y m(A a, B b, C c) { ... }
      
        // Data field inputs
        I i;
        J j;
      
        // Data field output
        Z z;
      
      }
      

    2. Testing class and method:

      class XTest {
        public void testM() {
      
          // Set up
          X x = new X(...);
          ...
      
          // Invoke
          Y y = x.m(aValue, bValue, cValue);
      
          // Validate
          assertEquals(y, expectedY);
        }
      }
      


    3. The common core for a unit testing method is the same for all test implementation frameworks:

      1. Setup -- set up the inputs necessary to run a test

      2. Invoke -- invoke the method under test and acquire its actual output

      3. Validate -- validate that the actual output equals the expected output

    4. Summary of where test specification and planning fits in:

      1. The javadoc comment for the method under test contains the JML spec that specifies what must be true for inputs and outputs used in the tests.

      2. The javadoc comment for the testing class specifies the major phases of the testing, including the order in which the unit tests are executed.

      3. The javadoc comment for the testing method defines the unit test plan in tabular form; the plan has inputs, expected outputs, and remarks for each test case.


  15. A testing example using TestNG.

    1. TestNG is the recommended functional testing framework for 307.

      1. The "NG" in the name stands for "Next Generation", indicating it is a evolution of earlier testing frameworks, such as JUnit.

      2. Overall, it is very similar to JUnit, but has some features that improve on what is offered in JUnit.

      3. If your team has members who are familiar with JUnit, or some comparable testing framework, your team may use that instead of TestNG.

      4. To be used in 307, the requirements for a testing framework are these:

        1. It must support method/function-level unit testing.

        2. It must support class-level inter-method testing.

        3. It must support regression testing.

    2. There is a very good how-to document for TestNG at http://testng.org/doc/documentation-main.html

    3. Examples of using TestNG to implement class and unit tests are in the 307 Milestone 8 example for the Schedule Test.

    4. We'll go over these examples in class, in particular the example for Calendar Tool scheduling.

    5. Important Note: For 307 Milestone 8, you need to implement three unit tests per team member, but they do not need to execute; test execution will be required for the final project deliverable.

  16. The following excerpts are from a JUnit version of the preceding TestNG example, illustrating the fundamental similarities of the two testing frameworks.
    package caltool.schedule.junit;
    
    import junit.framework.*;
    
    import caltool.caldb.*;
    import mvp.*;
    import java.util.*;
    
    /****
     *
     * This is a JUnit version of the ScheduleTest class.  This class has the same
     * logical testing structure as the class in the parent directory, but this
     * class uses JUnit coding conventions.
     *
    ...
     *
     */
    public class ScheduleTest extends TestCase {   // Note extension of TestCase
    
    ...
    
        /**
         * Unit test getCategories by calling getCategories on a schedule with a
         * null and non-null categories field.  The Categories model is tested
         * fully in its own class test.
         *                                                                    <pre>
         *  Test
         *  Case    Input                   Output          Remarks
         * ====================================================================
         *   1      schedule.categories     null            Null case
         *            = null
         *
         *   2      schedule.categories     same non-null   Non-null case
         *            = non-null value      value
         *                                                                   </pre>
         */
    public void testGetCategories() {
            Categories result;                              // method return value
    
            /*
             * Do case 1 and validate the result.
             */
            schedule.categories = null;                     // setup
    
            result = schedule.getCategories();              // invoke
    
        assertTrue(validateGetCategoriesPostcond(result));    // validate
    
    
            /*
             * Do case 2 and validate the result.
             */
    
            schedule.categories = new Categories();         // setup
    
            result = schedule.getCategories();              // invoke
    
        assertTrue(validateGetCategoriesPostcond(result));    // validate
    
        }
    
         ...
    
    }
    


  17. Here is the JUnit3 version of the test driver.
    package caltool.schedule.junit3;
    
    import junit.framework.*;
    import junit.runner.BaseTestRunner;
    
    import caltool.caldb.*;
    
    /****
     *
     * Test driver for ScheduleTest.  This driver class contains only a simple main
     * method that constructs a JUnit test suite containing ScheduleTest.  A JUnit
     * runner, either command-line or in an IDE, takes it from there.
     *
     */
    
    public class ScheduleTestDriver {
    
        /**
     * Construct the JUnit test suite containing ScheduleTest and call the runner.
         */
        public static void main(String[] args) {
            junit.textui.TestRunner.run(suite());
        }
    
    /*
     * Construct the test suite containing ScheduleTest.
     */
    public static Test suite() {
        TestSuite suite = new TestSuite("ScheduleTest");
        suite.addTestSuite(ScheduleTest.class);
        return suite;
    }
    }
    

  18. Reconciling path coverage with purely black box tests.

    1. In CSC 307, we will use a purely black box testing style.

    2. To ensure that all paths are covered, black box tests can be executed under the control of a path coverage analyzer (though we will not use such an analyzer in 307).

    3. If the analyzer reports one or more paths not being covered, the coverage results are analyzed to see if new black box test cases need to be added.

      1. When uncovered paths contain useless or dead code, the code can be removed and no further test cases are required.

      2. When uncovered paths are legitimate code, new test cases are added to the black box tests to ensure full path coverage.

    4. A complete "grey box" test plan can have an additional column that indicates the path each black box test case covers, as in:

      Test No.  Inputs            Expected Output     Remarks   Path
      ==============================================================
      i         parm 1 = ...      ref parm 1 = ...              p
                ...               ...
                parm m = ...      ref parm n = ...

      where p is the number of the method path covered by the test case i.


  19. Specifying large inputs and outputs in functional tests

    1. For collection classes, inputs and outputs can grow large.

    2. For convenience, such inputs and outputs can be specified as file data, instead of the result of calling a series of constructor methods in the context of a class test.

    3. When external test data files are used, they can be referred to in test plans and used during test execution.
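
      Here is a minimal sketch (hypothetical file path, not from the course
      examples) of loading a large test input from a data file instead of
      building it with a long series of constructor calls:

      import java.nio.file.*;
      import java.util.List;

      public class FileDataExample {
          public static void main(String[] args) throws Exception {
              List<String> expectedItems = Files.readAllLines(
                  Paths.get("testing/input/large-collection.txt"));
              System.out.println(
                  "Loaded " + expectedItems.size() + " expected items");
          }
      }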


  20. Test drivers for test execution

    1. Once a test suite is defined, it must be executed.

    2. To automate the testing process, and ensure that it is repeatable, a test driver is written as a stand-alone program.

      1. The test driver executes all tests defined in the system test plan.

      2. It records all results in an orderly manner, suitable for human inspection.

      3. The test driver also provides a test result differencer that compares the results of successive test runs and summarizes differences.

    3. For 307, this process is automated in a Makefile, as exemplified in
      unix3:~gfisher/work/calendar/testing/implementation/source/java/Makefile
      

    4. To perform tests initially, before all tests are executed via the Makefile, a symbolic debugger such as jdb can be used to execute individual methods.


  21. Testing concrete UIs

    1. With a UI toolkit such as Swing, concrete UI tests are performed in the same basic manner as other functional tests.

    2. User input, such as button pressing, is simulated by calling the interface method that is associated with the particular form of input, e.g., SomeButtonListener.actionPerformed (see the sketch at the end of this section).

    3. Outputs that represent screen contents are validated initially by human inspection of the screen.

    4. Ultimately, some machine-readable form of the screen output must be used to compare test results mechanically.

    5. Note that we will NOT do this level of testing in 307, but rather test the GUIs via human interaction.
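
      Here is a minimal sketch (hypothetical button and listener, not from the
      course examples) of simulating a button press by calling the listener
      method directly:

      import java.awt.event.*;
      import javax.swing.*;

      public class ButtonPressExample {
          public static void main(String[] args) {
              JButton okButton = new JButton("OK");
              ActionListener listener = new ActionListener() {
                  public void actionPerformed(ActionEvent e) {
                      System.out.println("OK pressed: " + e.getActionCommand());
                  }
              };
              okButton.addActionListener(listener);

              // Simulated user input: invoke the listener just as Swing would
              // when the button is clicked.
              listener.actionPerformed(new ActionEvent(
                  okButton, ActionEvent.ACTION_PERFORMED, "OK"));
          }
      }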


  22. Unit testing is a "dress rehearsal" for integration testing.

    1. One might think that if we do a really thorough job of method and class tests, integration should not reveal any further errors.

    2. We know from experience that integration often does reveal additional flaws.

      1. In this sense, failures of integration testing can be viewed as unit test failures.

      2. That is, a flaw revealed by an integration test indicates an incompleteness of the test cases for some individual method.

      3. The flaw is remedied by updating the appropriate method test plan.

    3. In so doing, individual tests become stronger.


  23. Testing models with large process data requirements.

    1. Suppose we have the following

      class SomeModestModel {
      
          public void doSomeModelThing(String name) {
              ...
              hdb.doSomeProcessThing(...);
              ...
          }
      
          protected HumongousDatabase hdb;
      
      }
      
      class HumongousDatabase {
      
          public void doSomeProcessThing(...) {
              ...
          }
      
      }
      

    2. In such cases, it may be quite time consuming to implement a stub for the method HumongousDatabase.doSomeProcessThing.

    3. This is a place where bottom-up testing is appropriate.


  24. On really bright coders who don't need to do systematic testing.

    1. There are a few of these floating around at various institutions.

    2. They do informally what mere mortals need to do in a more systematic way.

    3. Ultimately, even the brightest hack will not be able to do all testing informally.

    4. As programs are built in larger teams, no single person can know enough about the entire system to test it alone.

    5. Therefore, team-constructed software must be team tested, in a systematic manner.


  25. Other testing terminology

    1. The testing oracle

      1. A test oracle is someone (or something) that knows the correct answer to a test case.

      2. The oracle is used in test plan generation to define expected results.

      3. The oracle is also used to analyze incorrect test results.

      4. For the style of development we have used in CSC 307, the oracle is defined by human interpretation of the requirements specification.

        1. When using a formal specification such as JML, the oracle for a method is defined precisely as the method's postcondition.

      5. When building a truly experimental piece of code for which the result is not yet known, specification-based oracle definition may not always be possible.

        1. These are cases such as artificial intelligence systems where the code is designed to tell us something we don't already know the answer to.

        2. To test such systems requires some initial prototype development, inspection of the results, and then definition of the tests.

    2. Regression testing

      1. This is the name given to the style of testing that runs all tests in a suite whenever any change is made to any part of the system.

      2. Typically full regression tests are run at release points for the system.

      3. There is ongoing research aimed at "smart" regression testing, where not all tests need to be run if it can be proved that a given change cannot possibly affect certain areas of the system.

    3. Mutation testing

      1. This is a means to test the tests.

      2. The strategy is to mutate a program and then rerun its tests.

      3. For example, suppose an if statement coded as "if (x < y)" is mutated to "if (x >= y)".

      4. When such a mutation is made and a previously successful set of tests are run, the tests should fail in the places where the mutated code produces an incorrect result.

      5. If a set of previously successful tests do not fail on a mutated program, then one of two possibilities exists:

        1. The tests are too weak to detect a failure that should have been tested, in which case the tests need to be strengthened.

        2. The mutated section of code was "dead" in that it did not compute a meaningful result, in which case the code should be removed.

      6. Generally, the first of these two possibilities is the case.

      7. Mutation testing can be applied systematically, with mutations made in some non-random fashion.

        1. Such systematic mutation provides a measure of testing effectiveness.

        2. This measure can be used to test the effectiveness of different testing strategies.
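
      Here is a minimal sketch (hypothetical min method, not from the course
      examples) of the comparison-flipping mutation described above:

      public class MutationExample {
          static int min(int x, int y)       { return (x < y)  ? x : y; }

          // The mutant flips the comparison operator.
          static int mutantMin(int x, int y) { return (x >= y) ? x : y; }

          public static void main(String[] args) {
              assert min(1, 2) == 1;         // passes on the original code
              assert mutantMin(1, 2) == 1;   // fails on the mutant, as a
                                             // sufficiently strong test should
          }
      }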


  26. Testing directory structure

    1. Figure 1 shows the details of the testing directory structure in the context of a normal project directory (without package subdirectories).


      Figure 1: Testing directory structure.



    2. The contents of the testing subdirectories are shown in Table 2.



      Directory or File      Description
      ===================================================================
      *Test.java             Implementation of class testing plans.  Per the
                             project testing methodology, each testing class
                             is a subclass of the design/implementation class
                             that it tests.

      input                  Test data input files used by test classes.
                             These files contain large input data values, as
                             necessary.  This subdirectory is empty in cases
                             where testing is performed entirely
                             programmatically, i.e., the testing classes
                             construct all test input data dynamically within
                             the test methods, rather than inputting from
                             test data files.

      output-good            Output results from the last good run of the
                             tests.  These are results that have been
                             confirmed to be correct.  Note that these good
                             results are platform-independent, i.e., the
                             correct results should be the same across all
                             platforms.

      output-prev-good       Previous good results, in case current results
                             were erroneously confirmed to be good.  This
                             directory is superfluous if version control of
                             test results is properly employed.  However, it
                             remains as a backup to avoid nasty data loss in
                             case version control has not been kept up to
                             date.

      $PLATFORM/output       Current platform-specific output results.  These
                             are the results produced by issuing a make
                             command in a platform-specific directory.  Note
                             that current results are maintained separately
                             in each platform-specific subdirectory.  This
                             allows for the case that current testing results
                             differ across platforms.

      $PLATFORM/diffs        Differences between current and good results.

      $PLATFORM/Makefile     Makefile to compile tests, execute tests, and
                             difference current results with good results.

      $PLATFORM/.make*       Shell scripts called from the Makefile to
                             perform specific testing tasks.

      $PLATFORM/.../*.class  Test implementation object files.

      Table 2: Test file and directory descriptions.



    3. In the table, the variable $PLATFORM refers to the one or more subdirectories that contain platform-specific testing files (e.g., JVM, INTEL).



