CSC 509 Lecture Notes Week 8
Introduction to System Testing



  1. General concepts of system testing
    1. Modules should be independently testable.
    2. Testing should be thorough and systematic.
    3. Testing should be repeatable.

  2. Overall system testing styles
    1. Top-down
      1. Top-level functions in the function calling hierarchy are tested first.
      2. Function stubs are written for lower-level functions that are called by functions above.
    2. Bottom-up
      1. Lower-level functions in a function calling hierarchy are tested first.
      2. Function drivers are written for upper-level functions that call functions below.
    3. Object-oriented
      1. All functions for a particular class are tested, independent of how they may be used in the overall system.
      2. Stubs and drivers are written as necessary.
    4. Hybrid
      1. A combination of top-down, bottom-up, and object-oriented testing is employed.
      2. This is a good practical approach.
    5. Big-bang
      1. All functions are compiled together in one huge executable (typically the night before it's due).
      2. We cross our fingers and run it.
      3. When the big-bang fizzles, we enter the debugger and keep hacking until things appear to work.

  3. Practical aspects of independently testable designs
    1. For all modules to be separately designed and implemented, modular interfaces should be designed cleanly and thoroughly.
      1. Don't fudge on function signature details or pre/postcondition logic; i.e., think clearly about these details before the implementation begins.
      2. Be clear on what needs to be public and what protected.
    2. Be prepared to write stubs and drivers for other people's modules so that independent testing can be achieved (see the sketch after this list).
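
    A minimal sketch of a stub and a driver, using hypothetical class and function names (not from any particular project):

      // Stub: a canned stand-in for a lower-level class that is not yet
      // implemented, used when testing upper-level callers top-down.
      class AccountStoreStub {
          public boolean save(String accountName) {
              System.out.println("stub save called with: " + accountName);
              return true;    // always report success so callers can be exercised
          }
      }

      // Driver: a stand-alone caller that exercises a lower-level function
      // directly, used when testing bottom-up before real callers exist.
      class AccountStoreDriver {
          public static void main(String[] args) {
              AccountStoreStub store = new AccountStoreStub();
              boolean ok = store.save("test-account");
              System.out.println(ok ? "PASS" : "FAIL");
          }
      }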

  4. The relationship between user-level formal specification and system-level formal specification
    1. The same mathematical notation is used for both -- namely preconditions and postconditions are associated with each functional unit.
      1. At the user-level, the pre- and postconditions are specified for each operation in the specification.
      2. At the system-level, the pre- and postconditions are specified for each function in the design.
    2. Given the traceability between specification-level operations and design-level functions, we observe that:
      1. Directly traceable functions in the model part of the design start with the same pre- and postconditions as the operations to which they trace.
      2. These pre- and postconditions are strengthened by adding clauses that address implementation-level refinements.
      3. Non-traceable process functions must have pre- and postconditions specified by the designers.

  5. What does formal specification buy us?
    1. A precise definition of what a function does that helps clarify our thinking before coding.
    2. The basis for intelligent test case generation.
    3. The basis for program inspections, and the "cleanroom" approach to system testing.
    4. The basis for formal program verification.
    5. Formal specification has been shown to be cost effective given the benefits it provides for testing and verification.

  6. How formal specification is used in testing
    1. As we'll discuss further in the next lecture, a formal function test consists of the following elements:
      1. Test inputs within legal ranges, and expected output results.
      2. Test inputs outside of legal ranges, and expected output results.
      3. Test inputs on the boundaries of legal ranges, and expected output results.
    2. The formal preconditions are used to determine what the inputs should be.
    3. The formal postconditions are used to determine the expected results for the given inputs (see the sketch below).
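
    As a small sketch of this idea, consider a hypothetical square-root function specified with "pre: x >= 0" and "post: |return * return - x| < 0.0001".  The precondition identifies in-range, boundary, and out-of-range inputs; the postcondition supplies the expected results:

      class SqrtSpecTests {
          public static void main(String[] args) {
              // Case 1: in-range input (precondition true); expected result
              // comes from the postcondition.
              check(Math.sqrt(4.0), 2.0);
              // Case 2: boundary input (precondition true at the edge).
              check(Math.sqrt(0.0), 0.0);
              // Case 3: out-of-range input (precondition false); the expected
              // behavior depends on the precondition enforcement approach
              // chosen at the design level (see item 9 below).
              System.out.println("sqrt(-1.0) = " + Math.sqrt(-1.0));
          }

          static void check(double actual, double expected) {
              System.out.println(
                  Math.abs(actual - expected) < 0.0001 ? "PASS" : "FAIL");
          }
      }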

  7. How formal specification is used in formal verification
    1. In order to verify a program formally, two forms of specification must be provided:
      1. A formal specification of the given program
      2. A formal specification of the language in which the program is written
    2. Hence, a formal program specification is an integral part of formal verification.
    3. We will discuss formal verification details in an upcoming lecture.

  8. General approaches to testing and verification
    1. Black box testing
      1. Each function is viewed as a black box that can be given inputs to produce its outputs.
      2. The function is tested from the outside (specification) only, without looking at the code inside.
    2. White-box testing
      1. Each function is viewed as a "white" (i.e., transparent) box containing code.
      2. The function is tested by supplying inputs that fully exercise the logic of its code.
      3. Specifically, each logical control path through the function is exercised at least once by some test.
      4. This is the kind of testing that is done informally during the course of system debugging.
    3. Formal verification
      1. The pre- and postconditions of each function are treated as formal mathematical theorems.
      2. The body of the function is treated as a form of mathematical formula, given certain formal rules of program interpretation for the language in which the function is written.
      3. Verification entails proving that the precondition theorem implies the postcondition theorem, with respect to the mathematical interpretation of the function body (a tiny sketch follows below).
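
    As a tiny illustration, consider a hypothetical function with "pre: x >= 0" and "post: return >= 1":

      class VerificationExample {
          // pre:  x >= 0;
          // post: return >= 1;
          static int increment(int x) {
              return x + 1;
          }
      }

    Interpreting the body as the formula "return == x + 1", the proof obligation is that (x >= 0) implies (x + 1 >= 1), which holds (ignoring integer overflow), so the function satisfies its specification.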

  9. Precondition enforcement revisited, in the context of testing.
    1. At the specification level, failure of an operation precondition renders the operation simply "undefined".
      1. For an abstract specification, this is a perfectly reasonable definition of precondition failure.
      2. However, at the design and implementation level, precondition failure must be dealt with more concretely.
      3. There are two basic approaches to such concretization.
    2. Approach 1: A precondition is a guaranteed assumption that will always be true before a function to which it is attached is executed.
      1. This approach can be called the "programming by contract" approach.
      2. In this approach, the code of the function does not enforce its own precondition, but rather the precondition must be enforced by all callers of the function.
      3. Such enforcement can be formally verified or implemented with runtime checks.
    3. Approach 2: A precondition must be checked by the function to which it is attached.
      1. This approach can be called the "defensive programming" approach.
      2. In this approach, the code of the function includes logic to enforce its precondition.
      3. The enforcement can:
        1. Assert unconditional failure on any precondition violation.
        2. Return an appropriate "nil" value as the function return value or in an output parameter.
        3. Output an appropriate error report to stderr or the user view screen.
        4. Throw an appropriate exception (see below for further discussion).
      4. In all but the first of these enforcement styles, the system-level postcondition must be enhanced to specify both normal and exceptional output behavior.
        1. For example, suppose a function is specified initially as follows
          void SomeDB::Add(Item* i)
              /*
               * pre: not Find(i);
               * post: (i in this'->data) and ... ;
               *
               */
          
        2. If the exception handling approach to precondition enforcement is chosen, the refined formal specification for this function is as follows
          void  SomeDB::Add(Item* i)
              /*
               * pre: not Find(i);
               * post: if (not Find(i)) then
               *           (i in this'->data) and ... ;
               *       else
               *           (this == this') and
               *           (throw == AddException)
               *
               */
          
          where AddException is a defined exception value (a Java sketch of this enforcement style follows below).
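
    A minimal Java sketch of the exception-throwing enforcement style; the Item, SomeDB, and AddException declarations are hypothetical, and the Java naming differs slightly from the signature shown above:

      import java.util.ArrayList;
      import java.util.List;

      class AddException extends Exception { }

      class Item { }

      class SomeDB {
          private List<Item> data = new ArrayList<Item>();

          public boolean find(Item i) {
              return data.contains(i);
          }

          // pre:  not find(i);
          // post: if not find(i), then i is in data;
          //       else data is unchanged and AddException is thrown.
          public void add(Item i) throws AddException {
              if (find(i)) {
                  throw new AddException();   // precondition violated
              }
              data.add(i);                    // normal case
          }
      }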

  10. Functional unit test details
    1. For each function, a list of test cases is produced.
    2. This list of cases constitutes the unit test plan for each function.
    3. A unit test plan is defined in the following general tabular form:

      Case No.  Inputs                 Expected Output        Remarks
      --------  ---------------------  ---------------------  -------
      1         parm 1 = ...           ref parm 1 = ...
                ...                    ...
                parm m = ...           ref parm n = ...
                                       return = ...
                data member a = ...    data member a = ...
                ...                    ...
                data member z = ...    data member z = ...

      n         parm 1 = ...           ref parm 1 = ...
                ...                    ...
                parm m = ...           ref parm n = ...
                                       return = ...
                data member a = ...    data member a = ...
                ...                    ...
                data member z = ...    data member z = ...
    4. Note that
      1. The inputs for each case specify values for all input parameters as well as all referenced data members for that case.
      2. The outputs for each case specify values for all reference parameters, return value, and modified data members for that case.
      3. For any given case, data members that are not explicitly mentioned in the inputs or outputs are assumed to be "don't care" -- i.e., not used as an input or not modified on output.
    5. One such test plan is written for each function in each class.
    6. In an object-oriented testing strategy, the unit test plans are included in the module test plan for a complete class.

  11. Module (i.e., class) testing
    1. For a given class, write unit test plans for each member function.
    2. For the class as a whole, write a module test plan that invokes the unit test plans in a well-planned order.
    3. General guidelines for module testing include the following:
      1. Start the module test plan by invoking the unit tests for the constructors, so that subsequent tests have member data values to work with.
      2. Next, unit test other constructive functions (i.e., functions that add and/or change member data) so that subsequent tests have data to work with.
      3. Unit test selector functions (i.e., functions that access but do not change data) on the results produced by constructive functions.
      4. Test certain function interleavings that might be expected to cause problems, such as interleaves of adds and deletes.
      5. Stress test the class by constructing an object several times larger than ever expected to be used in production.
    4. Once the plan is established, write a test driver for all functions of the class, where the driver:
      1. executes each function test plan,
      2. records the results,
      3. compares the results to the previous test run,
      4. reports the differences, if any.
    5. A concrete example of a module test plan is in the Rolodex example testing directory:
      projects/work/rolodex/testing/design/RolodexTest.html
      
    6. In terms of Java details
      1. Each class X in the system design has a companion testing class named XTest.
      2. A test class is a subclass of the class it tests.
      3. Each member function X.f has a companion unit test function named XTest.testF.
      4. The comment at the top of each test class describes the module test plan for that class.
      5. The comment for each unit test member function describes the unit test plan for the function it tests (see the skeletal sketch below).
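
    A skeletal sketch of this convention, using a hypothetical class X:

      // Hypothetical design class.
      class X {
          public int f(int parm) {
              return parm + 1;
          }
      }

      /**
       * Module test plan for class X:
       *   1. Construct an XTest so later tests have an object to work with.
       *   2. Invoke testF per its unit test plan.
       */
      class XTest extends X {       // a test class subclasses the class it tests

          /**
           * Unit test plan for X.f:
           *   Case 1: parm = 0; expected return = 1.
           */
          public void testF() {
              System.out.println(f(0) == 1 ? "PASS" : "FAIL");    // case 1
          }
      }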

  12. Integration testing
    1. Once module test plans are executed, modules are integrated.
    2. Specifically, stub functions used in a unit or module test are replaced with the actual functions.
    3. Subsequently, the test plan for the top-most function(s) in a collection is rerun with the integrated collection of modules.
    4. The integration continues in this manner until the entire system is integrated.

  13. Black box testing heuristics
    1. Provide inputs where the precondition is true.
    2. Provide inputs where the precondition is false.
    3. For preconditions or postconditions that define data ranges:
      1. Provide inputs below, within, and above each precondition range.
      2. Provide inputs that produce outputs below, within, and above each postcondition range.
    4. For preconditions and postconditions with and/or logic, provide test cases that exercise each clause of the logic.
    5. For classes that define some form of collection (see the sketch after this list):
      1. Test all operations with an empty collection.
      2. Test all operations with a collection containing exactly one element and exactly two elements.
      3. Add a substantial number of elements, confirming state of collection after each addition.
      4. Delete each element, confirming state of collection after each delete.
      5. Repeat addition/deletion sequence two more times.
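
    A sketch of these collection heuristics, using java.util.ArrayList as a stand-in for a project-defined collection class:

      import java.util.ArrayList;

      class CollectionHeuristicsSketch {
          public static void main(String[] args) {
              ArrayList<String> c = new ArrayList<String>();

              // 1. Operations on the empty collection.
              check(c.isEmpty() && !c.contains("a"));

              // 2. Exactly one element, then exactly two.
              c.add("a");
              check(c.size() == 1 && c.contains("a"));
              c.add("b");
              check(c.size() == 2 && c.contains("b"));

              // 3. Add a substantial number of elements, confirming the
              //    collection's state after each addition.
              for (int i = 0; i < 1000; i++) {
                  c.add("x" + i);
                  check(c.size() == 3 + i);
              }

              // 4. Delete each element, confirming the state after each delete.
              while (!c.isEmpty()) {
                  int before = c.size();
                  c.remove(c.size() - 1);
                  check(c.size() == before - 1);
              }
              // 5. The addition/deletion sequence would then be repeated.
          }

          static void check(boolean ok) {
              if (!ok) System.out.println("FAIL");
          }
      }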

  14. Function paths
    1. A path is defined in terms of control flow through the logic of a function body.
    2. Each branching control construct defines a path separation point.
    3. By drawing the control-flow graph (i.e., flow chart) of a function, its paths are clearly exposed.
    4. To ensure full path coverage, each path is labeled with a number, so it can be referenced in white box tests (see the sketch below).
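
    A small sketch of path labeling for a hypothetical function:

      class PathExample {
          // Paths through classify:
          //   path 1: x < 0      (first branch taken)
          //   path 2: x == 0     (second branch taken)
          //   path 3: x > 0      (no branch taken)
          static String classify(int x) {
              if (x < 0) {
                  return "negative";      // path 1
              }
              if (x == 0) {
                  return "zero";          // path 2
              }
              return "positive";          // path 3
          }
      }

    A white box test suite for classify needs at least one input per labeled path, e.g., x = -1, x = 0, and x = 1.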

  15. White box testing heuristics
    1. Provide inputs that exercise each function path at least once.
    2. For loops (see the sketch after this list):
      1. provide inputs that exercise the loop zero times (if appropriate),
      2. one time
      3. two times
      4. a substantial number of times
      5. the maximum number of times (if appropriate).
    3. Provide inputs that can reveal flaws in the implementation of a particular algorithm, such as:
      1. particular operation sequences
      2. inputs of a particular size or range
      3. inputs that may cause overflow, underflow, or other abnormal behavior
      4. inputs that test well-known problem areas in a particular algorithm
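
    A sketch of the loop heuristic applied to a hypothetical function containing a single loop:

      class LoopCoverageSketch {
          // Sums the first n elements of a (assumes 0 <= n <= a.length).
          static int sumFirst(int[] a, int n) {
              int total = 0;
              for (int i = 0; i < n; i++) {
                  total += a[i];
              }
              return total;
          }

          public static void main(String[] args) {
              int[] a = new int[1000];
              for (int i = 0; i < a.length; i++) a[i] = 1;

              System.out.println(sumFirst(a, 0));          // loop runs zero times
              System.out.println(sumFirst(a, 1));          // one time
              System.out.println(sumFirst(a, 2));          // two times
              System.out.println(sumFirst(a, 500));        // a substantial number of times
              System.out.println(sumFirst(a, a.length));   // the maximum number of times
          }
      }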

  16. Reconciling path coverage with purely black box tests.
    1. In the methodology outlined here, we write purely black box tests.
    2. The tests are executed under the control of a path coverage analyzer.
    3. When (if) the analyzer reports one or more paths not being covered, we use a "grey box" approach: we analyze the uncovered paths and strengthen the black box tests to cover them as necessary.
    4. Note that in some cases, uncovered paths may be useless or dead code that can be removed, thus requiring no further tests.
    5. A complete grey box test plan can have an additional column that indicates the path each test case covers, as in:

      Test No.  Inputs        Expected Output    Remarks  Path
      --------  ------------  -----------------  -------  ----
      i         parm 1 = ...  ref parm 1 = ...            p
                ...           ...
                parm m = ...  ref parm n = ...

      where p is the number of the function path covered by the test.

  17. Specifying large inputs and outputs in functional tests
    1. For collection classes, inputs and outputs can grow large.
    2. For convenience, such inputs and outputs can be specified as file data, instead of the result of calling a series of constructor functions in the context of a module test.
    3. When external test data files are used, they can be referred to in test plans and used during test execution.

  18. Test drivers for test execution
    1. Once a test suite is defined, it must be executed.
    2. To automate the testing process, and ensure that it is repeatable, a test driver must be written as a stand-alone program.
      1. The test driver must execute all tests defined in the system test plan.
      2. It must record all results in an orderly manner, suitable for human inspection.
      3. The test driver must also provide a test result differencer that compares the results of successive test runs and summarizes differences (a minimal driver sketch appears after this list).
    3. For 509, this process is automated in a Makefile, as exemplified in
      projects/alpha/rolodex/testing/implementation/Makefile
      
    4. To perform tests initially, before all tests are executed via the Makefile, a symbolic debugger such as jdb can be used to execute individual functions.
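
    A minimal sketch of such a driver; the test case, file names, and directory layout here are illustrative only and do not reproduce the Rolodex Makefile setup:

      import java.io.IOException;
      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.Paths;
      import java.util.List;

      class TestDriver {
          public static void main(String[] args) throws IOException {
              // 1. Execute each test in the plan, recording its result.
              StringBuilder results = new StringBuilder();
              results.append("testF case 1: ")
                     .append(runTestFCase1() ? "PASS" : "FAIL").append("\n");

              // 2. Record all results for human inspection.
              Path current = Paths.get("output", "results.txt");
              Files.createDirectories(current.getParent());
              Files.write(current, results.toString().getBytes());

              // 3. Difference current results against the last good results.
              Path good = Paths.get("output-good", "results.txt");
              if (Files.exists(good)) {
                  List<String> goodLines = Files.readAllLines(good);
                  List<String> currLines = Files.readAllLines(current);
                  System.out.println(goodLines.equals(currLines)
                      ? "No differences from good results."
                      : "DIFFERENCES between current and good results.");
              }
          }

          static boolean runTestFCase1() {
              return 1 + 1 == 2;    // stand-in for a real unit test case
          }
      }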

  19. Testing concrete UIs
    1. With a UI toolkit such as Swing, concrete UI tests are performed in the same basic manner as other functional tests.
    2. User input, such as button pressing, is simulated by calling the interface function that is associated with the particular form of input, e.g., SomeButtonListener.actionPerformed (see the sketch after this list).
    3. Outputs that represent screen contents are validated initially by human inspection of the screen.
    4. Ultimately, some machine-readable form of the screen output must be used to compare test results mechanically.
    5. Note that we will NOT do this level of testing in 509, but rather test the GUIs via human interaction.
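
    A sketch of simulating a button press by calling the listener function directly; SomeButtonListener here is a hypothetical Swing listener, not one from a particular design:

      import java.awt.event.ActionEvent;
      import java.awt.event.ActionListener;
      import javax.swing.JButton;

      class SomeButtonListener implements ActionListener {
          public void actionPerformed(ActionEvent e) {
              System.out.println("OK button handled");   // would update model/view here
          }
      }

      class UITestSketch {
          public static void main(String[] args) {
              SomeButtonListener listener = new SomeButtonListener();
              // Simulate the user pressing the button by calling the interface
              // function directly, rather than by clicking on the screen.
              listener.actionPerformed(new ActionEvent(
                  new JButton("OK"), ActionEvent.ACTION_PERFORMED, "OK"));
              // The resulting screen contents are then validated by human
              // inspection (or, ultimately, mechanically).
          }
      }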

  20. Unit testing is a "dress rehearsal" for integration testing.
    1. One might think that if we do a really thorough job of function and class tests, integration should not reveal any further errors.
    2. We know from experience that integration often does reveal additional flaws.
      1. In this sense, failures of integration testing can be viewed as unit test failures.
      2. That is, a flaw revealed by an integration test indicates an incompleteness of the test cases for some individual function.
      3. This should lead to an update of the appropriate function test plan.
    3. In so doing, individual tests will become stronger.

  21. On testing models with large-data process requirements.
    1. Suppose we have the following

      class SomeModestModel {
      
          public void doSomeModelThing(String name) {
              ...
              hdb.doSomeProcessThing(...);
              ...
          }
      
          protected HumongousDatabase hdb;
      
      }
      
      class HumongousDatabase {
      
          public void doSomeProcessThing(...) {
              ...
          }
      
      }
      
    2. In such cases, it may be quite time consuming to implement a stub for the function HumongousDatabase.doSomeProcessThing.
    3. This is a place where bottom-up testing is appropriate.

  22. On really bright coders who don't need to do systematic testing.
    1. There are some of these floating around at various institutions.
    2. They do informally what mere mortals need to do in a more systematic way.
    3. Ultimately, even the brightest hack will not be able to do all testing informally.
    4. As programs are built in larger teams, no single person can know enough about the entire system to test it alone.
    5. Therefore, team-constructed software must be team tested, in a systematic manner.

  23. Other testing terminology
    1. Oracles
      1. What are they? Someone(thing) who (that) knows the answer.
      2. They are used primarily in test plan generation to define expected results.
      3. They are also used to analyze incorrect test results.
      4. When an oracle takes on mythic proportions:
        1. When building a truly experimental piece of code for which the result is not yet known.
        2. I.e., the code is designed to tell us something we don't already know the answer to.
        3. Such cases are actually pretty rare.
        4. There's also a bit of the mythical in looking up the answer in a book (e.g., a trig function) or in using someone else's (mathematical) theory we don't really understand ourselves.
    2. Regression testing
      1. This is the name given to the style of testing we will employ, in which all tests in a suite are rerun whenever any change is made to any part of the system.
      2. Typically full regression tests are run at release points for the system.
      3. There is ongoing research aimed at "smart" regression testing, where not all tests need to be run if it can be proved that a given change cannot possibly affect certain areas of the system.
    3. Mutation testing
      1. This is a means to test tests.
      2. It employs a strategy where a program is mutated by changing the sense of some piece of its logic.
      3. For example, suppose the test in an if statement coded as "if (x < y)" is changed to "if (x >= y)" (see the sketch after this list).
      4. When such a mutation is made and a previously successful set of tests are run, the tests should fail in the places where the mutated code produces an incorrect result.
      5. If a set of previously successful tests do not fail on a mutated program, then one of two possibilities exists:
        1. The tests are too weak to detect a failure that should have been tested, in which case the tests need to be strengthened.
        2. The mutated section of code was "dead" in that it did not compute a meaningful result, in which case the code should be removed.

      6. Generally, the first of these two possibilities is the case.
      7. Mutation testing can be used systematically in such a way that mutations are made in some non-random fashion.
        1. Such systematic mutation provides a measure of testing effectiveness.
        2. This measure can be used to test the effectiveness of different testing strategies, about which we will say more next week.
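
    A tiny sketch of the if-statement mutation described above, applied to a hypothetical function:

      class MutationSketch {
          // Original code.
          static int min(int x, int y) {
              if (x < y) return x;
              return y;
          }

          // Mutant: the sense of the test is flipped from "<" to ">=".
          static int minMutant(int x, int y) {
              if (x >= y) return x;
              return y;
          }

          public static void main(String[] args) {
              // A previously successful test of min; run against the mutant it
              // should fail.  If it still passes, the test set is too weak (or
              // the mutated code is dead).
              System.out.println(min(1, 2) == 1 ? "PASS" : "FAIL");        // PASS
              System.out.println(minMutant(1, 2) == 1 ? "PASS" : "FAIL");  // FAIL
          }
      }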

  24. Concrete details of test planning
    1. As outlined above, the following is the general strategy for test plan design and implementation in Java:
      1. Each class X in the system design has a companion testing class named XTest.
      2. A test class is a subclass of the class it tests.
      3. Each member function X.f has a companion unit test function named XTest.testF.
      4. The comment at the top of each test class describes the module test plan for that class.
      5. The comment for each unit test member function describes the unit test plan for the function it tests.
      6. Attached to the notes is an example that shows the rolodex model class and the companion test class.
    2. Testing directory structure
      1. Figure 1 shows the details of the testing directory structure in the context of a normal project directory (without package subdirectories).
        
        

        Figure 1: Testing directory structure.


        
        
        
      2. The variable $PLATFORM refers to one or more subdirectories that contain platform-specific testing files (e.g., class, SUN4G, SUN4K, HP700).
      3. The contents of the testing subdirectories are as follows:
        DIRECTORY or FILE   DESCRIPTION
        ========================================================================
        *Test.java          Implementation of module testing plans.  Per the project
                            testing methodology, each testing class is a subclass of
                            the design/implementation class that it tests.
        
        
        input               Test data input files used by scripts.
        
                            These include both black-box and white-box input data,
                            as appropriate.  This subdir will be empty in cases
                            where testing is performed entirely programmatically,
                            i.e., the testing scripts construct all test input data
                            dynamically within the script, rather than inputing
                            from test data files.
        
        
        output-good         Output results from the last good run of the tests.
        
                            These are results that have been confirmed to be
                            correct.  Note that these good results are platform
                            independent.  I.e., the correct results should be the
                            same across all platforms.
        
        
        output-prev-good    Previous good results, in case current results are
                            erroneously deemed good.
        
                            Note that this dir is superfluous if version control of
                            test results is properly employed.  However this
                            directory remains as a backup to avoid nasty data loss
                            in case version control has not been kept up to date.
        
        
        $PLATFORM/output    Current platform-specific output results.
        
                            These are the results produced by issuing a make
                            command in a platform-specific directory.  Note that
                            current results are maintained separately in each
                            platform-specific subdir.  This allows for the case
                            that current testing results differ across platforms.
                            I.e., an implementation may work properly on one
                            platform, but not on another.
        
        
        $PLATFORM/diffs     Differences between current and good results.
        
        
        $PLATFORM/Makefile  Makefile to compile tests, execute tests, and
                            difference current results with good results.
        
        
        $PLATFORM/.make*    Shell scripts called from the Makefile to perform
                            specific testing tasks.
        
        $PLATFORM/*.class   Test implementation object files.
        



