Assignment: Usability Evaluation


Assignment


This description addresses usability evaluations for the product or system you’re working on in your team project. It may apply to multiple assignments, such as usability evaluations using particular tools (e.g. Morae) or methods (e.g. eye tracking). It is to be performed as a team, although it can also consist of individual, coordinated evaluations or parts thereof.
Much of the information below is based on the two reference books listed in the syllabus: Carol Barnum, Usability Testing Essentials: Ready, Set...Test!, and Jeffrey Rubin and Dana Chisnell, Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests.

Goals and Objectives


The goal of this assignment is to allow students to get practical experience with the design, planning, conduct, and aftermath of a usability evaluation for an existing system, or one that is under development.

Description


Your task in this assignment is to perform a usability evaluation of an existing application or system, or one that is currently under development. In general, it involves the following activities:
  • Goals and Objectives
  • Selection of Evaluation Method
  • Finding and Selecting Participants
  • Development of the Evaluation Plan
  • Setting up the Testing Environment
  • Preparation of the Evaluation Materials
  • Conducting the Evaluation Session
  • Debriefing of Participants and Observers
  • Analysis of Data and Observations
  • Recommendations for Product Improvements
  • Reporting the Findings
Further details on the activities are given below.
Not all usability evaluation methods require all of the above steps. At least one of the evaluation methods you use must include experiments with participants from outside this class. Typically this is a direct observation experiment where participants work through a given task, and your team observes them as they do this. It should be related to the team project and the collaboration with your team’s outside partner. If there are strong reasons why this isn’t possible, you need to talk to me about alternatives.

Goals and Objectives of the Evaluation

Here, you’re identifying and setting the priorities for the outcomes of the evaluation. Frequently these are related to the experiences users have with the product under evaluation.
A common strategy is to use Whitney Quesenbery’s 5Es as a starting point: efficient, effective, engaging, error tolerant, and easy to learn.
Your goals and objectives frequently are also affected by practical constraints, such as time constraints and the availability of resources (funding, facilities, equipment, participants, evaluators and moderators).

Selection of the Evaluation Method(s)

In general, you should try to identify the evaluation method that is likely to give you the best results with respect to the goals and objectives identified earlier. In practice, however, the selection of the method is frequently limited by practical constraints.
Here are some frequently used methods to consider:
  • surveys and questionnaires, focus groups, ethnographic research
  • collaborative prototyping, participatory design, card sorting
  • walk-throughs and expert or heuristic evaluations
  • usability testing
For the different phases of the project, you may consider different evaluation methods. In our context, a common strategy is to use lightweight methods like surveys and focus groups initially to determine user needs and requirements, and then to do more formal usability experiments once mockups or prototypes are available. If you start with an existing product, you may consider an initial usability evaluation to establish a baseline for later revisions or complete redesigns.
If you’re considering alternative methods not listed here, check with your instructor before you make a decision.

Participants

In our environment it is often straightforward to work with students as participants. Sometimes I can make arrangements with instructors of other classes where those students serve as participants for our evaluations. This is not always possible and sometimes not appropriate, e.g. when students are not representative of the intended user population. While I try to assist you with this, ultimately it is your responsibility to find a sufficient number of suitable participants. To determine the suitability of participants, it is helpful to have “personas” that act as proxies for certain types of users. In a university setting, for example, there may be “Prof. S. Ohr ” as a faculty proxy, or “Dr. Dean Ceng” as upper-level administrator, as well as a number of student personas. From the personas, you can derive selection criteria for each group of participants, which can be applied through a screening questionnaire.
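To make the screening step a bit more concrete, here is a minimal sketch (in Python) of how criteria derived from personas might be checked against screener responses. The personas, attributes, thresholds, and responses below are invented for illustration and are not part of the assignment; in practice you would use the criteria from your own personas and an actual screening questionnaire.

```python
# Hypothetical sketch: applying screening criteria derived from personas.
# All persona attributes, thresholds, and responses are invented for illustration.

PERSONAS = {
    "student persona": {"role": "student", "min_uses_per_week": 1},
    "faculty persona": {"role": "faculty", "min_uses_per_week": 0},
}

def matches(answers, criteria):
    """Check one screener response against the criteria for one persona."""
    return (answers["role"] == criteria["role"]
            and answers["uses_per_week"] >= criteria["min_uses_per_week"])

# Screener responses, e.g. collected through an online questionnaire.
responses = [
    {"id": "P1", "role": "student", "uses_per_week": 3},
    {"id": "P2", "role": "staff", "uses_per_week": 5},
]

for r in responses:
    matched = [name for name, c in PERSONAS.items() if matches(r, c)]
    print(r["id"], "->", matched if matched else "no matching persona (exclude)")
```

Whether you automate this or simply check responses by hand, the point is that each screening criterion should trace back to one of your personas.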
The number of participants varies greatly, ranging from a few for participatory design, to hundreds or even thousands for Web-based questionnaires.
On the practical side, make sure that participants know when and where the evaluation takes place. You may also have to use incentives to attract participants. Our resources here are of course very limited, but for college students, free pizza still seems to work ...

Testing and Evaluation Plan


The plan can range from relatively informal (e.g. the notes of the planning meeting) to a formal document that is subject to approval by others (e.g. group leader, UX department, Human Subjects Committee). Specifically here at Cal Poly, you will have to get approval for any experiments involving human subjects whose results are intended for publication (Senior Projects and Theses are considered publications). Most of our experiments are strictly for educational purposes, but if yours is part of a senior project, Master’s Thesis, or a sponsored project, you’ll have to get it approved. See the Human Subjects Research at Cal Poly Web page for additional information. That page also includes sample informed consent forms.
Below is an extensive outline of a formal evaluation plan based on Carol Barnum’s Usability Testing Essentials: Ready, Set...Test! (see above).
  • Title Page
  • Table of Contents
  • Executive Summary
  • Problem Statement and Test Objectives
  • Methodology
  • User Profiles
  • Participant Incentive
  • Screeners (questionnaires derived from the user profiles, used to select participants that match a profile)
  • Task List
  • Scenarios
  • Evaluation Methods: A brief description of the evaluation methods (e.g. interaction recording, gaze tracking, expert evaluation), and why they are used for your project. Simply stating that it's required by the assignment isn't sufficient.
  • Test Environment and Equipment
  • Deliverables
  • Appendices
I suggest using this outline for your plans, even if it seems like overkill. It is also similar to the suggested structure of the report you’ll have to write for this assignment (see below).

Testing Environment


The main decision here is about the location where the evaluation will take place. The two most common options are a usability lab (such as our HCI Lab, 14-257) or the user’s site. This decision then drives the selection of equipment, artifacts, and tools used in the evaluation. At this point, you should also think about the personnel required, such as moderators (who are the main contacts for the participants), observers (via a one-way mirror or video, audio, or computer recording equipment), and assistants. Frequent responsibilities include data gathering, note taking, timekeeping, product expertise (how to use the product under evaluation), technical expertise (for the equipment and tools used), and greeting participants and putting them at ease. In our context, all of these roles are typically performed by team members.

Evaluation Materials


Here you identify all the materials used during the evaluation, including the products or artifacts used.
  • Observer Guidelines
  • Orientation Script
  • Background Questionnaire
  • Data Collection Tools (data loggers, online data collection, user-generated data, manually generated data)
  • Forms for non-disclosure, informed consent, and recording waivers
  • Pre-test Questionnaires, Interviews
  • Products, Prototypes, Artifacts
  • Task Scenarios
  • Training Materials
  • Post-test Questionnaires, Interviews
  • Debriefing Guide
Depending on the evaluation method chosen, not all of the above may be required. You should consider this list for guidance, and only omit what you’re confident won’t be needed.

Conducting the Evaluation Session


Most aspects of this should already have been addressed in the evaluation plan. You can use the following as a checklist, based on Jeffrey Rubin, Dana Chisnell: Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests:
A week before the test:
  • Do a pilot test
  • Brief people involved in the experiments (researchers, observers, technicians, support personnel)
  • Make sure equipment and test environment are available and in working order
  • Initiate production of printed test support materials (product training, forms, paper questionnaires)
  • Check participant status: Number, profiles, date and location of the experiment

A day before the test:
  • Check equipment and test environment again
  • Obtain printed test materials; collate for easy distribution to participants
  • Check the product: Is it working? Is it consistent with the support materials?
  • Check participant status: Last-minute cancellations, changes

Day of the test (before the experiment):
  • Brief people involved in the experiments (researchers, observers, technicians, support personnel)
  • Check equipment and test environment again
  • Check the product again
  • Check availability of support materials
Day of the test (during the experiment, for each participant):
  • Greet participant
  • Take care of preliminary documents (informed consent; permission to record; non-disclosure)
  • Read the orientation script
  • Apply pre-test questionnaires
  • Check with personnel involved in the experiment
  • Start recording
  • Perform pre-test training
  • Share task scenario with the participant
  • Signal the start of the experiment to the participant and personnel
  • Observe the participant, annotate observations, collect data during the experiment
  • Apply post-test questionnaires
  • Debrief the participant
  • Close the session, thank the participant
  • Collect support material (scenario, training, notes)
  • Debrief with the personnel, especially observers
  • Prepare for the next participant

Debriefing of Participants and Observers


In the debriefing part, you review the actions and behavior of the participant during the usability evaluation. The debriefing is usually held immediately after the evaluation part. In the actual evaluation part, you may discover some problems with the product, while in the debriefing part you may be able to obtain additional information from the participant about the causes of the problems, their relevance, the motives of the participants, and possible points of confusion they encountered. Debriefing often is your last chance to communicate directly with participants; afterwards they will be gone, and may be more difficult to reach.
The following debriefing guidelines are based on Jeffrey Rubin, Dana Chisnell: Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests:
  • Gather your thoughts while the participant fills out post-test questionnaires.
  • Briefly review the post-test questionnaire, and clarify answers that draw your attention.
  • Give the participant a chance to voice opinions or concerns about the evaluation.
  • Start your questions with general, high-level issues, then move to more specific issues.
  • Review interesting parts of the post-test questionnaire with the participant.
  • Focus on understanding problems observed during the evaluation; don’t try to solve those problems.
  • Go through your set of questions before inviting the participants or other observers to engage in an open discussion.
  • If appropriate, address possible options for further contact with the participants.
Rubin and Chisnell also list a number of advanced debriefing techniques, such as replaying the test (“retrospective review”), reviewing design alternatives, or playing devil’s advocate when it seems worthwhile to challenge the intellectual positions of participants.
In addition to debriefing participants, it can be valuable to debrief other observers (if there are any). Often this is done in an informal manner between evaluation sessions. It can be helpful to get immediate feedback from others directly involved in the activity, and it may also keep them more engaged in the process. In addition to discussing issues, you should also consider writing them down; sticky notes that you can later rearrange easily often work well for this. This should typically be a relatively short activity; a more formal discussion often takes place once all evaluation sessions have been completed. On the other hand, skipping the quick, immediate debriefing can lead to something like “participant overload”, where it becomes increasingly difficult for observers to keep track of what happened when and with which participants.

Analysis of Data and Observations


During the evaluations you typically collect a significant amount of data, and now it is time to look at this wealth of data. Rubin and Chisnell split this up into three steps:
  1. Compile data: Collect and organize the various data sets in such a way that you can identify patterns. This may also involve identifying and dealing with outliers, normalizing data, or performing other transformations that make the next steps easier. This step is of course easier if you planned the data collection aspects well in advance.
  2. Summarize data: Often with the help of computer-based tools such as spreadsheets, pivot tables, charts, or specialized programs, you create summaries that assemble detailed data collections into meaningful categories. Frequent performance data summaries concern task accuracy (percentage of participants performing tasks successfully/with assistance/unsuccessfully, or within a given time period), task timing (mean/median time to complete, range and standard deviation of completion times), and participant preferences (a small sketch of such a summary follows this list).
  3. Analyze data: The goal of the analysis phase is to gain a better understanding of the usability evaluation experiment. This can range from the use of statistical techniques to a prioritization of issues with the product. Frequent aspects are the identification of tasks that did not meet the success criteria, and attempts to identify errors that caused such failures. It may also be useful to analyze differences between participant groups or product versions.
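To make step 2 a bit more concrete, here is a minimal sketch (in Python, using only the standard library) of the kind of performance summary described above. The tasks, completion times, and success flags are invented for illustration; your own data will come from the tools and forms you used during the sessions.

```python
# Minimal sketch of summarizing usability test data (step 2 above).
# The raw observations below are invented for illustration only.
from statistics import mean, median, stdev

# One record per participant and task: (completion time in seconds,
# whether the task was completed without assistance).
observations = {
    "Find course schedule": [(95, True), (120, True), (210, False), (80, True)],
    "Submit assignment":    [(300, False), (240, True), (280, True), (350, False)],
}

for task, records in observations.items():
    times = [t for t, _ in records]
    successes = sum(1 for _, ok in records if ok)
    print(f"{task}:")
    print(f"  success rate: {successes / len(records):.0%}")
    print(f"  completion time (s): mean={mean(times):.0f}, median={median(times):.0f}, "
          f"range={min(times)}-{max(times)}, stdev={stdev(times):.0f}")
```

A spreadsheet or pivot table gives you the same numbers; the important part is that the summaries map back to the metrics defined in your evaluation plan.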

Recommendations for Product Improvements


The desired outcome of the analysis phase is often a set of recommendations for product improvements. Such recommendations are often based on “findings”, which refer to problems or issues identified through the analysis. Findings are often formulated as short, simple phrases that try to capture the most important aspects, and they are displayed prominently in the usability evaluation report. Sometimes the goal of a usability evaluation is to identify problems with a product, and it may be sufficient for the evaluation team to report these problems. If appropriate, the evaluation team may be able to formulate recommendations to eliminate or minimize those problems, although this can also be the responsibility of the design or development team. In our circumstances, a single team is usually responsible for all of the above aspects, and in general, you should try to come up with recommendations for improvements.

Evaluation Report


It is tempting to jump right into re-design and further development efforts after conducting usability evaluations, especially if a team is responsible for all of this. However, there are good reasons to document your findings and recommendations in a report: It captures what you learned, what the findings mean, and what you debated or decided to do in response. The report can be used to communicate the findings within and beyond the team, and it serves as a repository of relevant knowledge that you may consult later. In some cases, you may create a preliminary report relatively early and with limited distribution, and a more formal one later, with a wider distribution.
  • Cover Page: title, date, authors, recipients, possibly an image of the product
  • Cover Memo (this can also be a separate letter or email message)
  • Executive Summary: brief overview of the system (product, project) under evaluation; purpose of the evaluation, main findings, recommendations, important action items. This is intended for non-technical stakeholders who are unlikely to read the whole document. It should contain an overview of the project, a brief description of the evaluation methods and experiments, and a discussion of the main findings, leading to the recommendations given by the team. It's usually about one page long.
  • Table of Contents: often auto-generated from headings; should include all headings after the Table of Contents itself, including appendices; refers to page numbers, with hyperlinks to click on in electronic versions
  • List of Illustrations/Tables: similar to ToC
  • Introduction and Background: this is usually a narrative, not a list; often it is helpful to include a few sentences about the overall project, then information about the specific evaluation(s) described in the report
  • Test Goals and Objectives: a list of goals/objectives with keywords or phrases that identify each; if necessary, give a brief explanation
  • Methodology: Identify the methods (e.g., experiment with observation, eye tracking, interview, questionnaire) and tools (e.g., Morae, Camtasia, Web camera, eye tracking device) you’re using, briefly describe why they were chosen for your project, and explain how they support the evaluation towards the test goals and objectives. Simply stating that they’re required by the assignment isn’t sufficient.
  • Metrics: a list of measures with keywords or phrases that identify each measure; if necessary, give a brief explanation
  • Participants: describe the number, categories, and other relevant aspects of the participants in the experiment; discuss the match of the actual participants with the target user groups
  • Tasks and Scenarios: Describe the overall plans for the activities, and then give a list of the specific actions you expect the participants to perform; for more complex ones, a flowchart or similar diagram also works well
  • Findings and Test Results: this can be another list, or a series of sub-sections or paragraphs; either way, a short phrase serves as an identifier, followed by a more detailed description; this section is suitable for the discussion of technical details
  • Post-Task and Post-Test Results
  • Recommendations: a list of recommendations with short phrases as headers and a brief description usually works well here; put a short “intro” and “outro” in front of and after the list; this is often targeted at nontechnical stakeholders, and is usually short
  • Next Steps: this can be a narrative or a list, and it is also usually intended for non-technical stakeholders, often at management levels
  • Appendices: usually labeled Appendix A, B, C, etc.; often contain logs, questionnaires, protocols, scripts, and so on
You may have noticed that this structure is quite similar to the one given for the evaluation plan above, and indeed it is often useful to refer to the evaluation plan when composing the report.
Since the writing of reports is often not a very popular activity among usability testers, designers, and developers, there are tools supporting the generation of reports. Morae, for example, offers significant assistance with reports, and it is fairly easy to generate nice-looking, voluminous reports. Keep in mind, however, that ultimately it is the content of the report that counts, not the number of pages or its appearance. On the other hand, a sloppy, poorly structured, ill-formatted report riddled with typos, grammatical issues, errors and omissions will not exactly inspire confidence in the findings and recommendations it contains.
The Web site accompanying Barnum’s book has an example of a Usability Test Report for a usability evaluation of the Southern Polytechnic State University web site (www.spsu.edu). It looks like an actual report; I assume that it is the outcome of a consulting activity that SPSU paid for, and I don’t expect you to produce something that is as long.

Additional Resources

For a general overview of User-Centered Design and Human-Computer Interaction, check the Web sites for the respective courses, CSC 484-W17 and CSC 486-W16. Since we don’t have the resources to offer CSC 486 every year, the course CSC 581 Computer Support for Knowledge Management can also be used for the UCD/HCI Sequence and has related material. In S13 and F15, I taught the first offerings of a graduate-level UCD/HCI course, CSC 570-F15 Special Topics in Computer Science - Human-Computer Interaction; this assignment specification is a revised version from that course. That course also included an assignment with a focus on usability evaluation tools for mobile devices; the results are accessible via the Semantic Media Wiki UCD Tools Page.
Specifically for usability evaluation, check the two reference books listed in the syllabus and at the top of this page. Both have companion Web sites with samples of forms, reports, checklists, templates, and pointers to other publications.

Submission and Deadlines


Note: I’ve modified the submission requirements below to reflect the option of submitting a single document for the evaluations that refer to the same system. If you’ve already prepared documents according to the older scheme, there’s no need to make changes.


You need to submit at least one document, structured according to the list under “Evaluation Report” above. For some of the list items, certain aspects may not be relevant; in that case, please state that and explain why, rather than just leaving them out. Unless they are already included in the report, you should also submit copies of documents used in the evaluation, such as the evaluation plan and consent form. There is no need to submit data sets you collected. Especially if you’re collecting data that may be personal, you should follow the policies about privacy and the collection and destruction of personal data that you identified in your evaluation plan and stated in the informed consent form. For additional information, you can check the two reference books listed above and in the syllabus. Both have companion Web sites with samples of forms, reports, checklists, templates, and pointers to other publications.
The deadlines are listed in the schedule; there is some flexibility, but you need to talk to me in advance if you have problems meeting them. You need to submit a draft version of the report by the deadline of A3, and the final version by the deadline of A5 as specified in the schedule.
Place a link to the document in the respective cell on the Team Project Overview Web page; please check that your documents are accessible to anybody who has the link.

Grading Criteria


The grading criteria will be based on the submitted documents and their structure as outlined above. The emphasis will be on the evaluation of the data, the overall assessment of the outcomes, and the recommendations derived from them. I will use a grading rubric on PolyLearn; the detailed description of the rubric will be on Google Docs, but the grades will be recorded on PolyLearn.