CSC 484 Lecture Notes Week 8
The DECIDE Evaluation Framework;
Usability Testing and Field Studies;
Recap of Analytic Evaluation 
- 
Relevant reading -- Chapters 13-15 of the textbook.
 - 
Introduction to Chapter 13 (Section 13.1).
- 
This chapter is largely a recapitulation of preceding chapters.
 - 
It provides a high-level organizational framework for evaluation, called
"DECIDE".
 - 
In presenting DECIDE, the authors provide a few new pieces of information, not
explicitly mentioned in preceding chapters.
 - 
You may find the DECIDE framework helpful in organizing the evaluation part of
your final project report.
 
 - 
Definition of DECIDE (Section 13.2).
- 
Again, the purpose of the framework is to provide some high-level
organizational guidance for evaluation of interaction designs.
 - 
DECIDE has six steps:
- 
Determine the goals.
 - 
Explore the questions.
 - 
Choose the evaluation approach and methods.
 - 
Identify the practical issues.
 - 
Decide how to deal with the ethical issues.
 - 
Evaluate, analyze, interpret and present the data.
 
 
 - 
Determine the goals (Section 13.2.1).
- 
We have discussed this amply, particularly in last week's notes on chapters 7
and 8 of the book.
 
 - 
Explore the questions (Section 13.2.2).
- 
We also discussed this last week, particularly in the context of properly
framing questions on a questionnaire.
 - 
An important point to reiterate is that of not forgetting to ask
fundamental questions of the subjects in a usability study.
- 
In the heat of project development, a team may get so fully immersed in the
work as to lose sight of basic questions to ask.
 - 
For example, "Would you use this product?", "If so, how often?".
 
 - 
And of course such questions should be asked on a questionnaire in an
analyzable form, as in
"I would use this product for its intended purpose.  |_| Strongly
disagree   ...    |_| Strongly agree"
"I would use this product    |_| daily    |_| weekly    |_| monthly    |_|
occasionally"
 
 - 
Choosing the appropriate methods (Section 13.2.3).
- 
Been here, done this.
 
 - 
Identify the practical issues (Section 13.2.4).
- 
The major practical issue for 484 -- DO A DRESS REHEARSAL of your usability
study.
- 
I.e., have each team member act independently as a study participant.
 - 
If possible, enlist the help of people you know outside the class, by whatever
means of cajolery you have at your disposal.
 
 - 
Practical user issues.
- 
Page 631 notes the following bits of information about task length in usability
studies.
- 
10 minutes is too short, 2 hours is too long.
 - 
This means the 50-minute time slots we have for 484 studies are just about
right.
 
 - 
Page 631 also recounts the dilemma of how to study people's behavior without
influencing it.
 - 
The lesson in particular for 484 studies is to leave study participants alone
as much as possible while they're performing the study tasks.
 
 - 
Practical issues of facilities and scheduling.
- 
Plan the logistics of your usability study thoroughly.
 - 
Think through the room layout and how participants will be placed.
 - 
Plan all equipment placement.
 - 
Assign study monitoring duties to team members.
 - 
Determine how the questionnaires will be administered and collected.
 
 - 
Practical issues of expertise.
- 
For 484 study data analysis, take advantage of Heather Smith's statistical
expertise.
 - 
She has regular open office hours, and is available by appointment.
 - 
By all accounts, her advice is professional and very helpful.
 
 
 - 
Decide how to deal with the ethical issues (Section 13.2.5).
- 
You've had a class in this; apply what you learned in 300 to your work in 484.
 - 
Activity 13.6 describes practices that you should
follow in your 484 studies, particularly for the 2d3d team:
- 
Assign each study participant a code number.
 - 
Have them put their number, not name, on the questionnaire, and any other data
you collect from them.
 - 
Keep the name-to-code correlation information separate from the collected data.
 
 - 
As noted in the Milestone 3 writeup, you are required to have controlled-study
participants sign an informed consent form.
- 
For your fellow 484 students, this is an academic exercise, but worth doing
explicitly to focus your attention on its necessity in real studies.
 - 
It's in fact necessary for the 2d3d study, since you're using study
participants from outside of class.
 - 
A consent form is not necessary for field-study
interviews, in particular those done by the swat team.
 
 - 
There is a summary of ethical points to consider on Pages 637-638 of the book:
- 
Tell participants the study goals, how long it will take, and how the data will
be reported and analyzed; offer them a copy of the final report (most likely in
the form of a webpage link).
 - 
Explain that any revealed or discovered personal information will be kept
confidential.
 - 
Make sure participants know they are free to stop at any time.
 - 
Consider the appropriateness of any offered incentives.
 - 
Do not report quotes that could inadvertently reveal the identity of
participants; use numbers or fictitious names in any explicit quotes.
 - 
Always ask permission to quote a participant; offer a pre-release copy of the
report to those quoted, so they can check that they were quoted correctly.
 
 
 - 
Evaluate, analyze, interpret, and present the data (Section 13.2.6).
- 
Your 484 studies are simple academic exercises, most likely not subject to
scrutiny by the outside world.
 - 
However, it is worth considering the following criteria for data evaluation and
interpretation.
- 
Reliability -- bottom line, is it reproducible by
others?
 - 
Validity -- did you prove what you set out to prove,
i.e., did you validate any claims that you made at the outset of the study?
 - 
Biases -- did you do your utmost not to let biases
corrupt the study, the data gathering, its analysis, and interpretation?
 - 
Scope -- how much can your findings be generalized,
if at all?
 - 
Ecological validity -- does the environment in which
the study is conducted affect the results, e.g., do you get better results if
you feed the subjects well?
 
 - 
Regarding the last point on "ecological validity", the Wikipedia article on the
Hawthorne effect is one of the more cogent articles I've read at that
venue; I recommend it.
 
 - 
Introduction to Chapter 14, on Usability Testing and Field Studies
(Section 14.1)
- 
The primary focus of this chapter is on usability studies of finished products.
 - 
As such, many of the specifics do not apply directly to the 484 studies that
are based on prototypes of proposed products.
 - 
Nevertheless, there is some useful information for 484 teams who will be
conducting studies that involve some form of quantitative analysis,
particularly 2d3d mobility, and touchten.
 
 - 
Usability testing (Section 14.2).
- 
As presented in previous notes and chapters, the key components are
- 
user tests
 - 
satisfaction questionnaires
 - 
interviews
 
 - 
For fully quantifiable user tests, the following types of data are gathered
(a small tallying sketch follows the list):
- 
time to complete a task
 - 
time to complete, after being away from the product for a specified time
 - 
number of errors per task
 - 
number of errors per unit of time
 - 
number of navigations to online help or manuals
 - 
number of users making a particular error
 - 
number of users completing a task successfully
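 - 
To make these metrics concrete, here is a minimal sketch of computing a few of
them from per-participant task logs; the field names and data are hypothetical.

    # Hypothetical per-participant logs for one task: completion time in
    # seconds, error count, and whether the task was completed successfully.
    logs = [
        {"participant": 1, "time_s": 142, "errors": 2, "completed": True},
        {"participant": 2, "time_s": 190, "errors": 0, "completed": True},
        {"participant": 3, "time_s": 305, "errors": 5, "completed": False},
    ]

    completed = [log for log in logs if log["completed"]]
    print("completion rate:", len(completed) / len(logs))
    print("mean time on completed tasks (s):",
          sum(log["time_s"] for log in completed) / len(completed))
    print("users making at least one error:",
          sum(1 for log in logs if log["errors"] > 0))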
 
 - 
The number of participants in usability studies varies.
- 
For the type of quantitative testing described in Chapter 14, the book cites a
1999 book by Dumas and Redish that considers 5-12 users acceptable.
 - 
On his web page, Nielsen basically concurs with this, saying 5-15 users is
sufficient (see
"Why You Only Need to Test With 5 Users"
at www.useit.com/alertbox/20000319.html).
 - 
Both these sources are talking about usability studies that focus on specific
features, using a number of small tests.
 - 
For studies involving statistical analysis, the number of participating
subjects depends on the type of analyses to be performed, and the degree of
statistical certainty to be achieved.
- 
Generally, statistical analyses require a sample size of more than 15 subjects.
 - 
There are well-known formulae for calculating experimental sample sizes (a
sketch of one appears below).
 - 
Russ Lenth's web page has a Java applet, and links to a number of other
related sites; see
www.stat.uiowa.edu/~rlenth/Power
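 - 
As a rough illustration of the kind of formula involved (not a substitute for
the statistical resources above), here is a sketch of the standard sample-size
calculation for comparing two group means under a normal approximation; the
example numbers are made up.

    # Participants needed per group to detect a difference `delta` between two
    # group means with standard deviation `sigma`, at significance level
    # `alpha` (two-sided) and the given statistical power.
    from statistics import NormalDist

    def sample_size_per_group(sigma, delta, alpha=0.05, power=0.80):
        z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
        z_beta = NormalDist().inv_cdf(power)
        return 2 * ((z_alpha + z_beta) * sigma / delta) ** 2

    # E.g., detecting a 10-second difference in mean task time when task
    # times have a standard deviation of about 15 seconds:
    print(round(sample_size_per_group(sigma=15, delta=10)))   # roughly 35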
 
 
 - 
The venues of usability studies vary widely.
- 
Large companies, like Microsoft, have large dedicated spaces, equipped to the
hilt with recording equipment, sound proofing, and any other conceivable
experimental amenity.
 - 
On the other end of the spectrum is the "lab-in-a-suitcase" approach, where
the evaluators travel to users' sites to conduct their work.
 - 
There is also the remote monitoring approach, where study participants work in
their own environment, with those conducting the study monitoring and
collecting data remotely.
 
 
 - 
Usability testing of a large website (Section 14.2.1).
- 
The book walks through a concrete example, of interest, but not of direct
relevance to your 484 studies.
 - 
The structure of the presentation is a good review of the steps involved in a
usability study, as have been outlined in chapters 7, 8, and 13:
- 
Establishing goals and questions
 - 
Selection of participants
 - 
Development of the tasks
 - 
The test procedure
 - 
Data collection
 
 
 - 
Conducting experiments in usability testing (Section 14.2.2).
- 
A usability study is sometimes carried out in the form of a scientific
experiment.
 - 
This involves testing a specific hypothesis.
 - 
A basic hypothesis is stated in terms of two variables.
 - 
E.g., "Reading text displayed in 12-point Helvetica font is faster than
Reading text displayed in 12-point Times New Roman."
 - 
The variables are classified as dependent or independent.
- 
The value of an independent variable is selected in advance by the
experimenter, e.g., the font type in the preceding example.
 - 
The value of a dependent variable is measured in the experiment, e.g.,
the time taken to read the text.
 
 - 
An hypothesis can be stated in null and alternative forms.
- 
The null hypothesis states the opposite of the experimenter's proposed (a.k.a.
alternative) hypothesis.
 - 
In the preceding example, the null hypothesis is that there is no difference
between reading times with the two fonts.
 - 
Statistically, the null hypothesis provides a baseline for measuring the
significance of observed findings.
 - 
I.e., significance is defined in terms of how often observed data support the
null hypothesis.
 - 
If this happens rarely, it allows the experimenter to invoke a form of
proof-by-contradiction.
 - 
I.e., if gathered evidence can be shown rarely to support the null hypothesis,
then the alternative hypothesis is assumed true, within the determined degree
of statistical significance.
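 - 
To make this concrete with the font-reading example above: gather reading
times under each font and test the null hypothesis of "no difference" with an
independent-samples t-test. A small sketch, using hypothetical reading times
and scipy's t-test:

    # Hypothetical reading times (seconds) for the same passage rendered in
    # two fonts.  The independent variable is the font; the dependent variable
    # is the reading time.  A small p-value means data like these would rarely
    # arise if the null hypothesis ("no difference between fonts") were true.
    from scipy import stats

    helvetica = [41.2, 39.8, 44.1, 40.5, 38.9, 42.0]
    times_new_roman = [45.3, 43.7, 47.9, 44.2, 46.1, 43.0]

    t_statistic, p_value = stats.ttest_ind(helvetica, times_new_roman)
    print(f"t = {t_statistic:.2f}, p = {p_value:.4f}")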
 
 - 
HCI experiments very often involve multiple variables.
- 
There can be more than one dependent variable, and more than one independent
variable.
 - 
In HCI experiments, there are typically other unmeasured variables, the values
of which must be kept constant.
 - 
E.g., font color and screen resolution are such variables in the preceding
example.
 
 - 
The significant challenges of experimental design are to
- 
identify all the variables that may affect the experimental outcome,
 - 
set up the experimental conditions so that the values of all unmeasured
variables remain fixed.
 
 - 
The book provides further details and examples of the statistical design of
some simple HCI experiments.
 - 
Several of the 484 research readings have larger-scale examples of statistical
experimental design.
 - 
And there are many web and textbook resources on the subject.
 
 - 
Field studies (Section 14.3).
- 
This section is largely a recap of material presented in preceding chapters.
 - 
Here are some important points to remember about field studies, most notably
for the sultans of swat:
- 
Tell participants what they will be asked to do, and about how long it will
take.
 - 
Have a plan of the questions you want to ask, but be prepared to be flexible as
circumstances dictate.
 - 
Let the participants "do their own thing" during the specific period of
prototype usage.
 - 
Observe participants as unobtrusively as possible, but do observe in
some form.
 - 
Record the sessions with notes, and other forms of recording that the
participants consider acceptable.
 
 - 
The larger-scale examples in Section 14.3 are not directly applicable to 484.
 - 
Neither are the theoretical frameworks used -- activity theory and
semiotic engineering -- but they are worth a read.
 
 - 
Introduction to Chapter 15, on Analytic Evaluation (Section
15.1).
- 
This topic was the subject of 484 Assignment 1, for which you already read
Section 15.2.
 - 
The gist of analytic evaluation is that it does not involve actual end
users, where "actual" means real people who will use a product for its intended
purpose.
 - 
Rather, analytic evaluation is conducted by assumed product and domain experts,
using inspections or models to evaluate a product.
 
 - 
Heuristic Evaluation (Section 15.2) -- The subject of
Assignment 1.
 - 
Inspection Walkthroughs (Section 15.3).
- 
These are typically performed by a team composed of developers, usability
experts, and possibly users or user representatives.
 - 
Per Nielson, "Cognitive walkthroughs involve simulating a user's problem-
solving process at each step in the human-computer dialog, checking to see if
the user's goals and memory for actions can be assumed to lead to the next
correct action."
 - 
Steps of a cognitive walkthrough:
- 
Identify the characteristics of typical users.
- 
Describe the product or prototype to users.
 - 
Define a specific task for users to complete.
 
 - 
Convene designers and usability experts.
 - 
Walk through each action of the task, asking
- 
Will the correct action be evident?
 - 
Will the user notice that the correct action is available?
 - 
Will the user properly understand the response to their action, correct or
incorrect?
 
 - 
Record important information:
- 
Ideas about what causes user problems, and why.
 - 
Notes about design changes that may be needed, and other potentially relevant
side issues.
 - 
A summary of the results.
 
 - 
Revise the design to fix the problems.
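 - 
One possible way to record the results of such a walkthrough, per action step,
is sketched below; the structure and example content are purely illustrative,
not prescribed by the book.

    # For each action in the task, record answers to the three walkthrough
    # questions plus any problems observed, then collect the flagged steps.
    walkthrough = [
        {
            "action": "Select 'Export...' from the File menu",
            "correct_action_evident": True,
            "user_notices_action_available": True,
            "response_understood": False,
            "notes": "Dialog title does not mention the chosen format.",
        },
    ]

    problems = [step for step in walkthrough if not all(
        (step["correct_action_evident"],
         step["user_notices_action_available"],
         step["response_understood"]))]
    print(len(problems), "of", len(walkthrough), "steps flagged for redesign")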
 
 - 
The book makes the important observation that such walkthroughs should be
egoless.
- 
I.e., designers should not try to defend their designs when it's clear there
are problems.
 - 
Usability experts should lose the "we're inherently more knowledgeable"
attitude.
 
 - 
The book also discusses an alternative form, pluralistic
walkthroughs, which differ from the preceding cognitive walkthroughs in
the following respects:
- 
Usage scenarios are developed as part of the process, not in advance.
 - 
The analysis involves more collaborative discussion than directed user
observation.
 
 
 - 
Predictive models (Section 15.4).
- 
These evaluations are conducted without users, and without experts role-playing
users.
 - 
Rather, the evaluation uses a formulaic model that predicts user behavior.
 
 - 
GOMS models (Section 15.4.1)
- 
It's an acronym for
- 
Goals -- what the user wants to achieve
 - 
Operators -- user cognition and actions needed to achieve goals
 - 
Methods -- learned procedures for accomplishing goals
 - 
Selection rules -- used to choose among multiple applicable methods
 
 - 
GOMS is a generic model that does not predict specific user performance.
 
 - 
The keystroke-level model (Section 15.4.2)
- 
In contrast to generic GOMS, this model provides actual numeric predictions of
user performance.
 - 
The model is based on the analysis of many empirical studies of actual user
performance.
 - 
The table on Page 709 lists some times for core tasks, including
- 
pressing a key
 - 
pointing with a mouse
 - 
clicking the mouse
 - 
homing hands on the keyboard
 - 
drawing a line with a mouse
 - 
making a decision to do something (can vary significantly)
 - 
system response time (can vary significantly)
 
 - 
These numbers are used to add up the amount of time a particular task takes
using a particular interface.
 - 
The book provides a couple examples.
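 - 
Here is a small sketch of such an addition, using commonly cited
keystroke-level operator times; these approximate, rather than reproduce, the
book's table, and the example task is made up.

    # Approximate keystroke-level-model operator times, in seconds.
    OPERATOR_TIMES = {
        "K": 0.2,    # press a key (skilled typist)
        "P": 1.1,    # point with a mouse
        "B": 0.1,    # press or release a mouse button
        "H": 0.4,    # home hands between keyboard and mouse
        "M": 1.35,   # mental preparation for an action
    }

    def estimate_task_time(operator_sequence):
        """Sum the predicted times for a sequence of KLM operators."""
        return sum(OPERATOR_TIMES[op] for op in operator_sequence)

    # E.g., deleting a file by mouse: mentally prepare, point at the icon,
    # click it (button down, button up), home to the keyboard, press Delete.
    print(estimate_task_time(["M", "P", "B", "B", "H", "K"]), "seconds")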
 
 - 
Benefits (and limitations) of using GOMS (Section 15.4.3).
- 
GOMS-style analyses do provide hard, quantitative data.
 - 
Such analyses can in some cases lead to significant interaction design
improvements.
 - 
However,
- 
GOMS analyses are limited to a relatively small set of routine tasks.
 - 
They do not model user errors.
 - 
Nor do they model other cognitive, social, or environmental factors, such as
- 
fatigue
 - 
distractions
 - 
multi-tasking
 - 
learning effects
 
 
 
 - 
Fitts' law (Section 15.4.4).
- 
Published in 1954 by Paul Fitts, it predicts the time it takes to reach a
target using a pointing device.
 - 
Can be used to determine where to place interface widgets on a display, and how
big they should be.
 - 
As the book's nutshell description says -- the bigger the target,  the easier
and quicker it is to reach.
 - 
Specific HCI results based on Fitts:
- 
Don't have lots of tiny buttons crammed together.
 - 
Things in the four corners of the screen are easy to reach (so why don't more
UIs take advantage of this?)
 
 - 
Some good interaction design results have been obtained by applying Fitts.
 - 
And it's still with us, as evidenced by a full session at the 2008 SIGCHI
conference entitled "Fitts' Law Lives".
 
 