CSC 484 Lecture Notes Week 8
The DECIDE Evaluation Framework;
Usability Testing and Field Studies;
Recap of Analytic Evaluation
-
Relevant reading -- Chapters 13-15 of the textbook.
-
Introduction to Chapter 13 (Section 13.1).
-
This chapter is largely a recapitulation of preceding chapters.
-
It provides a high-level organizational framework for evaluation, called
"DECIDE".
-
In presenting DECIDE, the authors provide a few new pieces of information, not
explicitly mentioned in preceding chapters.
-
You may find the DECIDE framework helpful in organizing the evaluation part of
your final project report.
-
Definition of DECIDE (Section 13.2).
-
Again, the purpose of the framework is to provide some high-level
organizational guidance for evaluation of interaction designs.
-
DECIDE has six steps:
-
Determine the goals.
-
Explore the questions.
-
Choose the evaluation approach and methods.
-
Identify the practical issues.
-
Decide how to deal with the ethical issues.
-
Evaluate, analyze, interpret and present the data.
-
Determine the goals (Section 13.2.1).
-
We have discussed this amply, particularly in last week's notes on chapters 7
and 8 of the book.
-
Explore the questions (Section 13.2.2).
-
We also discussed this last week, particularly in the context of properly
framing questions on a questionnaire.
-
An important point to reiterate is not to forget to ask fundamental questions
of the subjects in a usability study.
-
In the heat of project development, a team may get so fully immersed in the
work as to lose sight of basic questions to ask.
-
For example, "Would you use this product?", "If so, how often?".
-
And of course such questions should be asked on a questionnaire in an
analyzable form, as in
"I would use this product for its intended purpose. |_| Strongly
disagree ... |_| Strongly agree"
"I would use this product |_| daily |_| weekly |_| monthly |_|
occasionally"
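-
As a minimal sketch of what "analyzable" buys you, the checked boxes can be mapped to ranks and summarized. This assumes a five-point scale. The middle labels (Disagree, Neutral, Agree) and the function name summarize_item are filled in here for illustration; the item above shows only the endpoints.
-
```python
from collections import Counter
from statistics import median

# Assumed 5-point scale; the questionnaire item shows only its endpoints.
LIKERT = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]
SCORE = {label: rank for rank, label in enumerate(LIKERT, start=1)}


def summarize_item(responses):
    """Turn the checked boxes for one Likert item into analyzable numbers."""
    scores = [SCORE[r] for r in responses]
    return {
        "n": len(scores),
        # Likert data are ordinal, so the median is a safer summary than the mean.
        "median": median(scores),
        "counts": Counter(responses),
    }


print(summarize_item(["Agree", "Strongly agree", "Neutral", "Agree"]))
```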
-
Choose the evaluation approach and methods (Section 13.2.3).
-
Been here, done this.
-
Identify the practical issues (Section 13.2.4).
-
The major practical issue for 484 -- DO A DRESS REHEARSAL of your usability
study.
-
I.e., each team member should act independently as a study participant.
-
If possible, enlist the help of people you know outside the class, by whatever
means of cajolery you have at your disposal.
-
Practical issues of users.
-
Page 631 notes the following bits of information about task length in usability
studies.
-
10 minutes is too short, 2 hours too long.
-
This means the 50-minute time slots we have for 484 studies are just about
right.
-
Page 631 also recounts the dilemma of how to study people's behavior without
influencing it.
-
The lesson in particular for 484 studies is to leave study participants alone as
much as possible while they're performing the study tasks.
-
Practical issues of facilities and scheduling.
-
Plan the logistics of your usability study thoroughly.
-
Think through the room layout and how participants will be placed.
-
Plan all equipment placement.
-
Assign study monitoring duties to team members.
-
Determine how the questionnaires will be administered and collected.
-
Practical issues of expertise.
-
For 484 study data analysis, take advantage of Heather Smith's statistical
expertise.
-
She has regular open office hours, and is available by appointment.
-
By all accounts, her advice is professional and very helpful.
-
Decide how to deal with the ethical issues (Section 13.2.5).
-
You've had a class in this; apply what you learned in 300 to your work in 484.
-
Activity 13.6 describes a practice that you should
follow in your 484 studies, particularly for the 2d3d team:
-
Assign each study participant a code number.
-
Have them put their number, not name, on the questionnaire, and any other data
you collect from them.
-
Keep the name-to-code correlation information separate from the collected data.
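-
The coding practice above can be sketched in Python as follows. The function names, the P-prefixed four-digit code format, and the record layout (a dict with a "name" field) are hypothetical choices for illustration, not something the book prescribes.
-
```python
import secrets


def assign_codes(names, digits=4):
    """Give each participant a unique random code such as "P0427".

    Returns a dict mapping name -> code. This is the sensitive name-to-code
    correlation, so it must be stored apart from the collected data.
    Assumes fewer participants than possible codes (10 ** digits).
    """
    codes, used = {}, set()
    for name in names:
        while True:
            code = f"P{secrets.randbelow(10 ** digits):0{digits}d}"
            if code not in used:
                used.add(code)
                codes[name] = code
                break
    return codes


def strip_names(records, codes):
    """Replace each record's "name" field with the participant's code."""
    cleaned = []
    for rec in records:
        rec = dict(rec)  # copy, so the original record is left untouched
        rec["participant"] = codes[rec.pop("name")]
        cleaned.append(rec)
    return cleaned


codes = assign_codes(["Alice", "Bob"])
data = strip_names([{"name": "Alice", "q1": 4}, {"name": "Bob", "q1": 2}], codes)
```
-
In practice, the dictionary returned by assign_codes would be saved in a different place from the cleaned records, per the last point above.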
-
As noted in the Milestone 3 writeup, you are required to have controlled-study
participants sign an informed consent form.
-
For your fellow 484 students, this is an academic exercise, but worth doing
explicitly to focus your attention on its necessity in real studies.
-
It's in fact necessary for the 2d3d study, since you're using study
participants from outside of class.
-
A consent form is not necessary for field-study
interviews, in particular those done by the swat team.
-
There is a summary of ethical points to consider on Pages 637-638 of the book:
-
Tell participants the study goals, how long it will take, and how the data will
be reported and analyzed; offer them a copy of the final report (most likely in
the form of a webpage link).
-
Explain that any revealed or discovered personal information will be kept
confidential.
-
Make sure participants know they are free to stop at any time.
-
Consider the appropriateness of any offered incentives.
-
Do not report quotes that could inadvertently reveal the identity of
participants; use numbers or fictitious names in any explicit quotes.
-
Always ask permission to quote a participant; offer a pre-release copy of the
report to those quoted, so they can check that they were quoted correctly.
-
Evaluate, interpret, and present the data (Section 13.2.6).
-
Your 484 studies are simple academic exercises, most likely not subject to
scrutiny by the outside world.
-
However, it is worth considering the following criteria for data evaluation and
interpretation.
-
Reliability -- bottom line, is it reproducible by
others?
-
Validity -- did you prove what you set out to prove,
i.e., did you validate any claims that you made at the outset of the study?
-
Biases -- did you do your utmost not to let biases
corrupt the study, the data gathering, its analysis, and interpretation?
-
Scope -- how much can your findings be generalized,
if at all?
-
Ecological validity -- does the environment in which
the study is conducted affect the results, e.g., do you get better results if
you feed the subjects well?
-
Regarding the last point on "ecological validity", the Wikipedia article on the
Hawthorne effect is one of the more cogent articles I've read at that
venue; I recommend it.
-
Introduction to Chapter 14, on Usability Testing and Field Studies
(Section 14.1)
-
The primary focus of this chapter is on usability studies of finished products.
-
As such, many of the specifics do not apply directly to the 484 studies that
are based on prototypes of proposed products.
-
Nevertheless, there is some useful information for 484 teams who will be
conducting studies that involve some form of quantitative analysis,
particularly 2d3d, mobility, and touchten.
-
Usability testing (Section 14.2).
-
As presented in previous notes and chapters, the key components are
-
user tests
-
satisfaction questionnaires
-
interviews
-
For fully quantifiable user tests, the following types of data are gathered:
-
time to complete a task
-
time to complete, after being away from the product for a specified time
-
number of errors per task
-
number of errors per unit of time
-
number of navigations to online help or manuals
-
number of users making a particular error
-
number of users completing a task successfully
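-
Here is a minimal sketch of computing most of these measures from a per-attempt log. The log format (one dict per participant attempt, with errors recorded as a list of error labels), the field names, and the sample records are invented for illustration and are not drawn from the book.
-
```python
from collections import Counter
from statistics import mean

# Hypothetical log: one record per participant attempt at a task.
attempts = [
    {"participant": "P01", "task": "checkout", "seconds": 120,
     "errors": ["wrong-menu"], "help": 0, "completed": True},
    {"participant": "P02", "task": "checkout", "seconds": 180,
     "errors": ["wrong-menu", "typo"], "help": 1, "completed": True},
    {"participant": "P03", "task": "checkout", "seconds": 240,
     "errors": [], "help": 2, "completed": False},
]


def task_summary(attempts, task):
    """Compute the per-task measures listed above from the log."""
    rows = [a for a in attempts if a["task"] == task]
    n_errors = sum(len(a["errors"]) for a in rows)
    minutes = sum(a["seconds"] for a in rows) / 60
    # set() counts each error type at most once per attempt, so (assuming one
    # attempt per user per task) this is "number of users making an error".
    users_per_error = Counter(e for a in rows for e in set(a["errors"]))
    return {
        "mean_time_s": mean(a["seconds"] for a in rows),
        "errors_per_task": n_errors / len(rows),
        "errors_per_minute": n_errors / minutes,
        "help_lookups": sum(a["help"] for a in rows),
        "users_per_error": dict(users_per_error),
        "users_completed": sum(1 for a in rows if a["completed"]),
    }


print(task_summary(attempts, "checkout"))
```
-
Time to complete after a period away from the product would simply be another logged attempt, tagged with its own task name.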
-
The number of participants in usability studies varies.
-
For the type of quantitative testing described in Chapter 14, the book cites a
1999 book by Dumas and Redish that considers 5-12 users acceptable.
-
On his web page, Nielsen basically concurs with this, saying 5-15 users is
sufficient (see
"Why You Only Need to Test With 5 Users"
at www.useit.com/alertbox/20000319.html).
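-
Nielsen's article bases its recommendation on a simple model from Nielsen and Landauer. With n test users, the number of problems found is N(1 - (1 - L)^n), where N is the total number of problems and L is the fraction of problems a single user uncovers. Nielsen reports L as about 31% averaged across the projects he studied. The following sketch of the curve treats L = 0.31 as an assumption that may not hold for any particular product.
-
```python
def proportion_found(n_users, detect_rate=0.31):
    """Expected fraction of usability problems found by n_users testers.

    Nielsen-Landauer model: found(n) = N * (1 - (1 - L) ** n), divided here
    by N. detect_rate (L) = 0.31 is the average Nielsen reports; a given
    project's value may differ.
    """
    return 1 - (1 - detect_rate) ** n_users


for n in (1, 3, 5, 10, 15):
    print(n, round(proportion_found(n), 2))
```
-
Under these assumptions, five users turn up roughly 84% of the problems, and the returns diminish quickly after that. This is why Nielsen favors several small tests over one large one.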
-
Both these sources are talking about usability studies that focus on specific
features, using a number of small tests.
-
For studies involving statistical analysis, the number of participating
subjects depends on the type of analyses to be performed, and the degree of
statistical certainty to be achieved.
-
Generally, statistical analyses require a sample size of more than 15 subjects.
-
There are well-known formulae for calculating experimental sample sizes.
-
Russ Lenth's web page has a Java applet, and links to a number of other
related sites; see
www.stat.uiowa.edu/~rlenth/Power
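-
One such formula, shown here as a sketch, is the normal-approximation sample size for comparing the means of two independent groups with a two-sided test. The experimenter must choose three inputs: the significance level alpha, the desired power, and the standardized effect size d (difference in means divided by the standard deviation). The default values below are conventional choices, not requirements, and the function name is hypothetical.
-
```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-group comparison.

    Normal-approximation formula for a two-sided, two-sample test:
        n = 2 * (z_{1 - alpha/2} + z_{power}) ** 2 / d ** 2
    where d is the standardized effect size. An exact t-test calculation
    gives a slightly larger answer for small samples.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    n = 2 * (z(1 - alpha / 2) + z(power)) ** 2 / effect_size ** 2
    return math.ceil(n)


print(n_per_group(0.5))  # medium effect
print(n_per_group(0.8))  # large effect
```
-
For a medium effect (d = 0.5) this gives 63 participants per group, and even a large effect (d = 0.8) needs 25. Both figures fit the rule of thumb above that statistical analyses need more than 15 subjects. Tools such as Lenth's applet do the exact t-test calculation, which typically comes out slightly higher.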
-
The venues of usability studies vary widely.
-
Large companies, like Microsoft, have large dedicated spaces, equipped to the
hilt with recording equipment, sound proofing, and any other conceivable
experimental amenity.
-
On the other end of the spectrum is the "lab-in-a-suitcase" approach, where
studiers travel to users' sites to conduct their work.
-
There is also the remote monitoring approach, where study participants work in
their own environment, with those conducting the study monitoring and
collecting data remotely.
-
Usability testing of a large website (Section 14.2.1).
-
The book walks through a concrete example, of interest but not of direct
relevance to your 484 studies.
-
The structure of the presentation is a good review of the steps involved in a
usability study, as have been outlined in chapters 7, 8, and 13:
-
Establishing goals and questions
-
Selection of participants
-
Development of the tasks
-
The test procedure
-
Data collection
-
Conducting experiments in usability testing (Section 14.2.2).
-
A usability study is sometimes carried out in the form of a scientific
experiment.
-
This involves testing a specific hypothesis.
-
A basic hypothesis is stated in terms of two variables.
-
E.g., "Reading text displayed in 12-point Helvetica font is faster than
reading text displayed in 12-point Times New Roman."
-
The variables are classified as dependent or independent.
-
The value of an independent variable is selected in advance by the
experimenter, e.g., the font type in the preceding example.
-
The value of a dependent variable is measured in the experiment, e.g.,
the time taken to read the text.
-
A hypothesis can be stated in null and alternative forms.
-
The null hypothesis states the opposite of the experimenter's proposed
hypothesis, a.k.a. the alternative hypothesis.
-
In the preceding example, the null hypothesis is that there is no difference
between reading times with the two fonts.
-
Statistically, the null hypothesis provides a baseline for measuring the
significance of observed findings.
-
I.e., significance is defined in terms of how likely the observed data would be
if the null hypothesis were true.
-
If such data would rarely occur, the experimenter can invoke a form of
proof-by-contradiction.
-
I.e., if the gathered evidence can be shown to be rare under the null
hypothesis, then the null hypothesis is rejected and the alternative hypothesis
is accepted, within the determined degree of statistical significance.
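-
To make this logic concrete, here is a sketch of an exact permutation test on the font example. It swaps in a permutation test for the t-test more commonly used with such data, because the permutation test spells out the null hypothesis directly and needs no libraries. The reading times are made-up numbers for illustration, not results from any study.
-
```python
from itertools import combinations
from statistics import mean

# Hypothetical reading times in seconds, made up for illustration only.
helvetica = [41.2, 38.5, 44.0, 39.1, 40.3]
times_roman = [45.8, 43.9, 47.2, 42.6, 46.1]


def permutation_p_value(a, b):
    """Exact two-sided permutation test for a difference in means.

    Under the null hypothesis the font labels are arbitrary, so every split
    of the pooled times into groups of len(a) and len(b) is equally likely.
    The p-value is the fraction of splits whose mean difference is at least
    as extreme as the observed one.
    """
    pooled = a + b
    observed = abs(mean(a) - mean(b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        g1 = [pooled[i] for i in chosen]
        g2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # small tolerance guards against floating-point rounding in the comparison
        if abs(mean(g1) - mean(g2)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


p = permutation_p_value(helvetica, times_roman)
print(f"p = {p:.4f}")
```
-
With these numbers, a difference as large as the observed one would turn up in only 6 of the 252 possible relabelings (p of about 0.024) if the null hypothesis were true. At the conventional 0.05 level, the experimenter would therefore reject the null hypothesis in favor of the alternative.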
-
HCI experiments very often involve multiple variables.
-
There can be more than one dependent variable, and more than one independent
variable.
-
In HCI experiments, there are very typically other, unmeasured variables whose
values must be kept constant.
-
E.g., font color and screen resolution are such variables in the preceding
example.
-
The significant challenges of experimental design are to
-
identify all the variables that may affect the experimental outcome,
-
set up the experimental conditions so that the values of all unmeasured
variables remain fixed.
-
The book provides further details and examples of the statistical design of
some simple HCI experiments.
-
Several of the 484 research readings have larger-scale examples of statistical
experimental design.
-
And there are many web and textbook resources on the subject.
-
Field studies (Section 14.3).
-
This section is largely a recap of material presented in preceding chapters.
-
Here are some important points to remember about field studies, most notably
for the sultans of swat:
-
Tell participants what they will be asked to do, and about how long it will
take.
-
Have a plan of the questions you want to ask, but be prepared to be flexible as
circumstances dictate.
-
Let the participants "do their own thing" during the specific period of
prototype usage.
-
Observe participants as unobtrusively as possible, but do observe in
some form.
-
Record the sessions with notes, and other forms of recording that the
participants consider acceptable.
-
The larger-scale examples in Section 14.3 are not directly applicable to 484.
-
Neither are the theoretical frameworks used (activity theory and
semiotic engineering), but they are worth a read.
-
Introduction to Chapter 15, on Analytic Evaluation (Section
15.1).
-
This topic was the subject of 484 Assignment 1, for which you already read
Section 15.2.
-
The gist of analytic evaluation is that it does not involve actual end
users, where "actual" means real people who will use a product for its intended
purpose.
-
Rather, analytic evaluation is conducted by assumed product and domain experts,
using inspections or models to evaluate a product.
-
Heuristic Evaluation (Section 15.2) -- The subject of
Assignment 1.
-
Inspection Walkthroughs (Section 15.3).
-
These are typically performed by a team composed of developers, usability
experts, and possibly users or user representatives.
-
Per Nielsen, "Cognitive walkthroughs involve simulating a user's
problem-solving process at each step in the human-computer dialog, checking to
see if the user's goals and memory for actions can be assumed to lead to the
next correct action."
-
Steps of a cognitive walkthrough:
-
Identify the characteristics of typical users.
-
Describe product or prototype to users.
-
Define a specific task for users to complete.
-
Convene designers and usability experts.
-
Walk through each action of the task, asking
-
Will the correct action be evident?
-
Will the user notice that the correct action is available?
-
Will the user properly understand the response to their action, correct or
incorrect?
-
Record important information:
-
Ideas about what causes user problems, and why.
-
Notes about design changes that may be needed, and other potentially relevant
side issues.
-
A summary of the results.
-
Revise the design to fix the problems.
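-
The recording step can be as simple as a structured log of each action and the answers to the three questions. Below is a minimal sketch. The class and function names and the example task are hypothetical.
-
```python
from dataclasses import dataclass, field
from typing import List

# The three questions asked at each action of the walkthrough.
QUESTIONS = (
    "Will the correct action be evident?",
    "Will the user notice that the correct action is available?",
    "Will the user properly understand the response to their action?",
)


@dataclass
class ActionReview:
    action: str
    answers: tuple  # one bool per entry in QUESTIONS; False flags a likely problem
    notes: List[str] = field(default_factory=list)


def summarize(reviews):
    """List (action, question) pairs answered "no", for the results summary."""
    return [
        (r.action, q)
        for r in reviews
        for q, ok in zip(QUESTIONS, r.answers)
        if not ok
    ]


reviews = [
    ActionReview("Open the Share menu", (True, False, True),
                 ["Share icon is hidden under an overflow menu"]),
    ActionReview("Pick a recipient", (True, True, True)),
]
for action, question in summarize(reviews):
    print(f"{action}: NO to '{question}'")
```
-
The notes attached to each flagged action hold the ideas about causes and the suggested design changes.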
-
The book makes the important observation that such walkthroughs should be
egoless.
-
I.e., designers should not try to defend their designs when it's clear there
are problems.
-
Usability experts should lose the "we're inherently more knowledgeable"
attitude.
-
The book also discusses an alternative form, pluralistic
walkthroughs, which differ from the preceding cognitive walkthroughs in
the following respects:
-
Usage scenarios are developed as part of the process, not in advance.
-
The analysis involves more collaborative discussion than directed user
observation.
-
Predictive models (Section 15.4).
-
Inspections are conducted without users, and without experts role-playing
users.
-
Rather, the inspection uses a formulaic model that predicts user behavior.
-
GOMS models (Section 15.4.1)
-
It's an acronym for
-
Goals -- what the user wants to achieve
-
Operators -- user cognition and actions needed to achieve goals
-
Methods -- learned procedures for accomplishing goals
-
Selection rules -- used to choose among multiple methods
-
GOMS is a generic model that does not predict specific user performance.
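-
The following sketch shows how the four components fit together, without making any timing predictions. It gives a GOMS-style description for the goal "delete a word". The two methods, their operator steps, and the five-character threshold in the selection rule are all hypothetical.
-
```python
# Hypothetical GOMS-style description of one goal.
GOAL = "delete a word"

# Methods: learned procedures, each a sequence of operators (user actions).
METHODS = {
    "backspace": [
        "click just after the word",
        "press Backspace once per character",
    ],
    "select-and-delete": [
        "double-click the word to select it",
        "press Delete",
    ],
}


def selection_rule(word):
    """Assumed rule: backspace over short words, select longer ones first."""
    return "backspace" if len(word) <= 5 else "select-and-delete"


def describe(word):
    """Return the goal, the method the rule selects, and that method's operators."""
    method = selection_rule(word)
    return GOAL, method, METHODS[method]


print(describe("cat"))
print(describe("usability"))
```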
-
The keystroke-level model (Section 15.4.2)
-
In contrast to generic GOMS, this model provides actual numeric predictions of
user performance.
-
The model is based on the analysis of many empirical studies of actual user
performance.
-
The table on Page 709 lists some times for core tasks, including
-
pressing a key
-
pointing with a mouse
-
clicking the mouse
-
homing hands on the keyboard
-
drawing a line with a mouse
-
making a decision to do something (can vary significantly)
-
system response time (can vary significantly)
-
These numbers are used to add up the amount of time a particular task takes
using a particular interface.
-
The book provides a couple of examples.
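-
Here is a sketch of the additive calculation. The operator times are assumed approximations in the range usually quoted from Card, Moran and Newell's original work. They are not copied from the book's table, so substitute the Page 709 values and a measured system response time for a real analysis. The letter C for a mouse click is this sketch's own label, and drawing is omitted for brevity.
-
```python
# Assumed operator times in seconds (approximations; replace with the book's table).
OPERATOR_TIMES = {
    "K": 0.28,  # press a key (average typist; skilled typists are faster)
    "P": 1.10,  # point at a target with the mouse
    "C": 0.20,  # click the mouse button (press and release)
    "H": 0.40,  # home hands between keyboard and mouse
    "M": 1.35,  # mentally prepare, i.e., decide to do something
}


def predict_time(operators, response_time=0.0):
    """Sum operator times for a sequence like "HMPC", plus system response time."""
    return sum(OPERATOR_TIMES[op] for op in operators) + response_time


menu = predict_time("HMPCMPC")   # home on mouse, open File menu, choose Save
shortcut = predict_time("HMKK")  # home on keyboard, press Ctrl and S
print(f"menu: {menu:.2f} s, shortcut: {shortcut:.2f} s")
```
-
Under these assumed times, saving via a two-level menu is predicted at about 5.7 seconds, versus about 2.3 seconds for a keyboard shortcut. Both operator sequences are illustrative encodings, not the book's examples.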
-
Benefits (and limitations) of using GOMS (Section 15.4.3).
-
GOMS-style analyses do provide hard, quantitative data.
-
Such analyses can in some cases lead to significant interaction design
improvements.
-
However,
-
GOMS analyses are limited to a relatively small set of routine tasks.
-
They do not model user errors.
-
Nor do they model other cognitive, social, or environmental factors, such as
-
fatigue
-
distractions
-
multi-tasking
-
learning effects
-
Fitts' law (Section 15.4.4).
-
Published in 1954 by Paul Fitts, it predicts the time it takes to reach a
target using a pointing device.
-
Can be used to determine where to place interface widgets on a display, and how
big they should be.
-
As the book's nutshell description says -- the bigger the target, the easier
and quicker it is to reach.
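-
Here is a sketch of the prediction. It uses the Shannon formulation MT = a + b log2(D/W + 1) popularized by MacKenzie, rather than Fitts' original log2(2D/W) form. D is the distance to the target, W is its width along the direction of motion, and a and b are constants fitted per device and user. The defaults below are placeholders, not measured values, and the function name is hypothetical.
-
```python
import math


def movement_time(distance, width, a=0.1, b=0.15):
    """Predicted pointing time in seconds (Shannon formulation of Fitts' law).

        MT = a + b * log2(distance / width + 1)

    a and b are placeholder constants; real values come from fitting
    experimental data for a particular device and user population.
    """
    return a + b * math.log2(distance / width + 1)


for w in (10, 40):
    print(f"target 400 px away, {w} px wide: {movement_time(400, w):.2f} s")
```
-
Quadrupling the width of a target 400 pixels away noticeably cuts the predicted time, here from about 0.90 to about 0.62 seconds. Targets in the screen corners behave as if they were very large, because the cursor stops at the screen edge and cannot overshoot.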
-
Specific HCI results based on Fitts:
-
Don't have lots of tiny buttons crammed together.
-
Things in the four corners of the screen are easy to reach (so why don't more
UIs take advantage of this?)
-
Some good interaction design results have been obtained by applying Fitts.
-
And it's still with us, as evidenced by a full session at the 2008 SIGCHI
conference entitled "Fitts' Law Lives".