CSC 484 Lecture Notes Week 7, Part 2
Data Gathering (Part 1)
Data Analysis (Part 2)



  1. Introduction to Chapter 8 (Section 8.1).
    1. Data analysis can be quantitative, qualitative, or both.
    2. This chapter of the book presents ways to do these types of analyses, on data gathered with the techniques described in Chapter 7.
    3. It also discusses the interpretation of analysis results.
      1. Simple interpretation involves the identification of apparent patterns or trends in the data.
      2. Deeper interpretation entails drawing conclusions from statistical analysis.
    4. Interpretation must be done carefully, and supported fully by the data.
      1. Suppose, for example, that you produce a statistical finding that one group of study subjects performs some task more slowly than another group of subjects, to a statistically significant degree.
      2. This finding could be interpreted in a number of ways, including
        1. skill differences between the groups
        2. differences in how the groups were trained
        3. differences in how the study was administered to the two groups.
      3. Eliminating the effects of such factors is part of the art and science of designing a good study.
    5. Another issue of interpretation is over-claiming what results represent.
      1. In general, you should be maximally conservative in what you conclude, only making claims that can be unambiguously backed up by the data analysis.
      2. Don't use words like "all" or "most" or "many" unless they are well-founded.
      3. Always use hard numbers to back up what these words mean in a particular case, as in "Most participants (73%) answered yes to Question 6."

  2. Definitions of "Quantitative" and "Qualitative" (Section 8.2).
    1. Quantitative data are[1] numeric values, or values that can be readily converted to numeric form.
    2. In contrast, qualitative data are difficult to represent numerically, in a meaningful way.
      1. The notion of "meaningful way" is important in this context, since numeric data representations can easily be used to misrepresent results.
      2. See, for example, "How to Lie with Statistics", by Darrell Huff.[2]

  3. First steps in data analysis (Section 8.2.1).
    1. Most of these steps are common sense, but they're worth mentioning explicitly.
    2. If you have interview notes, transcribe them as soon as possible, while the interviews are still fresh in your mind.
    3. Questionnaire data may need to be "groomed" from the raw form entered by respondents.
      1. For example, you can remove or otherwise identify unanswered questions (a small grooming sketch appears at the end of this section).
      2. Electronic questionnaire tools can assist with such grooming, or perform it automatically.
      3. In the case of electronic tools, you need to read the documentation to understand in what form the groomed data are delivered.
    4. As with interview notes, other forms of gathered data should be given at least an initial analysis very soon after they are gathered.
      1. Organize photos and give them a dated caption.
      2. File things (electronically) in appropriate places.
    5. Table 8.1 (book page 359) summarizes typical initial processing steps for the main data gathering techniques.
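    6. As a concrete example of the grooming described in item 3 above, here is a minimal Python sketch; the file name responses.csv, the column layout, and the "NO ANSWER" convention are all hypothetical, and the exact steps depend on how your questionnaire tool exports its data.

      import csv

      # Hypothetical raw questionnaire export: one row per respondent,
      # blank cells where a question was left unanswered.
      with open("responses.csv", newline="") as f:
          rows = list(csv.DictReader(f))

      groomed = []
      for row in rows:
          # Flag unanswered questions explicitly rather than leaving them blank,
          # so later analysis can distinguish "no answer" from other responses.
          groomed.append({q: ((a or "").strip() or "NO ANSWER") for q, a in row.items()})

      unanswered = sum(1 for row in groomed for a in row.values() if a == "NO ANSWER")
      print(f"{len(groomed)} respondents, {unanswered} unanswered questions flagged")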

  4. Simple quantitative analysis (Section 8.3).
    1. An example of an open-ended, hard-to-analyze question:
      What do you think of feature X?

      with typical responses
      • "It's stupid."
      • "I liked it a lot."
      • "It's hard to use, because ... "
    2. Examples of closed, easy-to-analyze questions:
      Feature X is useful.

      [ ] Strongly Disagree  ...  [ ] Strongly Agree


      Feature X is easy to use.
      [ ] Strongly Disagree  ...  [ ] Strongly Agree
    3. The book presents basic analysis examples on pages 362 - 373.
      1. The small-scale analyses are relevant to your 484 studies (a minimal summary sketch follows this list).
      2. The large-scale analysis in Box 8.3 is far less relevant to 484, given that it involves 100 MB of data, gathered over 26 days, 21 hours a day.
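    4. As a minimal sketch of the kind of small-scale summary mentioned in item 3 above, the following Python tallies Likert responses that have already been coded 1-5; the responses list is made-up data, and treating 4 or 5 as "agreed" is just one reasonable convention.

      from collections import Counter
      from statistics import mean, median

      # Hypothetical coded responses to "Feature X is easy to use."
      # 1 = Strongly Disagree ... 5 = Strongly Agree
      responses = [4, 5, 3, 4, 2, 5, 4, 4, 3, 5]
      counts = Counter(responses)
      n = len(responses)

      for level in range(1, 6):
          print(f"level {level}: {counts[level]:2d} ({100 * counts[level] / n:3.0f}%)")
      print(f"mean = {mean(responses):.2f}, median = {median(responses)}")

      # Hard numbers to back up a claim like "Most participants agreed":
      agreed = 100 * sum(1 for r in responses if r >= 4) / n
      print(f"agreed or strongly agreed: {agreed:.0f}%")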

  5. Simple Qualitative Analysis (Section 8.4)
    1. The book provides some guidelines in this area, but much of the discussion is more relevant to data analysis during the requirements gathering process, as opposed to the evaluation process.
    2. Nevertheless, there is some generally useful information presented.

  6. Identifying recurring patterns or themes (Section 8.4.1).
    1. This is generally the starting point of data analysis.
      1. Sometimes patterns or themes form the primary basis of the analysis, as is likely the case in your 484 usability studies.
      2. Other times patterns are the beginning of a more complicated analysis.
      3. Patterns are often apparent in graphical views of data.
    2. It's fine, and sometimes enlightening, to have unexpected patterns and themes emerge from the data.
      1. On pages 374 - 378, the book discusses emerging themes in the analysis of ethnographic data.
      2. In general, this domain is not directly relevant to the studies of 484, but the book's observations about emergent themes are instructive.

  7. Categorizing data (Section 8.4.2).
    1. The necessity for post-gathering data categorization depends fundamentally on the open-endedness of the study and its data gathering techniques.
    2. When data are gathered in a very open-ended format, such as the "think-aloud" techniques discussed in the book, then significant post-gathering categorization is necessary.
      1. The process of this categorization is essentially the same as that done by software engineers when they categorize functional requirements, based on data gathered from end users.
      2. I.e., transcripts of user interviews are analyzed to determine emergent categories of functionality.
      3. This process is called "domain analysis" by software engineers, as well as by AI ontologists.
      4. Notice the process description on the top of page 383:
        "In this approach, nouns and verbs are identified and scrutinized to see if they
        represent significant classes."
        which is precisely one of the process steps of domain analysis.
      5. For data analysis purposes, the same form of functional analysis can be used to determine categories relevant to the user experience of study participants (a small tallying sketch appears at the end of this section).
    3. When data gathering is done in a more closed form, i.e., less open-ended, then there is less need for post-gathering categorization.
      1. This is because at least some degree of pre-gathering categorization is required in order to develop the closed-form study.
      2. I.e., data categorization is inherent in determining meaningful answers to closed questions.
    4. In 484, a good deal of pre-gathering categorization has already taken place, meaning less will need to be done post-gathering.
      1. This is due to the fact that functional categorization is part of the prototyping process, and the prototypes are driving the 484 usability studies to a large extent.
      2. This said, there may well be data categorizations that emerge as a result of conducting the study, for example in the mobility project, where categorizing discovered responses may be part of the analysis.
      3. Also, even with predetermined functional categorization, conducting the study may lead to the discovery of new forms of categorization that better suit users' needs.
      4. Such refining of functional categorization can in fact be one of the major benefits of conducting a usability study.
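    5. To make the post-gathering categorization step concrete, here is a minimal Python sketch that tallies think-aloud excerpts after they have been hand-coded into emergent categories; the excerpts and category names are hypothetical.

      from collections import Counter

      # Hypothetical excerpts, each hand-coded into one or more emergent
      # categories during post-gathering analysis of the transcripts.
      coded_excerpts = [
          {"id": 1, "categories": ["navigation", "terminology"]},
          {"id": 2, "categories": ["navigation"]},
          {"id": 3, "categories": ["feedback"]},
          {"id": 4, "categories": ["navigation", "feedback"]},
      ]

      # Tally how often each category occurs, showing where participants'
      # attention (and trouble) concentrated.
      tally = Counter(c for e in coded_excerpts for c in e["categories"])
      for category, count in tally.most_common():
          print(f"{category}: {count}")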

  8. Looking for critical incidents (Section 8.4.3).
    1. This analysis technique is based on identifying particularly significant events in users' interactions.
      1. For example, when users get stuck performing some particular task, or when a prototype misdirects a user in some way.
      2. A critical incident may also be a positive event, such as when a user has an "ah hah" moment in figuring out how an interactive system works.
    2. Studying such incidents alone is probably not sufficient for a full analysis, but it can help focus attention on significant problems or features that should receive attention.
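    3. As a minimal sketch of pulling critical incidents out of observation notes, the following Python assumes the observer marked negative incidents with "NEG:" and positive ones with "POS:"; both the notes and the marking convention are made up for illustration.

      # Hypothetical time-stamped observer notes from one study session.
      notes = [
          "00:42 POS: participant discovered the search shortcut unprompted",
          "03:15 NEG: participant stuck on the export dialog for about two minutes",
          "07:01 NEG: prototype label 'sync' misread as 'settings'",
      ]

      incidents = {"POS": [], "NEG": []}
      for line in notes:
          for kind in incidents:
              if f" {kind}: " in line:
                  incidents[kind].append(line)

      print(f"{len(incidents['NEG'])} negative, {len(incidents['POS'])} positive critical incidents")
      for line in incidents["NEG"]:  # the breakdowns usually deserve the closest look
          print("  " + line)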


  9. Tools to support data analysis (Section 8.5)
    1. Surveys/questionnaire tools include
      1. phpESP, at sourceforge.net/projects/phpesp -- open source
      2. SurveyMonkey, at surveymonkey.com -- free for surveys with <= 100 respondents and <= 10 questions
      3. InstantSurvey, at instantsurvey.com -- free 30-day trial for basic surveys
    2. There are many statistical analysis tools:
      1. The website freestatistics.altervista.org/en/stat.php lists a bunch of them.
      2. There is also a site describing how to use Microsoft Excel for ANOVA (insofar as this may be necessary in 484 analyses): www.psych.northwestern.edu/Misc/anova-excel.html. However, I could not find the described features in either Mac Office X or Windows Office 2007. (A scripted alternative is sketched at the end of this section.)
    3. There is a high-level tool for experimental design described in a 2007 SIGCHI paper, "Touchstone: Exploratory Design of Experiments", by Wendy Mackay et al.
      1. I played around with it a bit, and it looks kind of interesting.
      2. I'm not sure how stable it is, and hence how usable it is for 484 purposes.
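    4. If a 484 analysis really does call for ANOVA (most will not), a scripted alternative to the Excel route above is Python's scipy library; this is a minimal sketch with made-up task-time data for three hypothetical participant groups, and it assumes scipy is installed.

      from scipy import stats

      # Hypothetical task-completion times (seconds) for three groups of participants.
      group_a = [41.2, 38.5, 44.1, 40.0, 39.7]
      group_b = [45.3, 47.8, 44.9, 46.1, 48.0]
      group_c = [39.9, 41.0, 42.3, 40.5, 38.8]

      # One-way ANOVA: tests whether the group means differ by more than
      # chance alone would explain.
      f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
      print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

      # Recall the interpretation caveats from item 1 of these notes: a significant
      # result says the group means differ, not why they differ.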

  10. Using Theoretical Frameworks (Section 8.6).
    1. These are not the general sorts of socio-cognitive frameworks discussed in earlier chapters.
    2. Rather, the frameworks discussed in this section of the book are domain-specific.
    3. I.e., the frameworks are developed from the analysis of empirical data, gathered in a specific domain.
    4. As such, the developmental goals for these frameworks are very much the same as the goals for the domain models and ontologies developed by software engineers and computer scientists.
      1. The models are developed by careful analysis of domain artifacts and activities, and the relationships among the artifacts and activities.
      2. The purpose of the models is to help analysts better understand the domain, and hence do a better job of whatever task is at hand -- from analyzing existing artifacts to developing new ones.
    5. With all due respect to the theoretical psychologists, they should do some reading in the last 30 years of computer science research.


  11. Presenting the findings (Section 8.7).
    1. Preceding sections of this chapter, as well as preceding chapters, have presented a variety of ways to present data and data analyses.
    2. The last section of Chapter 8 outlines three additional ways to present the findings of a data analysis:
      1. Rigorous notations (Section 8.7.1) -- UML and other modeling notations.
      2. User stories (Section 8.7.2) -- a childish way of presenting scenarios and use cases.
      3. Summaries (Section 8.7.3) -- a necessary part of any analysis activity.
    3. The first two of these largely pertain to the requirements analysis step of interaction design.
    4. The last (Summaries) pertains particularly to data analysis, and should definitely be part of your 484 final project writeup.

  12. The data analysis and presentation in your 484 projects.
    1. Some of the techniques discussed in the book are not directly applicable to the kinds of user studies you're doing in 484.
    2. As noted on several occasions, you should use the gathering and analysis techniques that work for your project.
      1. Except for the 2d3d project, the 484 usability studies are very small scale efforts.
      2. As such, big-gun statistical analysis techniques (e.g., ANOVA) are most likely not appropriate.
    3. Data analysis and presentation techniques that are appropriate to 484 projects are:
      1. various forms of tables and graphs, to present gathered data and its analysis (a small plotting sketch appears at the end of this section);
      2. at least some basic statistical analysis;
      3. a clearly written summary of the findings of your usability study.
    4. To provide you some rough idea of the scope of your project work, I've posted examples of the final team deliverables for the 484 class taught by Franz Kurfess in Winter 2007.
      1. The examples are located at www.csc.calpoly.edu/~gfisher/classes/484/examples/w07-final-projects
      2. As with the storyboard examples from W07, these examples are presented "as is", with no evaluations.
      3. Also as for the storyboard examples, the details of this year's project deliverables are different from those in W07, so these examples do not represent precisely what you are expected to deliver this quarter.
      4. If you have any specific questions about what I expect from your team for this year's final project, please come by office hours any time.
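    5. For item 3.1 above (tables and graphs), here is a minimal plotting sketch using Python's matplotlib, with made-up completion counts; it assumes matplotlib is installed, and the task names and numbers are hypothetical.

      import matplotlib.pyplot as plt

      # Hypothetical counts of participants who completed each task without help.
      tasks = ["Task 1", "Task 2", "Task 3", "Task 4"]
      completed = [8, 6, 3, 7]
      total_participants = 8

      plt.bar(tasks, completed)
      plt.ylim(0, total_participants)
      plt.ylabel("Participants completing task unaided")
      plt.title("Unaided task completion (n = 8)")
      plt.savefig("task_completion.png")  # include the figure in the final writeup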

Footnotes:

[1] The entry for "data" in Merriam-Webster's 11th Collegiate Dictionary says this: "Data leads a life of its own quite independent of datum, of which it was originally the plural. ... The plural construction is more common in print, evidently because the house style of several publishers mandates it, and because those who use it in singular form sound like ignorami." (Ok, so I added the last bit.)

[2] This book by Huff has been acknowledged as "the most widely read statistics book in the history of the world", in no less a publication than the journal Statistical Science (Vol. 20, No. 3, pp. 205-209, 2005), by J. M. Steele.