CSC 101 Lecture Notes Week 5

Arrays and Strings

Relevant Reading: Chapters 8 and 9

Introduction to Arrays
1. Chapter 8 of the book provides good coverage of array topics.
2. It explains the structure of arrays and has some good pictures of what arrays look like in memory.
3. These notes, together with the Week 6 notes, cover some additional array examples.
4. The code for all of the examples is available online, in the examples pages of the CSC 101 website --
  
  http://users.csc.calpoly.edu/~gfisher/classes/101/examples/week5
  
  and
  
  http://users.csc.calpoly.edu/~gfisher/classes/101/examples/week6

Using an Array in the Stats Program

In last week's lecture notes, we saw how a loop improves the stats program: it allows an indefinite number of data points to be read, instead of only a fixed number.
However, as last week's loop version was structured, it could compute the sum and mean of the data, but not the standard deviation.
We were left with the following question, which appeared as a comment at the end of the example program:
Why didn't we compute the standard deviation, and what would it take to do this?

This question is answered in the following updated program.

The significant updates are the ones that add the array: the declaration of data, the bounds check on n, and the assignment to data[i].
Look in particular at the comment above the for loop that computes std_dev.

/****
 *
 * This program computes simple statistics for up to 1000 numbers read from
 * standard input.  The program first asks for the number of values that the
 * statistics will be computed for.  The program then reads in that many
 * values or 1000 values, whichever is smaller.
 *
 * The statistics computed are the sum of the numbers, arithmetic mean and the
 * standard deviation.  The results are printed to standard output, in the
 * following form:
 *
 * Sum =
 * Mean =
 * Standard Deviation =
 *
 * The precise formulae for mean and standard deviation are as defined here:
 *
 *    http://www.gcseguide.co.uk/statistics_and_probability.htm
 *
 *
 * Author: Gene Fisher (gfisher@calpoly.edu)
 * Created: 14apr11
 * Last Modified: 14apr11
 *
 */

#include <stdio.h>
#include <math.h>

#define MAX_DATA_POINTS 1000

int main () {

    int n;                        /* Number of values to compute stats for */
    double x;                     /* Input value read from the terminal */
    double sum;                   /* Computed sum */
    double mean;                  /* Computed mean */
    double sum_sq;                /* Computed sum of squares, for std dev */
    double std_dev;               /* Computed standard deviation */
    int i;                        /* Loop counter variable */
    double data[MAX_DATA_POINTS]; /* Array to hold numbers */

    /*
     * Input the number of values, and prompt for the rest of the data values.
     */
    printf("Input the number of values you want to compute stats for: ");
    scanf("%d", &n);
    printf("Input the values, separated by whitespace: ");

    /*
     * Bounds check the input, and truncate to MAX_DATA_POINTS if necessary.
     * This is to ensure that we don't store values past the end of the array.
     */
    if (n > MAX_DATA_POINTS) {
        printf("The program will use the first 1000 numbers only.\n");
        n = MAX_DATA_POINTS;
    }

    /*
     * Initialize the sum to 0.
     */
    sum = 0;

    /*
     * Initialize the loop counter to 0.
     */
    i = 0;

    /*
     * Loop until all the values are read in, accumulating the sum as we go.
     * Note that the loop will not go at all if the user enters a non-positive
     * value for the number of data points.
     */
    while (i < n) {

        /*
         * Input the next value.
         */
        scanf("%lf", &x);

        /*
         * Put the value into the array.
         */
        data[i] = x;

        /*
         * Add the value to the sum.
         */
        sum = sum + x;

        /*
         * Increment the loop counter, so we'll stop after n inputs.
         */
        i = i + 1;

    }

    /*
     * Compute the mean.
     */
    mean = sum / n;

    /*
     * Compute the standard deviation.  This computation is the whole reason
     * for using the array.  That is, the formula we're using for standard
     * deviation requires both the mean and each of the data points.  So the
     * computational strategy here is as follows:
     *     (1) Read the values from stdin, computing the sum as we go.
     *     (2) Also store each value in an array as we go.
     *     (3) Compute the mean once all the values are read in.
     *     (4) Compute the standard deviation in a second pass over the array.
     */
    for (i = 0, sum_sq = 0; i < n; i++) {
        sum_sq += pow(data[i] - mean, 2);
    }
    std_dev = sqrt(sum_sq / (n - 1));


    /*
     * Output the results.
     */
    printf("Sum = %f\n", sum);
    printf("Mean = %f\n", mean);
    printf("Standard Deviation = %f\n", std_dev);

    return 0;

}
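
The need for the array follows from the formula itself. As a sketch, assuming the sample form of the standard deviation (the n - 1 divisor that the program above uses) and writing x_i for data[i]:

```latex
\bar{x} = \frac{1}{n}\sum_{i=0}^{n-1} x_i
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=0}^{n-1}\left(x_i - \bar{x}\right)^2}
```

Every term of the second sum uses the mean, and the mean is not known until the last value has been read. The values therefore have to be kept somewhere for a second pass, which is the job of the array. Note that s is undefined for n < 2, and the program above does not check for that case.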

The Best Version Yet of the Stats Program

A further refinement of the stats program is in the example named stats-loops-arrays-functions.c.
As the name suggests, it uses a loop to read in data values, an array to hold the values for computation purposes, and functions to perform the computations.
Look in particular at the function named read_values.
1. It reads values from stdin, up to EOF or the maximum size of the input array, whichever comes first.
2. It returns the array value in the formal output parameter named data.
3. Section 8.5 of the book, on pages 397 - 404, has a good discussion of array parameters, as both inputs and outputs to functions.
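
The idea of an array parameter acting as an input or as an output can be sketched in isolation. The two helpers below, fill_counting and sum_of, are hypothetical names made up for this illustration; they are not part of the course examples.

```c
/*
 * Hypothetical helper: store 1.0, 2.0, ..., n in the first n elements of out.
 * The array is an OUTPUT parameter.  A C array argument is passed as a
 * pointer to its first element, so the assignments change the caller's array.
 * It is the caller's responsibility that out has at least n elements.
 */
void fill_counting(double out[], int n) {
    int i;
    for (i = 0; i < n; i++) {
        out[i] = i + 1;
    }
}

/*
 * Hypothetical helper: return the sum of the first n elements of in.
 * The array is an INPUT parameter; const documents that it is only read.
 */
double sum_of(const double in[], int n) {
    int i;
    double sum = 0;
    for (i = 0; i < n; i++) {
        sum += in[i];
    }
    return sum;
}
```

A caller would declare double a[4], call fill_counting(a, 4), and then sum_of(a, 4) would return 10. Neither function can discover the size of the array on its own, which is why the count is passed as a separate parameter, just as read_values takes max.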

Here's the code for stats-loops-arrays-functions.c, which we'll discuss in some detail during lecture:

/****
 *
 * This program computes simple statistics for up to 1000 real numbers read
 * from standard input.  The numbers are read up to EOF or 1000 input values,
 * whichever occurs first.  The statistics computed are the sum of the
 * numbers, the arithmetic mean, and the standard deviation.  The results are
 * output to standard output, in the following form:
 *
 * Sum =
 * Mean =
 * Standard Deviation =
 *
 * The precise formulae for mean and standard deviation are as defined here:
 *
 *    http://www.gcseguide.co.uk/statistics_and_probability.htm
 *
 *
 * Author: Gene Fisher (gfisher@calpoly.edu)
 * Created: 31mar11
 * Last Modified: 3apr11
 *
 */

#include <stdio.h>
#include <math.h>

#define MAX_DATA_POINTS 1000

/*
 * Declare the prototypes for functions used in the program.
 */
int read_values(double data[], int max);
double compute_sum(double data[], int n);
double compute_mean(double data[], int n);
double compute_std_dev(double data[], int n);

int main () {

    /*
     * Declare an array to hold the numbers, and an int for the number of
     * values read in.  Declare a double to check if stdin is empty.
     */
    double data[MAX_DATA_POINTS];
    int n;
    double datum;

    /*
     * Prompt the user for the data values.  From the terminal, the user
     * generates an EOF by typing control-D.  If input is redirected from a
     * file, EOF is produced after the last value is read from the file.
     */
    printf(
        "Enter up to %d numeric values, terminating input with control-D:\n",
            MAX_DATA_POINTS);

    /*
     * Call the read_values function to read the numbers into the data array,
     * and return the number of values read.
     */
    n = read_values(data, MAX_DATA_POINTS);

    /*
     * Determine if there are any remaining data values on stdin, and tell the
     * user that they will be ignored.
     */
    if (scanf("%lf", &datum) != EOF) {
        printf("\n  NOTE: The program will use the first 1000 numbers only.\n\n");
    }

    /*
     * Compute and output the results.
     */
    printf("Sum = %f\n", compute_sum(data, n));
    printf("Mean = %f\n", compute_mean(data, n));
    printf("Standard Deviation = %f\n\n", compute_std_dev(data, n));

    return 0;

}

/*
 * Read up to max values from standard input, and put the values in the given
 * data array.  Return the number of values read, up to EOF or the given max,
 * whichever occurs first.  It is the caller's responsibility to ensure that
 * the given data array has at least max elements.
 */
int read_values(double data[], int max) {
    int i;
    for (i = 0; i < max && scanf("%lf", &data[i]) != EOF; i++)
        ;
    return i;
}

/*
 * Return the sum of the first n values of the given data array.
 */
double compute_sum(double data[], int n) {
    int i;
    double sum;
    for (i = 0, sum = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}

/*
 * Return the arithmetic mean of the first n values of the given data array.
 */
double compute_mean(double data[], int n) {
    return compute_sum(data, n) / n;
}

/*
 * Return the standard deviation of the first n values of the given data array.
 */
double compute_std_dev(double data[], int n) {
    int i;
    double mean = compute_mean(data, n);
    double sum_squares = 0;

    for (i = 0; i < n; i++) {
        sum_squares += pow(data[i] - mean, 2);
    }

    return sqrt(sum_squares / (n - 1));
}
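
One point about read_values is worth a separate sketch. scanf returns the number of items it assigned, so for non-numeric input it returns 0, not EOF, and it leaves the offending token unread. With the != EOF test, such a token would make the loop count up to max without storing anything further. The variant below is an assumption-laden illustration, not the course's code: the name read_values_checked and the FILE * parameter are made up here, the latter so the function can be tried on something other than a terminal.

```c
#include <stdio.h>

/*
 * Hypothetical variant of read_values (not part of the course example).
 * Reads from the given stream and stops at max values, at EOF, or at the
 * first token that is not a number, whichever comes first.  The bounds
 * test i < max stays first, exactly as in read_values.
 */
int read_values_checked(FILE *in, double data[], int max) {
    int i;
    /* fscanf returns 1 only when a double was actually stored in data[i]. */
    for (i = 0; i < max && fscanf(in, "%lf", &data[i]) == 1; i++)
        ;
    return i;
}
```

Calling read_values_checked(stdin, data, MAX_DATA_POINTS) would behave like read_values on well-formed input. Whether to stop silently at bad input or to report it is a design choice left open here.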


/***
 *
 * QUESTIONS:
 *
 *   1. How many lines would you have to change in the preceding program to
 *      have it compute stats for up to 1000000 numbers?
 *
 *   2. There is an exquisitely subtle flaw in the following version of the for
 *      loop statement in read_values:
 *
 *          for (i = 0; scanf("%lf", &data[i]) != EOF && i < max; i++)
 *
 *      What is the flaw?  NOTE: This question will be on an upcoming quiz or
 *      exam!
 *
 *   3. Statistically, the median is the middle value in a set of data points.
 *      Suppose I wanted to add a function compute_median.  What would its
 *      implementation look like?
 */
 *
 */
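
As a starting point for question 3, here is one possible sketch, not the intended answer. It rests on three assumptions that the notes do not state: that n is at least 1, that it is acceptable to sort the caller's array in place, and that for an even n the median is taken as the mean of the two middle values. The helper compare_doubles is a made-up name; qsort is the standard library sort from stdlib.h.

```c
#include <stdlib.h>

/* Comparison function for qsort: orders doubles in ascending order. */
static int compare_doubles(const void *a, const void *b) {
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/*
 * Hypothetical compute_median: return the median of the first n values of
 * the given data array.
 * ASSUMPTIONS: n >= 1; the array may be reordered (it is sorted in place);
 * for even n the result is the mean of the two middle values.
 */
double compute_median(double data[], int n) {
    qsort(data, n, sizeof(double), compare_doubles);
    if (n % 2 == 1) {
        return data[n / 2];
    }
    return (data[n / 2 - 1] + data[n / 2]) / 2;
}
```

Because the array is reordered, a caller that still needs the values in input order would have to pass a copy instead.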

Strings as Arrays of Characters
1. The arrays in the stats example held numbers.
2. Arrays can hold any type of data value.
3. A very common case is an array of char; when its characters are followed by the null character '\0', the array holds what C calls a string.
4. Chapter 9 of the book does a good job of explaining concepts of strings, including the library functions that operate on strings.
5. The 101/examples/strings directory has some additional examples that we'll discuss during lecture:
  1. string-basics.c -- basics of strings, in particular how they're stored as arrays of char
  2. input-3-strings.c -- very simple program to input three strings from stdin
  3. input-loop.c -- input strings from stdin, until EOF
  4. strlen.c -- implementation of the C library strlen function
  5. string-list.c -- lists of strings, i.e., arrays of strings, i.e., arrays of arrays
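
The storage of a string as an array of char can be sketched with a hand-written length function. The name my_strlen is hypothetical, and the course's strlen.c may well be written differently; the library strlen also returns size_t rather than int.

```c
/*
 * Hypothetical my_strlen: return the number of characters in s that come
 * before the terminating null character '\0'.  The terminator itself is
 * not counted.
 */
int my_strlen(const char s[]) {
    int i = 0;
    while (s[i] != '\0') {
        i++;
    }
    return i;
}
```

For a declaration such as char word[] = "cat", the array has four elements: 'c', 'a', 't', and '\0'. my_strlen(word) returns 3, while sizeof(word) is 4 because it includes the terminator.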