CSC 101 Lecture Notes Week 5
Arrays and Strings
Relevant Reading: Chapters 8 and 9, and the examples at
http://users.csc.calpoly.edu/~gfisher/classes/101/examples/week5
http://users.csc.calpoly.edu/~gfisher/classes/101/examples/week6
Why didn't we compute the standard deviation, and what would it take to do this?
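The formulas below are read directly from the code of the program that follows. Its header cites an external page for the precise definitions, which is not reproduced here. The mean depends on every value. The standard deviation then needs each value x_i a second time, compared against that mean, and this second pass over the data is why the values must be kept in an array. Note that the program divides by n - 1, which gives the sample standard deviation.

```latex
\bar{x} = \frac{1}{n} \sum_{i=0}^{n-1} x_i
\qquad
s = \sqrt{\frac{1}{n-1} \sum_{i=0}^{n-1} \left( x_i - \bar{x} \right)^2}
```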
/****
 *
 * This program computes simple statistics for up to 1000 numbers read from
 * standard input. The program first asks for the number of values that the
 * statistics will be computed for. The program then reads in that many
 * values or 1000 values, whichever is smaller.
 *
 * The statistics computed are the sum of the numbers, the arithmetic mean, and
 * the standard deviation. The results are printed to standard output, in the
 * following form:
 *
 *     Sum =
 *     Mean =
 *     Standard Deviation =
 *
 * The precise formulae for mean and standard deviation are as defined here:
 *
 *     http://www.gcseguide.co.uk/statistics_and_probability.htm
 *
 *
 * Author: Gene Fisher (gfisher@calpoly.edu)
 * Created: 14apr11
 * Last Modified: 14apr11
 *
 */

#include <stdio.h>
#include <math.h>

#define MAX_DATA_POINTS 1000

int main () {

    int n;                          /* Number of values to compute stats for */
    double x;                       /* Input value read from the terminal */
    double sum;                     /* Computed sum */
    double mean;                    /* Computed mean */
    double sum_sq;                  /* Computed sum of squares, for std dev */
    double std_dev;                 /* Computed standard deviation */
    int i;                          /* Loop counter variable */
    double data[MAX_DATA_POINTS];   /* Array to hold numbers */

    /*
     * Input the number of values, and prompt for the rest of the data values.
     */
    printf("Input the number of values you want to compute stats for: ");
    scanf("%d", &n);
    printf("Input the values, separated by whitespace: ");

    /*
     * Bounds check the input, and truncate to MAX_DATA_POINTS if necessary.
     * This ensures that we don't store values past the end of the array.
     */
    if (n > MAX_DATA_POINTS) {
        printf("The program will use the first 1000 numbers only.\n");
        n = MAX_DATA_POINTS;
    }

    /*
     * Initialize the sum to 0.
     */
    sum = 0;

    /*
     * Initialize the loop counter to 0.
     */
    i = 0;

    /*
     * Loop until all the values are read in, accumulating the sum as we go.
     * Note that the loop body will not execute at all if the user enters a
     * non-positive value for the number of data points.
     */
    while (i < n) {

        /*
         * Input the next value.
         */
        scanf("%lf", &x);

        /*
         * Put the value into the array.
         */
        data[i] = x;

        /*
         * Add the value to the sum.
         */
        sum = sum + x;

        /*
         * Increment the loop counter, so we'll stop after n inputs.
         */
        i = i + 1;
    }

    /*
     * Compute the mean.
     */
    mean = sum / n;

    /*
     * Compute the standard deviation. This computation is the whole reason
     * for using the array. That is, the formula we're using for standard
     * deviation requires both the mean and each of the data points. So the
     * computational strategy here is as follows:
     *   (1) Read the values from stdin, computing the sum as we go.
     *   (2) Also store each value in an array as we go.
     *   (3) Compute the mean once all the values are read in.
     *   (4) Compute the standard deviation from the mean and the stored values.
     */
    for (i = 0, sum_sq = 0; i < n; i++) {
        sum_sq += pow(data[i] - mean, 2);
    }
    std_dev = sqrt(sum_sq / (n - 1));

    /*
     * Output the results.
     */
    printf("Sum = %f\n", sum);
    printf("Mean = %f\n", mean);
    printf("Standard Deviation = %f\n", std_dev);

    return 0;
}
/****
 *
 * This program computes simple statistics for up to 1000 real numbers read
 * from standard input. The numbers are read up to EOF or 1000 input values,
 * whichever occurs first. The statistics computed are the sum of the
 * numbers, the arithmetic mean, and the standard deviation. The results are
 * output to standard output, in the following form:
 *
 *     Sum =
 *     Mean =
 *     Standard Deviation =
 *
 * The precise formulae for mean and standard deviation are as defined here:
 *
 *     http://www.gcseguide.co.uk/statistics_and_probability.htm
 *
 *
 * Author: Gene Fisher (gfisher@calpoly.edu)
 * Created: 31mar11
 * Last Modified: 3apr11
 *
 */

#include <stdio.h>
#include <math.h>

#define MAX_DATA_POINTS 1000

/*
 * Declare the prototypes for functions used in the program.
 */
int read_values(double data[], int max);
double compute_sum(double data[], int n);
double compute_mean(double data[], int n);
double compute_std_dev(double data[], int n);

int main () {

    /*
     * Declare an array to hold the numbers, and an int for the number of
     * values read in. Declare a double used to check whether stdin is empty.
     */
    double data[MAX_DATA_POINTS];
    int n;
    double datum;

    /*
     * Prompt the user for the data values. From the terminal, the user
     * generates an EOF by typing control-D. If input is redirected from a
     * file, EOF is produced after the last value is read from the file.
     */
    printf(
        "Enter up to %d numeric values, terminating input with control-D:\n",
        MAX_DATA_POINTS);

    /*
     * Call the read_values function to read the numbers into the data array,
     * and return the number of values read.
     */
    n = read_values(data, MAX_DATA_POINTS);

    /*
     * Determine if there are any remaining data values on stdin, and tell the
     * user that they will be ignored.
     */
    if (scanf("%lf", &datum) != EOF) {
        printf("\n NOTE: The program will use the first 1000 numbers only.\n\n");
    }

    /*
     * Compute and output the results.
     */
    printf("Sum = %f\n", compute_sum(data, n));
    printf("Mean = %f\n", compute_mean(data, n));
    printf("Standard Deviation = %f\n\n", compute_std_dev(data, n));

    return 0;
}

/*
 * Read up to max values from standard input, and put the values in the given
 * data array. Return the number of values read, up to EOF or the given max,
 * whichever occurs first. It is the caller's responsibility to ensure that
 * the given data array has at least max elements.
 */
int read_values(double data[], int max) {
    int i;

    for (i = 0; i < max && scanf("%lf", &data[i]) != EOF; i++)
        ;

    return i;
}

/*
 * Return the sum of the first n values of the given data array.
 */
double compute_sum(double data[], int n) {
    int i;
    double sum;

    for (i = 0, sum = 0; i < n; i++) {
        sum += data[i];
    }

    return sum;
}

/*
 * Return the arithmetic mean of the first n values of the given data array.
 */
double compute_mean(double data[], int n) {
    return compute_sum(data, n) / n;
}

/*
 * Return the standard deviation of the first n values of the given data array.
 */
double compute_std_dev(double data[], int n) {
    int i;
    double mean = compute_mean(data, n);
    double sum_squares = 0;

    for (i = 0; i < n; i++) {
        sum_squares += pow(data[i] - mean, 2);
    }

    return sqrt(sum_squares / (n - 1));
}

/***
 *
 * QUESTIONS:
 *
 *   1. How many lines would you have to change in the preceding program to
 *      have it compute stats for up to 1000000 numbers?
 *
 *   2. There is an exquisitely subtle flaw in the following version of the
 *      for loop statement in read_values:
 *
 *          for (i = 0; scanf("%lf", &data[i]) != EOF && i < max; i++)
 *
 *      What is the flaw? NOTE: This question will be on an upcoming quiz or
 *      exam!
 *
 *   3. Statistically, the median is the middle value in a set of data points.
 *      Suppose I wanted to add a function compute_median. What would its
 *      implementation look like?
 *
 */