CSC 101 Lecture Notes Week 5
Arrays and Strings
Relevant Reading: Chapters 8 and 9
and
http://users.csc.calpoly.edu/~gfisher/classes/101/examples/week5
http://users.csc.calpoly.edu/~gfisher/classes/101/examples/week6
Why didn't we compute the standard deviation, and what would it take to do this?
/****
*
* This program computes simple statistics for up to 1000 numbers read from
* standard input. The program first asks for the number of values that the
* statistics will be computed for. The program then reads in that many
* values or 1000 values, whichever is smaller.
*
* The statistics computed are the sum of the numbers, the arithmetic mean, and
* the standard deviation. The results are printed to standard output, in the
* following form:
*
* Sum =
* Mean =
* Standard Deviation =
*
* The precise formulae for mean and standard deviation are as defined here:
*
* http://www.gcseguide.co.uk/statistics_and_probability.htm
*
*
* Author: Gene Fisher (gfisher@calpoly.edu)
* Created: 14apr11
* Last Modified: 14apr11
*
*/
#include <stdio.h>
#include <math.h>
#define MAX_DATA_POINTS 1000
int main () {
    int n = 0;      /* Number of values to compute stats for (0 if unread) */
    double x;       /* Input value read from the terminal */
    double sum;     /* Computed sum */
    double mean;    /* Computed mean */
    double sum_sq;  /* Computed sum of squared differences, for std dev */
    double std_dev; /* Computed standard deviation */
    int i;          /* Loop counter variable */
    double data[MAX_DATA_POINTS];  /* Array to hold numbers */

    /*
     * Input the number of values, and prompt for the rest of the data values.
     */
    printf("Input the number of values you want to compute stats for: ");
    scanf("%d", &n);
    printf("Input the values, separated by whitespace: ");

    /*
     * Bounds check the input, and truncate to MAX_DATA_POINTS if necessary.
     * This is to ensure that we don't store values past the end of the array.
     */
    if (n > MAX_DATA_POINTS) {
        printf("The program will use the first %d numbers only.\n",
            MAX_DATA_POINTS);
        n = MAX_DATA_POINTS;
    }

    /*
     * Initialize the sum to 0.
     */
    sum = 0;

    /*
     * Initialize the loop counter to 0.
     */
    i = 0;

    /*
     * Loop until all the values are read in, accumulating the sum as we go.
     * Note that the loop will not execute at all if the user enters a
     * non-positive value for the number of data points.
     */
    while (i < n) {

        /*
         * Input the next value.
         */
        scanf("%lf", &x);

        /*
         * Put the value into the array.
         */
        data[i] = x;

        /*
         * Add the value to the sum.
         */
        sum = sum + x;

        /*
         * Increment the loop counter, so we'll stop after n inputs.
         */
        i = i + 1;
    }

    /*
     * Compute the mean.
     */
    mean = sum / n;

    /*
     * Compute the standard deviation. This computation is the whole reason
     * for using the array. That is, the formula we're using for standard
     * deviation requires both the mean and each of the data points. So the
     * computational strategy here is as follows:
     *     (1) Read the values from stdin, computing the sum as we go.
     *     (2) Also store each value in an array as we go.
     *     (3) Compute the mean once all the values are read in.
     *     (4) Compute the standard deviation from the mean and the stored
     *         values.
     */
    for (i = 0, sum_sq = 0; i < n; i++) {
        sum_sq += pow(data[i] - mean, 2);
    }
    std_dev = sqrt(sum_sq / (n - 1));

    /*
     * Output the results.
     */
    printf("Sum = %f\n", sum);
    printf("Mean = %f\n", mean);
    printf("Standard Deviation = %f\n", std_dev);

    return 0;
}
/****
*
* This program computes simple statistics for up to 1000 real numbers read
* from standard input. The numbers are read up to EOF or 1000 input values,
* whichever occurs first. The statistics computed are the sum of the
* numbers, the arithmetic mean, and the standard deviation. The results are
* output to standard output, in the following form:
*
* Sum =
* Mean =
* Standard Deviation =
*
* The precise formulae for mean and standard deviation are as defined here:
*
* http://www.gcseguide.co.uk/statistics_and_probability.htm
*
*
* Author: Gene Fisher (gfisher@calpoly.edu)
* Created: 31mar11
* Last Modified: 3apr11
*
*/
#include <stdio.h>
#include <math.h>
#define MAX_DATA_POINTS 1000
/*
* Declare the prototypes for functions used in the program.
*/
int read_values(double data[], int max);
double compute_sum(double data[], int n);
double compute_mean(double data[], int n);
double compute_std_dev(double data[], int n);
int main () {

    /*
     * Declare an array to hold the numbers, and an int for the number of
     * values read in. Declare a double to check if stdin is empty.
     */
    double data[MAX_DATA_POINTS];
    int n;
    double datum;

    /*
     * Prompt the user for the data values. From the terminal, the user
     * generates an EOF by typing control-D. If input is redirected from a
     * file, EOF is produced after the last value is read from the file.
     */
    printf(
        "Enter up to %d numeric values, terminating input with control-D:\n",
        MAX_DATA_POINTS);

    /*
     * Call the read_values function to read the numbers into the data
     * array, and return the number of values read.
     */
    n = read_values(data, MAX_DATA_POINTS);

    /*
     * Determine if there are any remaining data values on stdin, and tell
     * the user that they will be ignored.
     */
    if (scanf("%lf", &datum) != EOF) {
        printf("\n NOTE: The program will use the first 1000 numbers only.\n\n");
    }

    /*
     * Compute and output the results.
     */
    printf("Sum = %f\n", compute_sum(data, n));
    printf("Mean = %f\n", compute_mean(data, n));
    printf("Standard Deviation = %f\n\n", compute_std_dev(data, n));

    return 0;
}
/*
* Read up to max values from standard input, and put the values in the given
* data array. Return the number of values read, up to EOF or the given max,
* whichever occurs first. It is the caller's responsibility to ensure that
* the given data array has at least max elements.
*/
int read_values(double data[], int max) {
    int i;

    for (i = 0; i < max && scanf("%lf", &data[i]) != EOF; i++)
        ;
    return i;
}
/*
* Return the sum of the first n values of the given data array.
*/
double compute_sum(double data[], int n) {
    int i;
    double sum;

    for (i = 0, sum = 0; i < n; i++) {
        sum += data[i];
    }
    return sum;
}
/*
* Return the arithmetic mean of the first n values of the given data array.
*/
double compute_mean(double data[], int n) {
    return compute_sum(data, n) / n;
}
/*
* Return the standard deviation of the first n values of the given data array.
*/
double compute_std_dev(double data[], int n) {
    int i;
    double mean = compute_mean(data, n);
    double sum_squares = 0;

    for (i = 0; i < n; i++) {
        sum_squares += pow(data[i] - mean, 2);
    }
    return sqrt(sum_squares / (n - 1));
}
/***
*
* QUESTIONS:
*
* 1. How many lines would you have to change in the preceding program to
* have it compute stats for up to 1000000 numbers?
*
* 2. There is an exquisitely subtle flaw in the following version of the for
* loop statement in read_values:
*
* for (i = 0; scanf("%lf", &data[i]) != EOF && i < max; i++)
*
* What is the flaw? NOTE: This question will be on an upcoming quiz or
* exam!
*
* 3. Statistically, the median is the middle value in a set of data points.
* Suppose I wanted to add a function compute_median. What would its
* implementation look like?
*
*/
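
Question 3 can be approached by sorting a copy of the data and picking the middle element. The following is one possible sketch, not the course's official answer. It assumes the standard library function qsort is acceptable at this point in the course. The helper compare_doubles, the choice to return 0.0 for an empty data set, and the choice to average the two middle values when n is even are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: comparison function in the form qsort expects. */
static int compare_doubles(const void *a, const void *b) {
    double x = *(const double *) a;
    double y = *(const double *) b;
    return (x > y) - (x < y);
}

/*
 * Return the median of the first n values of the given data array. The
 * values are copied before sorting, so the caller's array is unchanged.
 */
double compute_median(double data[], int n) {
    double *sorted;
    double median;

    if (n <= 0) {
        return 0.0;     /* ASSUMPTION: no data means a median of 0 */
    }

    /* Sort a private copy of the data. */
    sorted = malloc(n * sizeof(double));
    assert(sorted != NULL);
    memcpy(sorted, data, n * sizeof(double));
    qsort(sorted, n, sizeof(double), compare_doubles);

    if (n % 2 == 1) {
        /* Odd count: a single middle value. */
        median = sorted[n / 2];
    } else {
        /* Even count: average the two middle values. */
        median = (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
    }

    free(sorted);
    return median;
}
```

Sorting a copy costs extra memory, but it keeps compute_median consistent with compute_sum, compute_mean, and compute_std_dev, none of which modify the data array.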