CSC 357 Lecture Notes Week 2
C Program Structure, Pointers and Structs, Dynamic Memory Management



  1. Relevant reading.
    1. K&R chapters 5 and 6, section 8.7.
    2. Selected parts of Stevens and selected man pages, as cited in the lab and programming assignment writeups.

  2. An initial example -- a simple linked list in C.
    1. See the attached listings for
    2. We'll start with a tour through the listings.
    3. Then we'll cover the concepts and features that appear in the example, with particular emphasis on chapters 5 and 6 of K&R.

  3. Review of overall C program structure.
    1. C programs are defined as collections of .c and .h files.
      1. The .h files are "header" files that contain data and function declarations.
      2. The .c files contain the function definitions, i.e., the implementations.
    2. C programs also include preprocessor directives #include and #define.

  4. Constants and parameterized macros with #define.
    1. As explained in K&R Section 2.3, #define is used to define constant data values, as in
      #define MAXLINE 1000
      
    2. By convention, data constant names are spelled as all uppercase letters.
    3. #define can also be used to define parameterized macros, as exemplified in std-macros.h.
    4. The general form of a macro is:
      #define name optional-parameters body
    5. E.g.,
      #define new(t) (t*) malloc(sizeof(t))
      
    6. Macros are invoked strictly by in-place textual substitution.
      1. In the case of new, for example, the invocation
        ListNode* node = new(ListNode);
        
        expands to
        ListNode* node = (ListNode*) malloc(sizeof(ListNode));
        
      2. This expansion is done by the C preprocessor.
      3. You can inspect the preprocessor output explicitly using the -E switch to gcc.
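    Because the substitution is purely textual, a macro body that is not fully parenthesized can combine with surrounding operators in surprising ways. The following is a minimal sketch; SQUARE_BAD and SQUARE are hypothetical macros made up for illustration and are not part of std-macros.h.

```c
/* Hypothetical macros for illustration; they are not part of std-macros.h. */
#define SQUARE_BAD(x) x * x          /* no parentheses around x or the body */
#define SQUARE(x)     ((x) * (x))    /* fully parenthesized */

/* SQUARE_BAD(a + 1) expands to a + 1 * a + 1, i.e., a + (1 * a) + 1. */
int square_bad_of_sum(int a) {
    return SQUARE_BAD(a + 1);
}

/* SQUARE(a + 1) expands to ((a + 1) * (a + 1)). */
int square_of_sum(int a) {
    return SQUARE(a + 1);
}
```

    Running gcc -E on a file containing these definitions shows the two expansions side by side.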

  5. Memory allocation (K&R Section 5.4).
    1. The '&' operator is of limited practical utility for building dynamically linked data structures.
    2. As illustrated in the linked-list example, programmers need to allocate new blocks of memory for such data structures, not just get pointers to existing variables.
    3. Section 5.4 of K&R talks about the implementation of a simplistic alloc function.
    4. In practice, C programmers use the library-supplied malloc, as well as derivatives calloc and realloc.
    5. The signature of malloc is the following:
      void* malloc(size_t size);
      
      1. The type size_t is an unsigned integer type (typically unsigned int or unsigned long, depending on the platform); either way, the size parameter is the number of bytes to be allocated.
      2. void* is the type of a generic pointer; C converts it implicitly to any object pointer type, but these notes follow the common practice of casting the void* return value from malloc to a more specific type of pointer.
    6. Here are typical examples of malloc:
          /* Allocate memory for a 100-char string. */
          char* some_string = (char*) malloc(100);
      
          /* Allocate memory for an integer array with array_size elements, where
             array_size is an integer variable having been previously set. */
          int* a = (int*) malloc(array_size * sizeof(int));
      
          /* Allocate memory for a structured data value. */
          typedef struct {int x; char y; char z[20];} SomeStruct;
          SomeStruct* s = (SomeStruct*) malloc(sizeof(SomeStruct));
      
    7. The last of these examples is so frequently used that a macro like new can be very handy.
      1. The definition of new is:
        #define new(t) (t*) malloc(sizeof(t))
        
      2. It is used, for example, like this:
        SomeStruct* s = new(SomeStruct);
        
    8. You should read the man page for malloc and related library functions (man malloc on falcon/hornet).
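    As the man page explains, malloc returns NULL when it cannot satisfy a request, and allocated memory should eventually be released with free. A minimal sketch of that pattern follows; make_int_array is a hypothetical helper, not something from the course listings.

```c
#include <stdlib.h>

/* Allocate an array of n ints, each set to init; returns NULL on failure.
   make_int_array is a hypothetical helper for illustration. */
int* make_int_array(size_t n, int init) {
    size_t i;
    int* a = (int*) malloc(n * sizeof(int));   /* bytes = count * element size */

    if (a == NULL) {
        return NULL;                           /* malloc could not satisfy the request */
    }
    for (i = 0; i < n; i++) {
        a[i] = init;
    }
    return a;
}
```

    The caller owns the returned block and is expected to pass it to free when done with it.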

  6. Overview of how malloc works (K&R Section 8.7).
    1. malloc is a reasonably straightforward C function that manages a linked list of free memory blocks (the "free list").
    2. The following is the figure from Page 185 of K&R:

    3. When a user requests a block of memory from malloc, it searches the free list for a block big enough to satisfy the request.
      1. It could search for the first such block -- the "first fit" strategy.
      2. Alternatively, it could search for the smallest free block that satisfies the request -- the "best fit" strategy.
    4. If there is no free block big enough, malloc asks the OS for another chunk of memory, using the low-level sbrk system function.
    5. When the user frees a block, free searches the free list to find the proper place to insert it, and coalesces it with any adjacent free blocks.
    6. The standard implementation of malloc in most versions of UNIX does little error checking, largely for efficiency reasons.
      1. This means that if a programmer is not careful, malloc's memory pool can get corrupted by writing over the portions of the free list that malloc depends on.
      2. There are memory allocation and management packages that do more checking than malloc, for example looking for memory that is not freed properly.
      3. One of them is called "smartalloc", and we'll have a look at it a bit later in the quarter.
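    The two search strategies can be sketched as follows. This is a simplified illustration, not K&R's code: the Block type and both function names are assumptions, the list is NULL-terminated rather than circular, and alignment, block splitting, and the call to sbrk are all left out.

```c
#include <stddef.h>

/* A simplified free-list node (hypothetical; K&R's header also handles alignment). */
typedef struct block {
    size_t size;            /* usable size of this free block, in bytes */
    struct block* next;     /* next block on the free list, or NULL */
} Block;

/* First fit: return the first block that is big enough, or NULL. */
Block* first_fit(Block* free_list, size_t nbytes) {
    Block* p;
    for (p = free_list; p != NULL; p = p->next) {
        if (p->size >= nbytes) {
            return p;
        }
    }
    return NULL;
}

/* Best fit: return the smallest block that is big enough, or NULL. */
Block* best_fit(Block* free_list, size_t nbytes) {
    Block* p;
    Block* best = NULL;
    for (p = free_list; p != NULL; p = p->next) {
        if (p->size >= nbytes && (best == NULL || p->size < best->size)) {
            best = p;
        }
    }
    return best;
}
```

    First fit can stop at the first match, while best fit always walks the whole list; a NULL result corresponds to the case where malloc would have to ask the OS for more memory.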

  7. More on pointers and arrays (K&R Sections 5.6 - 5.10, 5.12).
    1. You should read and understand these sections of K&R.
    2. You can skip Section 5.10 for now.

  8. Structures (K&R chapter 6).
    1. We've seen structs being used in the linked list example.
    2. A struct is a set of variables collected under a common name; the variables are typically referred to as the fields of the struct.
    3. Compared to Java, a struct is the equivalent of a class with all public data fields and no method definitions.

  9. Basics of structures (K&R Section 6.1).
    1. The syntax of a structure declaration is
      struct struct-tag {
      fields
      }
      where struct-tag is a name, and fields are variable declarations; the tag is optional.
    2. Structure fields are also referred to as members; the two terms are synonymous.
    3. Here's a simple example:
      struct point {
          int x;
          int y;
      };
      
    4. A struct declaration defines a type, and so can be used directly to declare struct-type variables.
      1. I.e.,
        struct { ... } x, y, z;
        
        is syntactically analogous to
        int x, y, z;
        
      2. If a struct declaration contains a tag, then it can be used in subsequent variable and parameter declarations, as in
        struct point pt;
        
        (but a cleaner-looking form of struct naming is with typedef, as we'll see shortly).
    5. Structs can be initialized in a declaration, as in
      struct point maxpt = {320, 200};
      
    6. Struct fields are accessed with the '.' operator, as in
      pt.x = 10;
      pt.y = 20;
      printf("%d,%d", pt.x, pt.y);
      
    7. Nested struct definitions are certainly possible, as in
      struct rect {
          struct point pt1;
          struct point pt2;
      };
      

    8. If we declare
      struct rect screen;
      
      then
      screen.pt1.x
      
      refers to the x coordinate of the pt1 field.

  10. Structures and functions (K&R Section 6.2).
    1. Legal operations on structs are assignment, address-of, and member access; assignment includes passing as a parameter value and returning from a function.
    2. For large structs, passing a pointer to the struct as a parameter is more efficient than passing the struct itself, since only the pointer is copied.
    3. Pointers to structs are also necessary when creating dynamically-linked data structures.
    4. There are two notations for accessing the fields of a pointed-to struct, such as
      struct point *pp;
      
      1. The expression (*pp).x accesses the x field; the parentheses are necessary because the precedence of '.' is higher than '*'.
      2. An alternative and equivalent access notation is pp->x.
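    The following sketch puts these points together: one function returns a struct by value, and another takes a struct pointer and uses both access notations. makepoint follows the K&R function of the same name; move_point is a hypothetical helper added for illustration.

```c
struct point {
    int x;
    int y;
};

/* Returns a struct by value, in the style of K&R's makepoint. */
struct point makepoint(int x, int y) {
    struct point temp;
    temp.x = x;
    temp.y = y;
    return temp;
}

/* Takes a pointer, so the caller's struct is modified in place.
   move_point is a hypothetical helper for illustration. */
void move_point(struct point *pp, int dx, int dy) {
    (*pp).x += dx;      /* explicit dereference, then member access */
    pp->y += dy;        /* equivalent shorthand */
}
```

    A caller would write move_point(&pt, 10, 20) to pass the address of an existing struct point variable pt.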

  11. Arrays of structures (K&R Section 6.3).
    1. Arrays of structs are an important working data structure in C.
    2. For example, the following defines a very simple word-count table, akin to (but simpler than) what you'll be defining for programming assignment 2:
      #define MAXWORDS 100

      struct { char* word; int count; } wordtab[MAXWORDS];
      
    3. Assuming the fields of the ith table element have been properly initialized, doing some work with wordtab looks like this:
      wordtab[i].word[j] = getchar();  /* set the jth char of the ith word */
      wordtab[i].count++;              /* increment the count of the ith word */
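    A slightly fuller sketch of working with such a table is a linear-search counting function. Everything here beyond the table itself is an assumption made for illustration: the wordcnt tag, the nwords counter, the count_word helper, and the use of const char* so that string literals can be stored safely.

```c
#include <string.h>

#define MAXWORDS 100

struct wordcnt { const char* word; int count; };

struct wordcnt wordtab[MAXWORDS];
int nwords = 0;                     /* number of table slots in use */

/* Count one occurrence of word; returns its new count, or -1 if the
   table is full. The table stores the caller's pointer, so the string
   must outlive the table entry. count_word is a hypothetical helper. */
int count_word(const char* word) {
    int i;
    for (i = 0; i < nwords; i++) {
        if (strcmp(wordtab[i].word, word) == 0) {
            return ++wordtab[i].count;
        }
    }
    if (nwords == MAXWORDS) {
        return -1;
    }
    wordtab[nwords].word = word;
    wordtab[nwords].count = 1;
    nwords++;
    return 1;
}
```

    Linear search keeps the sketch short; it says nothing about how the table for the programming assignment should be organized.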
      

  12. Pointers to structures (K&R Section 6.4).
    1. When an array of structs is sparse, i.e., not all of its elements are used, an array of pointers to structs can be more efficient.
    2. Consider the following declarations:
      struct wordcnt {
          char* word;
          int count;
      };
      
      struct wordcnt wordtab[MAXWORDS];       /* array of structs */
      struct wordcnt* wordtabp[MAXWORDS];     /* array of pointers to structs */
      
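    3. Before any elements of wordtabp have been set to allocated values, wordtabp is about half as big as wordtab on typical platforms, since each of its elements is a single pointer rather than a pointer plus an int.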
    4. In general, when the contents of a table may be partially unfilled, using struct pointers is advantageous.
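    One way to take advantage of the pointer table is to allocate each element only when it is first needed. The following is a minimal sketch under that assumption; get_slot is a hypothetical helper, and the table is declared at file scope so that every slot starts out as NULL.

```c
#include <stdlib.h>

#define MAXWORDS 100

struct wordcnt {
    char* word;
    int count;
};

struct wordcnt* wordtabp[MAXWORDS];   /* file-scope array: all slots start as NULL */

/* Allocate slot i on first use and return it; returns NULL if i is out of
   range or malloc fails. get_slot is a hypothetical helper. */
struct wordcnt* get_slot(int i) {
    if (i < 0 || i >= MAXWORDS) {
        return NULL;
    }
    if (wordtabp[i] == NULL) {
        wordtabp[i] = (struct wordcnt*) malloc(sizeof(struct wordcnt));
        if (wordtabp[i] != NULL) {
            wordtabp[i]->word = NULL;
            wordtabp[i]->count = 0;
        }
    }
    return wordtabp[i];
}
```

    Only the slots that are actually used cost a full struct; the unused ones cost one pointer each.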

  13. Self-referential structures (K&R Section 6.5).
    1. C allows a struct field to be declared as a pointer to the struct itself, in order to support self-referential data structures like linked lists and trees.
    2. E.g.,
      struct tnode {                  /* the tree node */
          char* word;                 /* points to the text of a word */
          int count;                  /* number of occurrences */
          struct tnode* left;         /* left child */
          struct tnode* right;        /* right child */
      };
      
    3. This is an example of a recursive data type definition; such definitions are common in C programs that do useful work.
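    A recursive type is usually processed by a recursive function. The sketch below is modeled loosely on K&R's addtree; the explicit malloc failure checks and the const parameter are additions made for this illustration, not part of the K&R version.

```c
#include <stdlib.h>
#include <string.h>

struct tnode {
    char* word;
    int count;
    struct tnode* left;
    struct tnode* right;
};

/* Add word to the tree rooted at p, or count it if already present, and
   return the (possibly new) root of that subtree. */
struct tnode* addtree(struct tnode* p, const char* word) {
    int cond;

    if (p == NULL) {                            /* a new word has arrived */
        p = (struct tnode*) malloc(sizeof(struct tnode));
        if (p == NULL) {
            return NULL;
        }
        p->word = (char*) malloc(strlen(word) + 1);
        if (p->word == NULL) {
            free(p);
            return NULL;
        }
        strcpy(p->word, word);                  /* keep a private copy of the text */
        p->count = 1;
        p->left = p->right = NULL;
    } else if ((cond = strcmp(word, p->word)) == 0) {
        p->count++;                             /* repeated word */
    } else if (cond < 0) {
        p->left = addtree(p->left, word);       /* smaller words go left */
    } else {
        p->right = addtree(p->right, word);     /* larger words go right */
    }
    return p;
}
```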

  14. Table lookup (K&R Section 6.6).
    1. This section of K&R defines a simple hash table.
    2. Have a look.

  15. Typedefs (K&R Section 6.7).
    1. As illustrated in the linked-list example, a typedef provides a convenient way to give a mnemonic name to a data type definition.
    2. The typedef can be as simple as
      typedef int Length;
      
      which can be used in declarations like
      Length len, maxlen;
      Length getLength(...);
      
    3. Typedefs can also add a good deal of readability to struct definitions, as in
      typedef struct {
          char* word;
          int count;
      } WordCount;
      
      WordCount wordtab[MAXWORDS];
      WordCount* wordtabp[MAXWORDS];
      
    4. Note that when a non-recursive struct is typedef'd, the struct tag need not be present; the struct can be referred to always by its typedef name.
    5. But note also that for recursive types, the tag must be present in order for the self-referencing declarations to be correct, as in
      typedef struct tnode {          /* the tree node */
          char* word;                 /* points to the text of a word */
          int count;                  /* number of occurrences */
          struct tnode* left;         /* left child */
          struct tnode* right;        /* right child */
      } TreeNode;
      
      TreeNode* tree;
      
    6. The following equivalent-looking definition does NOT work, since it violates the declare-before-use rule:
      typedef struct tnode {          /* the tree node */
          char* word;                 /* points to the text of a word */
          int count;                  /* number of occurrences */
          TreeNode* left;             /* left child, with INVALID FORWARD REF */
          TreeNode* right;            /* right child, with INVALID FORWARD REF */
      } TreeNode;
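    If the typedef name is preferred inside the struct body, one common alternative is to declare the typedef first and define the struct afterwards. This is a sketch of that idiom, not the form used in the course listings:

```c
typedef struct tnode TreeNode;   /* declare the typedef name first */

struct tnode {                   /* the tree node */
    char* word;                  /* points to the text of a word */
    int count;                   /* number of occurrences */
    TreeNode* left;              /* valid: TreeNode is already declared */
    TreeNode* right;
};
```

    This works because C allows a pointer to a struct type whose body has not yet been seen.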
      

  16. Unions (K&R Section 6.8).
    1. A union variable may hold values of different types.
    2. Suppose we want to have a variable that can hold one of an int, double, string, or boolean; a union declaration for this type is
      typedef union {
          int int_val;
          double double_val;
          char* string_val;
          unsigned char bool_val;
      } GenericValue;
      
    3. Syntactically, unions are declared and accessed in precisely the same way as structs.
      1. Union fields are accessed with '.'.
      2. Pointer-to-union fields are accessed with '->'.
    4. The semantic difference between a struct and a union is that a struct value contains all of its data fields, whereas a union value contains one of its data fields at any given time.
    5. As explained on pages 147-148 of K&R, "It is the programmer's responsibility to keep track of which type is currently stored in a union; the results are implementation-dependent if something is stored as one type and extracted as another."
    6. For this reason, union types are often tagged to keep track of the current value.
      1. Union tags are frequently implemented with enums, with one enum literal for each of the union fields.
      2. E.g.,
        typedef enum {INT, DOUBLE, STRING, BOOL} ValueTag;
        
        typedef struct {
            ValueTag tag;
            GenericValue val;
        } TaggedGenericValue;
        
      3. Here's some example usage of this data type:
        void PrintTaggedGenericValue(TaggedGenericValue v) {
            switch (v.tag) {
                case INT:
                    printf("%d\n", v.val.int_val);
                    break;
                case DOUBLE:
                    printf("%f\n", v.val.double_val);
                    break;
                case STRING:
                    printf("%s\n", v.val.string_val);
                    break;
                case BOOL:
                    printf("%s\n", v.val.bool_val ? "true" : "false");
                    break;
            }
        }
        
        int main(void) {
            TaggedGenericValue tval;
        
            tval.val.int_val = 10;
            tval.tag = INT;
            PrintTaggedGenericValue(tval);
        
            tval.val.bool_val = 0;
            tval.tag = BOOL;
            PrintTaggedGenericValue(tval);
        
            return 0;
        }
        
      4. The idea is that the union type appears in the context of a struct that has information indicating which union value is current.
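    One way to keep the tag and the union value from drifting apart is to set both in a single place. The sketch below repeats the type definitions so that it stands alone; make_int and make_double are hypothetical constructor helpers, not part of the notes' listings.

```c
typedef union {
    int int_val;
    double double_val;
    char* string_val;
    unsigned char bool_val;
} GenericValue;

typedef enum {INT, DOUBLE, STRING, BOOL} ValueTag;

typedef struct {
    ValueTag tag;
    GenericValue val;
} TaggedGenericValue;

/* Hypothetical constructor helpers: each sets the tag and the matching
   union field together, so the two cannot disagree. */
TaggedGenericValue make_int(int i) {
    TaggedGenericValue v;
    v.tag = INT;
    v.val.int_val = i;
    return v;
}

TaggedGenericValue make_double(double d) {
    TaggedGenericValue v;
    v.tag = DOUBLE;
    v.val.double_val = d;
    return v;
}
```

    Note that the union is only as large as it needs to be to hold its largest field, which is what distinguishes it from a struct with the same four fields.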

  17. Bit-fields (K&R Section 6.9).
    1. Bit-fields provide access to individual binary bits in a word of memory.
    2. Such access can be used to save space, e.g., by defining a boolean value as a single bit instead of an unsigned char or int.
    3. Bit-fields also provide the programmer direct access to the bit-level interfaces of hardware devices.
    4. We'll talk more about bit-fields later in the quarter.
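    As a small preview, here is a sketch of one-bit flags in the style of the example in K&R Section 6.9. The struct tag flags and the helper is_plain are assumptions made for this illustration.

```c
/* Three one-bit flags packed together, in the style of K&R Section 6.9. */
struct flags {
    unsigned int is_keyword : 1;
    unsigned int is_extern  : 1;
    unsigned int is_static  : 1;
};

/* Returns nonzero if neither the extern nor the static flag is set.
   is_plain is a hypothetical helper for illustration. */
int is_plain(struct flags f) {
    return f.is_extern == 0 && f.is_static == 0;
}
```

    Bit-field members are read and assigned with '.' just like ordinary struct fields, but it is not possible to take the address of one with '&'.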

  18. A culminating example.
    1. To illustrate further some of the key concepts covered in these notes, there are some additional listings attached to the notes for a person-record example.
        • person-record.h -- three versions of a PersonRecord data structure, to illustrate different ways that C memory can be laid out
        • person-record.c -- some printing functions to display what different forms of PersonRecord look like
        • person-record-test.c -- a testing program to illustrate the use of the different versions of person record defined in person-record.h
    2. Of note, the commenting style in this example is doxygen-compliant:
      1. doxygen is a javadoc-like tool for C and C++ programs, which has a few additional comment style rules for use with C:
        1. The comment starter
          /*! \file
          
          tells doxygen to treat a .h file much as javadoc treats a .java class file; this means that doxygen treats C typedefs much like Java classes.
        2. The comment starter
          /**<
          
          allows struct field comments to appear after the field decls instead of before; this is in keeping with an often-used commenting style in C.
      2. There is a directory of doxygen-generated documentation here:
        
        www.csc.calpoly.edu/~gfisher/classes/357/examples/person-record/html/
        
        
      3. Further work with doxygen will be the subject of an upcoming lab.
    3. ALSO PLEASE NOTE: There are some unanswered questions in the testing file person-record-test.c; these questions will be topics in next week's lab.


