CSC 357 Lecture Notes Weeks 6 and 7

Introduction to UNIX Processes and Process Control

Relevant reading for these notes.
1. Stevens chapters 7 and 8.
2. Cited man pages and system .h files, in particular
  - fork(2)
  - exec(3)
  - wait(2)
Programs and processes.
1. A program is an executable file residing on disk in a directory.
2. A process is an independently executing program.
3. At any given time, there are typically many processes running.
Processes viewed from the user (shell) level.
1. The shell command ps provides a wide variety of options for viewing process status.
2. Shell process backgrounding is performed using the '&' operator at the very end of the command, e.g.,
```
xbiff &
```
  starts a little utility that runs in the background to check periodically for arriving mail.
3. Shell-level job control is available with the jobs shell command and the '%' job selector; e.g., the following is a typical sequence:
```
hornet> xbiff &
[1] 14583
hornet> rmiregistry &
[2] 14590
hornet> jobs
[1]    Running                       xbiff
[2]    Running                       rmiregistry
hornet> %2
rmiregistry
^C
hornet> kill %1
hornet>
```
  1. The command "xbiff &" runs the xbiff utility in the background.
  2. "rmiregistry &" starts another background process, this time a useful Java utility.
  3. The jobs command then lists the currently running background processes.
  4. The %2 command brings the second backgrounded job into the foreground, and typing control-C kills it.
  5. Finally, "kill %1" terminates the backgrounded xbiff directly.
4. A process can be killed with its shell job number, or by its process ID that is listed by ps.
The main function (Section 7.2).
1. The kernel exec's a C program and calls main.
2. The kernel passes in argc and argv.
3. The program itself never calls main -- it is strictly the interface from the OS.
Process termination (Section 7.3).
1. There are five normal ways that a process can be terminated:
  1. Return from main.
  2. Calling exit.
  3. Calling _exit or _Exit.
  4. Return from last thread (Sect 11.5).
  5. Calling pthread_exit (Sect 11.5).
2. Three abnormal ways:
  1. Calling abort (Sect 10.17).
  2. Receipt of signal (Sect 10.2).
  3. Cancellation of last thread (Sect 11.5, 12.7).
3. System functions are exit, _exit, and _Exit.
  1. The latter two do not call atexit functions (_exit and _Exit are functionally equivalent).
  2. The atexit function registers exit handler functions, which are called by exit before process termination.
4. The picture on page 183 of Stevens is a good illustration of process termination.
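The relationship between exit and atexit can be sketched as follows. This is a minimal illustration, not code from Stevens; the handler names and `register_handlers` are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical handlers; exit() calls them in reverse order of registration. */
static void first_registered(void)  { puts("first_registered runs last"); }
static void second_registered(void) { puts("second_registered runs first"); }

/* Registers both handlers. Returns 0 on success, -1 if atexit fails. */
int register_handlers(void) {
    if (atexit(first_registered) != 0)
        return -1;
    if (atexit(second_registered) != 0)
        return -1;
    return 0;
}
```

If a program calls `register_handlers()` and later returns from main or calls exit, both handlers run. If it calls _exit or _Exit instead, neither runs.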
Command-Line arguments (Section 7.4).
1. You've been using these in the assignments.
2. They're a C Standard and POSIX thing.
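As a reminder of how argc and argv fit together, here is a small sketch. The function name `print_args` is hypothetical; a real program would call it from main with the argc and argv it was given.

```c
#include <stdio.h>

/* Prints each argument on its own line and returns how many were printed.
   argv[0] is the program name, and argv[argc] is guaranteed to be NULL. */
int print_args(int argc, char *argv[]) {
    int i;
    for (i = 0; i < argc; i++)
        printf("argv[%d] = %s\n", i, argv[i]);
    return i;
}
```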
The process environment list (Section 7.5).
1. The kernel passes each program an environment list, of values set in the calling shell environment.
2. The values are accessible in the global environ variable.
3. The system function getenv accesses this list:
```
char *getenv(const char *name);
```
4. By convention, the environment is a set of name,value pairs, both of which are strings.
5. At the shell level, environment variables are set with setenv, and accessed with printenv.
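The two ways of reaching the environment from C, getenv and the global environ, can be sketched like this. The helper names `env_or_default` and `count_env_entries` are made up for illustration.

```c
#include <stdlib.h>

extern char **environ;   /* NULL-terminated array of "name=value" strings */

/* Returns the value of name, or fallback if the variable is not set. */
const char *env_or_default(const char *name, const char *fallback) {
    const char *value = getenv(name);
    return value != NULL ? value : fallback;
}

/* Counts the entries in the environment list by walking environ. */
int count_env_entries(void) {
    int n = 0;
    for (char **p = environ; p != NULL && *p != NULL; p++)
        n++;
    return n;
}
```

For example, `env_or_default("HOME", "/")` returns the user's home directory when HOME is set in the calling shell, and "/" otherwise.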
Memory layout of a C program (Section 7.6).
1. The pieces of program memory are the following:
  1. text segment
  2. initialized data segment
  3. uninitialized data segment (bss)
  4. stack
  5. heap
2. Figure 7.6 on Page 188 is a sample illustration, but the exact layout differs among OS implementations.
Shared libraries (Section 7.7).
1. Libraries contain pre-compiled program code that is linked into a program: static libraries at link time, shared libraries when the program is loaded or run.
  1. Many standard libraries are automatically linked by gcc.
  2. In some cases, such as with the math library, the library needs to be explicitly linked.
  3. This is done with the "-l" compiler (actually loader) flag:
```
gcc prog.c -lm
```
2. There is in fact a separate linking and loading program named "ld".
  1. This command is not often called directly, since gcc calls it as part of its processing.
  2. All options that would be sent to ld can be sent to gcc, meaning gcc effectively subsumes ld.
3. Libraries are stored in UNIX .a files (static archives) and .so files (shared objects).
4. The .so files are comparable to DLLs (dynamic-link libraries) in Windows.
Memory allocation (Section 7.8).
1. Three ISO C functions -- malloc, calloc, realloc.
2. You've been using these regularly.
3. Typically implemented with lower-level sbrk function.
4. K&R Section 8.7 has a simple implementation of malloc.
5. See the picture on Page 185.
6. There are a number of alternatives to the standard malloc, with these features:
  1. More efficient in terms of time or space.
  2. Instrumented to prevent certain forms of bad calls, e.g., free'ing a block that was not malloc'd.
  3. Instrumented to provide usage statistics and debugging information.
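A common use of realloc is a buffer that grows on demand. The sketch below assumes a hypothetical `struct int_vec` and doubles the capacity each time the array fills; it is one reasonable policy, not the only one.

```c
#include <stdlib.h>

/* A growable array of ints; the struct and function names are hypothetical. */
struct int_vec {
    int *data;
    size_t len;
    size_t cap;
};

/* Appends value, doubling the capacity with realloc when the array is full.
   Returns 0 on success, -1 if allocation fails (the vector is unchanged). */
int int_vec_push(struct int_vec *v, int value) {
    if (v->len == v->cap) {
        size_t new_cap = v->cap == 0 ? 4 : v->cap * 2;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL)
            return -1;            /* v->data is still valid here */
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}

/* Releases the storage and resets the vector to empty. */
void int_vec_free(struct int_vec *v) {
    free(v->data);
    v->data = NULL;
    v->len = v->cap = 0;
}
```

Assigning the result of realloc to a temporary pointer first avoids leaking the old block when realloc returns NULL.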
Environment variables (Section 7.9).
1. The meanings are interpreted by applications programs, never by the kernel.
2. Some environment variables are defined by POSIX (see Table 7.7 on Page 193).
3. ISO C defines no environment variables.
4. System functions are getenv, putenv, setenv, unsetenv, clearenv.
5. Shell-level commands are getenv, setenv, unsetenv, printenv.
The setjmp and longjmp Functions (Section 7.10).
1. These provide a low-level form of exception handling in C.
2. They manipulate the runtime stack directly.
3. See Figure 7.10 on Page 197.
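Here is a minimal sketch of the exception-like pattern, assuming hypothetical functions `divide` and `safe_divide`. The setjmp call saves the stack context; a later longjmp unwinds back to it, and setjmp then appears to return a second time with a nonzero value.

```c
#include <setjmp.h>

static jmp_buf recovery_point;   /* saved stack context */

/* Hypothetical worker: jumps back to the recovery point on a zero divisor. */
static int divide(int a, int b) {
    if (b == 0)
        longjmp(recovery_point, 1);   /* setjmp now "returns" 1 */
    return a / b;
}

/* Returns a / b, or fallback if divide bailed out through longjmp. */
int safe_divide(int a, int b, int fallback) {
    if (setjmp(recovery_point) != 0)
        return fallback;              /* reached only via longjmp */
    return divide(a, b);
}
```

With this sketch, `safe_divide(10, 2, -1)` returns 5 and `safe_divide(1, 0, -1)` returns -1.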

Resource Limits (Section 7.11).

Every process has them.
They are queried and set with the getrlimit and setrlimit system functions.
1. These are defined in <sys/resource.h>.
2. Signatures:
```
int getrlimit(int resource, struct rlimit *rlp);

int setrlimit(int resource, const struct rlimit *rlp);
```

The first parameter takes values from system-defined constants, notable ones of which are the following:

- `RLIMIT_AS`: Max size in bytes of a process's total memory, as available from the `sbrk` function.
- `RLIMIT_CPU`: Max amount of CPU time in seconds.
- `RLIMIT_FSIZE`: Max size in bytes of a file that may be created.
- `RLIMIT_NOFILE`: Max number of open files per process.
- `RLIMIT_NPROC`: Max number of child processes per user ID.
- `RLIMIT_STACK`: Max size in bytes of the stack.

When the limits are exceeded, a system call fails or the process is sent an appropriate signal.

The struct in the second parameter is defined as:

```
struct rlimit {
    rlim_t rlim_cur;        /* current (soft) limit */
    rlim_t rlim_max;        /* maximum (hard) limit value for rlim_cur */
};
```

Rules governing limits are:
1. A process can change its soft limit to any value less than or equal to its hard limit.
2. A process can lower its hard limit, which is irreversible for the life of the process.
3. Only the superuser can increase the hard limit.
Limits can be set at the shell level with the command prctl; it sets limits for all subsequent child processes that are executed under the limited shell process.
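The rules above can be exercised with a short sketch that lowers a soft limit and leaves the hard limit alone. The function name `lower_soft_nofile` is hypothetical.

```c
#include <sys/resource.h>

/* Lowers the soft limit on open files to at most wanted, leaving the hard
   limit untouched. Returns 0 on success, -1 on error. */
int lower_soft_nofile(rlim_t wanted) {
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (wanted < rl.rlim_cur)
        rl.rlim_cur = wanted;     /* lowering the soft limit needs no privilege */
    return setrlimit(RLIMIT_NOFILE, &rl);
}
```

Because only rlim_cur changes, the process can later raise the soft limit again, up to the unchanged hard limit.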

Introduction to process control (Section 8.1).
1. One process can create a new one.
2. The fundamental way to do this in UNIX is with a fork.
3. The fork creates parent and child processes.
4. The child can exec a different program, which is the most typical behavior.
5. The result of the fork, with or without an exec, is two independent processes,
  1. running asynchronously,
  2. doing two different things.
Process IDs (Section 8.2).
1. Every process has a unique ID.
2. ID 0 belongs to the kernel, typically to the scheduler.
3. ID-related system functions are:
```
pid_t getpid(void);    // get ID of current running process
pid_t getppid(void);   // get ID of parent process
uid_t getuid(void);    // get user ID of current process
```
4. There are no error returns from any of these functions, since a process always has a process ID and user ID.
fork (Section 8.3).
1. Signature:
```
pid_t fork(void);
```
2. Returns:
  - 0 in child
  - child ID in parent
  - -1 on error
3. Creates a new process.
4. Called once, returns twice, once in each of two processes (the parent and child).
5. Return value indicates where you are.
6. Here is a template of how fork is most typically used:
```
int main() {
  pid_t pid;

  if ((pid = fork()) < 0) {
    ...                     /* fail */
  }
  else if (pid == 0) {
    ...                     /* child */
  }
  else {
    ...                     /* parent */
  }
}
```
7. Two uses of fork:
  1. Do two pieces of work.
  2. Run two different programs.
8. Typical parent/child behaviors.
  1. Parent waits for child to complete.
  2. Parent and child go their separate ways.
9. Parent/child sharing.
  1. Child inherits from parent the following data:
    - all open files
    - user ID
    - current working dir
    - environment
    - several other things (see Stevens Page 215)
  2. Child differs from parent in the following ways:
    - return value from fork
    - process ID
    - parent process ID
    - other things (see Stevens Page 215)
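To make the "called once, returns twice" behavior concrete, the sketch below forks a child that modifies a variable. The function name `fork_copy_demo` is hypothetical. Because the child works on its own copy of the parent's memory, the parent's value is unaffected.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that changes its own copy of a variable and exits.
   Returns the parent's value of the variable afterwards, or -1 on error. */
int fork_copy_demo(void) {
    int counter = 100;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                  /* fork failed */
    if (pid == 0) {
        counter += 1;               /* changes only the child's copy */
        _exit(0);                   /* leave without running the parent's code */
    }
    if (waitpid(pid, NULL, 0) < 0)  /* parent: wait so the child is reaped */
        return -1;
    return counter;                 /* still 100 in the parent */
}
```

The child uses _exit rather than exit so that it does not flush stdio buffers or run exit handlers inherited from the parent.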
wait and waitpid (Section 8.6).
1. Signatures:
```
pid_t wait(int *status);

pid_t waitpid(pid_t pid, int *status, int options);
```
2. Returns:
  - pid if OK
  - -1 on error
  - 0 if waitpid was called with the WNOHANG option and no child has stopped or exited yet
3. Macros to check exit status:
  - WIFEXITED(status)
  - WIFSIGNALED(status)
  - WIFSTOPPED(status)
  - WIFCONTINUED(status)
4. Detailed examples on pages 222 - 223.
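The status macros can be tried out with a sketch like the following. The helpers `run_child` and `print_status` are hypothetical; the child either exits with a chosen code or sends itself a signal, so both WIFEXITED and WIFSIGNALED can be observed.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that kills itself with sig if sig > 0, else calls _exit(code).
   Stores the raw wait status in *status. Returns 0 on success, -1 on error. */
int run_child(int code, int sig, int *status) {
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (sig > 0)
            raise(sig);
        _exit(code);
    }
    return waitpid(pid, status, 0) == pid ? 0 : -1;
}

/* Prints how the child ended, using the macros from <sys/wait.h>. */
void print_status(int status) {
    if (WIFEXITED(status))
        printf("exited normally, exit status %d\n", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        printf("killed by signal %d\n", WTERMSIG(status));
    else if (WIFSTOPPED(status))
        printf("stopped by signal %d\n", WSTOPSIG(status));
}
```

Note that WIFSTOPPED can only be true when waitpid is called with the WUNTRACED option, which this sketch does not use.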

The exec Family of Functions (Section 8.10).

Used to run a different program.
Very typical for forked child to do an exec.

(Solaris) signatures:

```
int execl(const char *path, const char *arg, ... /*, (char *)0 */);

int execlp(const char *file, const char *arg, ... /*, (char *)0 */);

int execle(const char *path, const char *arg, ... /*, (char *)0, char *const envp[] */);

int execv(const char *path, char *const argv[]);

int execvp(const char *file, char *const argv[]);

int execvP(const char *file, const char *search_path, char *const argv[]);
```

Letter suffixes:
- "p" means use the PATH environment variable
- "l" means use varargs list
- "v" means use argv vector
- "e" means use envp array
Table 8.14 on Page 233 is a useful summary of the exec family.

Here is the previous example of the fork template, this time with an exec in the child process:

```
int main() {
  pid_t pid;

  if ((pid = fork()) < 0) {
    ...                     /* fail */
  }
  else if (pid == 0) {
    execvp(...);            /* child */
  }
  else {
    ...                     /* parent */
  }
}
```
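A filled-in version of the template might look like the sketch below. The function name `run_program` is hypothetical, and the choice of exit code 127 for a failed exec follows the shell's convention rather than anything required by execvp.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs the program named by argv[0] (searched for on PATH) and waits for it.
   Returns its exit status, or -1 if fork or waitpid failed or the program
   was killed by a signal. */
int run_program(char *const argv[]) {
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;                  /* fail */
    if (pid == 0) {
        execvp(argv[0], argv);      /* child: returns only if the exec failed */
        _exit(127);                 /* shell convention for "could not run" */
    }
    if (waitpid(pid, &status, 0) != pid)   /* parent */
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The argv array passed in must end with a NULL pointer, e.g. `char *args[] = {"ls", "-l", NULL};`.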

The system function (Section 8.13).
1. You have hopefully already discovered this function for your work in Programming Assignment 4.
2. It's the way you execute a command string from inside a program.
3. system is implemented by calling fork, exec, and waitpid.
4. Any and all shell commands can be executed using system.
5. The return values of system are explained on Page 246; these are relevant to Programming Assignment 4, in a couple of test cases where the system call from smake fails.
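Since system returns a wait-style status rather than a plain exit code, a caller usually decodes it with the same macros used for waitpid. A minimal sketch, with a hypothetical wrapper named `run_command`:

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Runs cmd through the shell. Returns the command's exit status, or -1 if
   system itself failed or the command was killed by a signal. */
int run_command(const char *cmd) {
    int status = system(cmd);

    if (status == -1)
        return -1;                      /* fork or waitpid failed */
    if (WIFEXITED(status))
        return WEXITSTATUS(status);     /* 127 usually means it could not be run */
    return -1;
}
```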
exit functions (Section 8.5).
1. As outlined in Notes Week 6 (Stevens Section 7.3), there are five ways a process can exit:
  1. Return from main.
  2. Calling exit.
  3. Calling _exit or _Exit, which do not run exit handlers.
  4. Return from last thread (Sect 11.5).
  5. Calling pthread_exit (Sect 11.5).
2. The three forms of abnormal termination are:
  1. Calling abort (Section 10.17), which generates a SIGABRT signal.
  2. Receipt of a signal (Chapter 10), which can come from the process itself, another process, or the kernel.
  3. Cancellation of the last process thread (Sections 11.5 and 12.7).
3. When a process exits, by whatever means, the kernel
  - closes all open descriptors for the process,
  - releases all process memory,
  - and performs all other necessary process-termination processing.
4. There are three interesting cases of process termination with respect to the order in which parent and child processes exit:
  1. the parent terminates before child, with the child thereby becoming an orphan
  2. the child terminates before its parent
  3. an orphaned child terminates
5. When a parent terminates before its children, the init process (process ID 1) becomes the children's parent.
6. When a child terminates before its parent, the child becomes a zombie.
  1. A zombie exists so a parent can check the exit status of a child after the child has terminated.
  2. When the parent checks a zombie with wait or waitpid, the zombie goes away.
7. When an orphaned child terminates, init calls a wait function on it to get rid of its zombie, thereby preventing the system from being clogged by zombies.
Other wait functions (Sections 8.7 and 8.8).
1. Most UNIX systems provide additional variants of wait, which provide a bit more flexibility to the user.
2. Pages 226 and 227 describe waitid, wait3, and wait4.
Race conditions (Section 8.9).
1. A race condition occurs when multiple processes operate on a shared resource, and the outcome of the processes depends on the order in which the resource is accessed or modified.
2. Such conditions can arise any time a child is forked, and the logic in the parent and/or child process depends on the order in which the processes execute.
3. A good example is with Task 1 of Lab 6, where the shared resource is stdout.
  1. The intention of the first task is that the parent and child processes both proceed immediately to print the odd and even numbers, at effectively the same time.
  2. The parent does not wait for the child to terminate until after the parent has done its even-number printing.
  3. Given this, there is a race condition on stdout, such that the output of odd and even numbers will be interleaved in unpredictable orders.
  4. To see this, you only need to provide a reasonably large value of N as a command-line argument.
4. To avoid race conditions, processes need to use some form of inter-process communication (IPC).
  1. This can be done with signals, which we'll discuss next week.
  2. There are a wide variety of other forms of IPC that can be used for process coordination to avoid race conditions, some of which you'll study in courses like CSC 453.
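As one illustration of removing a race on stdout, the sketch below uses a pipe rather than signals: the child blocks reading the pipe until the parent has finished its own output. This is a swapped-in technique chosen for brevity, not the approach used in Lab 6, and the function name `parent_then_child` is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forces "parent" to be printed before "child" by making the child block
   on a pipe until the parent has written. Returns 0 on success, -1 on error. */
int parent_then_child(void) {
    int fds[2], status, ok;
    char token = 'x';
    pid_t pid;

    if (pipe(fds) < 0)
        return -1;
    if ((pid = fork()) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                          /* child */
        close(fds[1]);                       /* child only reads */
        if (read(fds[0], &token, 1) != 1)    /* blocks until the parent writes */
            _exit(1);
        if (write(STDOUT_FILENO, "child\n", 6) != 6)
            _exit(1);
        _exit(0);
    }
    close(fds[0]);                           /* parent only writes */
    ok = write(STDOUT_FILENO, "parent\n", 7) == 7 &&
         write(fds[1], &token, 1) == 1;      /* second write releases the child */
    close(fds[1]);                           /* on failure, child's read sees EOF */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return ok && WIFEXITED(status) && WEXITSTATUS(status) == 0 ? 0 : -1;
}
```

Without the pipe, the two write calls to stdout could happen in either order, which is exactly the race described above.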
Changing user and group IDs (Section 8.11).
1. All processes have three separate user IDs:
  1. The real user ID, which is settable only by a process with superuser privileges, such as the login process.
  2. An effective user ID, which is set by an exec function only if the set-user-ID bit is set for the program being exec'd.
  3. A saved set-user ID, which is copied from the effective user ID by exec.
2. The function setuid(uid_t uid) is used to change one or more IDs of a process.
  1. Called by a superuser process, setuid sets all three IDs to the uid argument value.
  2. Called by a non-superuser, setuid can be used to set the effective ID back to a real ID or saved set-user-ID, after the effective ID has been changed by an exec call to a set-user-ID program.
  3. In any other case, calling setuid results in an error.
3. The bottom line is that setuid is only useful to non-superusers to change the effective ID after it's been changed by execing a set-user-id program.
  1. Recall that a set-user-id program is one that has had its owner's execute bit set to 's', meaning that the program will run with the owner's user ID when executed by another user.
  2. At the shell level, an executable program prog is made set-user-id with the command
    
    chmod u+s prog
4. The same three levels of ID apply to the group ID, and the setgid function.
5. An important note on calling the system function from a set-user-id program.
  1. This is a security hole, and should never be done.
  2. Pages 249-250 of Stevens discuss this issue in further detail.
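A set-user-ID program that no longer needs its extra privileges can set its effective user ID back to its real user ID, which is the non-superuser case of setuid described above. A minimal sketch, with a hypothetical function name:

```c
#include <sys/types.h>
#include <unistd.h>

/* Sets the effective user ID back to the real user ID.
   Returns 0 on success, -1 if setuid fails or the IDs still differ. */
int drop_to_real_user(void) {
    if (setuid(getuid()) != 0)
        return -1;
    return geteuid() == getuid() ? 0 : -1;   /* verify the change took effect */
}
```

In a program that is not set-user-ID, the real and effective IDs already match, so the call succeeds without changing anything.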
Interpreter files (Section 8.12).
1. A file that begins with the following line
  #! pathname
  is considered by the kernel to be executable using the interpreter specified in pathname.
2. The most typical use of this is with shell programs as the interpreter, as in
```
#!/bin/tcsh
```
  at the top of shell scripts intended to be executed using /bin/tcsh.
3. Interpreter files can be invoked with the exec functions, with the kernel handling the necessary details.
Process accounting (Section 8.14).
1. Most UNIX systems provide process accounting functionality.
2. Accounting information includes when a process terminates, how much time it used, and other execution data.
3. This information can be used by sys admins for whatever purpose they may choose.
User identification (Section 8.15).
1. The getlogin function returns the string login name of the user who owns the current process.
Process times (Section 8.16).
1. Each process has three measurable times:
  1. clock time -- the amount of elapsed real time the process takes to execute
  2. user CPU time -- the amount of CPU time spent by instructions in the user's program
  3. system CPU time -- the amount of CPU time spent by kernel instructions, executed on behalf of the user process.
2. The times function and its associated struct tms provide this timing information for a process.
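The sketch below shows one way to read the user and system CPU times with times. The helper names are hypothetical. Values in struct tms are in clock ticks, so they are divided by the ticks-per-second figure from sysconf.

```c
#include <sys/times.h>
#include <unistd.h>

/* Hypothetical helper: converts clock ticks to seconds. */
static double ticks_to_seconds(clock_t ticks) {
    return (double)ticks / (double)sysconf(_SC_CLK_TCK);
}

/* Stores the user and system CPU seconds used so far by this process.
   Returns 0 on success, -1 if times fails. */
int cpu_times(double *user_sec, double *sys_sec) {
    struct tms t;

    if (times(&t) == (clock_t)-1)
        return -1;
    *user_sec = ticks_to_seconds(t.tms_utime);
    *sys_sec  = ticks_to_seconds(t.tms_stime);
    return 0;
}
```

Elapsed clock time is not stored in struct tms; it is obtained by calling times twice and subtracting the two return values.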