CSC 357 Programming Assignment 1
sgrep -- A Simplified Version of Grep
-- REVISED --
The deliverable for this assignment is a simplified version of the very useful grep utility. The program is called "sgrep", for "simple grep".
The following are excerpts from the grep man page that describe the functionality relevant to sgrep:
NAME
     grep - search a file for a pattern

SYNOPSIS
     grep [-iln] regular-expression [filename...]

DESCRIPTION
     The grep utility searches text files for a pattern and prints all
     lines that contain that pattern. If no files are specified, grep
     assumes standard input. Normally, each line found is copied to
     standard output. The file name is printed before each line found if
     there is more than one input file.

     Be careful using the characters $, *, ., [, ], and ^ in the
     patternlist because they are also meaningful to the shell. It is
     safest to enclose the entire patternlist in single quotes '...'.

     The grep utility uses limited regular expressions like those
     described on the regexp(5) manual page to match the patterns.

OPTIONS
     The following options are supported:

     -i   Ignores upper/lower case distinction during comparisons.

     -l   Prints only the names of files with matching lines, separated
          by NEWLINE characters. Does not repeat the names of files when
          the pattern is found more than once.

     -n   Precedes each line by its line number in the file (first line
          is 1).
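The DESCRIPTION above fixes two input rules: read standard input when no files are named, and print the file name before each matching line only when there is more than one input file. The following is a minimal sketch of that control flow, not a reference solution. The names search_inputs and search_stream and the MAX_LINE limit are hypothetical, strstr stands in for the real regular-expression matcher, and the perror message is an assumption (the testing plan defines the required error output).

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 1024   /* assumed maximum line length */

/* Copy the lines of fp that contain pattern to stdout, prefixed with
 * "name:" when name is non-NULL. strstr is only a stand-in for the
 * regular-expression matcher sgrep actually needs. */
static void search_stream(FILE *fp, const char *name, const char *pattern)
{
    char line[MAX_LINE];

    while (fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, pattern) != NULL) {
            if (name != NULL)
                printf("%s:", name);
            fputs(line, stdout);
        }
    }
}

/* Search the nfiles named files, or stdin when nfiles is 0.
 * Returns the number of files that could not be opened. */
int search_inputs(int nfiles, char *files[], const char *pattern)
{
    int failures = 0;
    int i;

    if (nfiles == 0) {                 /* no files: use standard input */
        search_stream(stdin, NULL, pattern);
        return 0;
    }
    for (i = 0; i < nfiles; i++) {
        FILE *fp = fopen(files[i], "r");

        if (fp == NULL) {
            perror(files[i]);          /* assumed message format */
            failures++;
            continue;
        }
        /* Name prefix only when there is more than one input file. */
        search_stream(fp, nfiles > 1 ? files[i] : NULL, pattern);
        fclose(fp);
    }
    return failures;
}
```

Returning a failure count rather than exiting immediately lets main keep searching the remaining files and pick an exit status at the end.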
The regular expression argument is a form of generalized pattern that provides flexibility in searching through files. For example, the regular expression "T.*e" matches all strings that start with the letter "T" and end with the letter "e", such as "The", "These", "There".
The sgrep program uses a simplified subset of the regular expressions
recognized by grep. The following are excerpts from the
regexp man page that describe the structure of regular expressions
handled by sgrep:
DESCRIPTION
     A regular expression specifies a set of character strings. A member
     of this set of strings is said to be matched by the regular
     expression. Some characters have special meaning when used in a
     regular expression; other characters stand for themselves.

     The following characters have special meaning in a regular
     expression:

          .  *  ^  $  [  ]

     All other characters match themselves.

     A period (.) is a one-character RE that matches any character except
     newline.

     A one-character RE followed by an asterisk (*) is a RE that matches 0
     or more occurrences of the one-character RE. If there is any choice,
     the longest leftmost string that permits a match is chosen.

     The caret (^) is special only when it appears at the beginning of a
     RE, and means that the RE only matches at the beginning of a line.

     The dollar sign ($) is special only when it appears at the end of a
     RE, and means that the RE only matches at the end of a line.

     A non-empty string of characters enclosed in square brackets ([]) is
     a one-character RE that matches any one character in that string.
     The four special characters listed above stand for themselves within
     such a string of characters.

     The concatenation of REs is a RE that matches the concatenation of
     the strings matched by each component of the RE.
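One way to implement these rules without the forbidden regex libraries is a small recursive matcher in the style of the well-known Kernighan and Pike matcher from "The Practice of Programming", extended here with "[...]". This is a sketch under stated assumptions, not the required design: the function names are hypothetical, an unterminated "[" is treated as an ordinary character (real grep reports an error), and "*" tries the shortest repetition first. Because sgrep only decides whether a line matches and then prints the whole line, the "longest leftmost" rule does not change the result.

```c
#include <string.h>

static int match_here(const char *re, const char *text);

/* Length of the one-character RE at the start of re: 1 for '.' or an
 * ordinary character, or the full length of a "[...]" expression.
 * Assumption: an unterminated '[' is treated as an ordinary character. */
static size_t one_len(const char *re)
{
    if (re[0] == '[' && re[1] != '\0') {
        const char *close = strchr(re + 2, ']');   /* set is non-empty */
        if (close != NULL)
            return (size_t)(close - re) + 1;
    }
    return 1;
}

/* Does the one-character RE at re (len pattern chars) match c? */
static int one_matches(const char *re, size_t len, char c)
{
    if (c == '\0' || c == '\n')
        return 0;                    /* nothing matches end of line */
    if (len > 1)                     /* "[...]": any char between brackets */
        return memchr(re + 1, c, len - 2) != NULL;
    return re[0] == '.' || re[0] == c;
}

/* one*: try zero, one, two, ... occurrences, then the rest of the RE. */
static int match_star(const char *one, size_t len, const char *rest,
                      const char *text)
{
    do {
        if (match_here(rest, text))
            return 1;
    } while (one_matches(one, len, *text++));
    return 0;
}

/* Does re match a prefix of text? */
static int match_here(const char *re, const char *text)
{
    size_t len;

    if (re[0] == '\0')
        return 1;                    /* whole RE consumed */
    len = one_len(re);
    if (re[len] == '*')
        return match_star(re, len, re + len + 1, text);
    if (re[0] == '$' && re[1] == '\0')   /* '$' is special only at the end */
        return *text == '\0' || strcmp(text, "\n") == 0;
    if (one_matches(re, len, *text))
        return match_here(re + len, text + 1);
    return 0;
}

/* Returns 1 if re matches anywhere in the line text (which may still
 * end with the '\n' kept by fgets). */
int match(const char *re, const char *text)
{
    if (re[0] == '^')                /* '^' is special only at the start */
        return match_here(re + 1, text);
    do {
        if (match_here(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}
```

For example, match("d.*y", line) is true for a line containing "dotty", and match(",$", "solution being,\n") is true because "$" is allowed to match just before the trailing newline.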
X.*Y
X[...].*Y
The following are specific limitations of the sgrep program:
The sgrep program does NOT need to handle abbreviated command-line
options, in which two or more option characters can be concatenated together.
E.g., the UNIX grep utility accepts the argument "-iln" as an
abbreviation of "-i -l -n". Again, sgrep does not need to
support such abbreviations.
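Because each option arrives as its own argument, option parsing reduces to comparing whole strings with strcmp. The sketch below assumes a hypothetical struct options and a function parse_options that returns the index of the regular-expression argument; rejecting a combined argument such as "-iln" is a design choice here, not something the specification requires.

```c
#include <string.h>

/* Hypothetical container for the three sgrep flags. */
struct options {
    int ignore_case;    /* -i */
    int list_files;     /* -l */
    int line_numbers;   /* -n */
};

/* Scans leading "-i", "-l", and "-n" arguments, each given separately.
 * Returns the index of the regular-expression argument, or -1 on an
 * unrecognized option or a missing pattern. */
int parse_options(int argc, char *argv[], struct options *opts)
{
    int i;

    opts->ignore_case = opts->list_files = opts->line_numbers = 0;
    for (i = 1; i < argc && argv[i][0] == '-'; i++) {
        if (strcmp(argv[i], "-i") == 0)
            opts->ignore_case = 1;
        else if (strcmp(argv[i], "-l") == 0)
            opts->list_files = 1;
        else if (strcmp(argv[i], "-n") == 0)
            opts->line_numbers = 1;
        else
            return -1;          /* unknown or combined option, e.g. "-iln" */
    }
    return i < argc ? i : -1;   /* the pattern must follow the options */
}
```

Any arguments after the returned index are file names; if there are none, input comes from stdin.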
Given a file named "input1" with the following contents:
Von Neumann was the subject of many dotty professor stories. Von Neumann supposedly had the habit of simply writing answers to homework assignments on the board (the method of solution being, of course, obvious) when he was asked how to solve problems. One time one of his students tried to get more helpful information by asking if there was another way to solve the problem. Von Neumann looked blank for a moment, thought, and then answered, "Yes.".
The following table describes the output of various grep commands.
Command                 Output
grep V input1           matches lines 1 and 6
grep v input1           matches lines 4 and 6
grep x input1           matches no lines
grep -i v input1        matches lines 1, 4, and 6
grep 'd.*y' input1      matches lines 1, 2, and 5
grep '^a' input1        matches lines 3 and 6
grep ',$' input1        matches line 3
grep '[.,]' input1      matches lines 1, 3, 4, 6, and 7
The -n and -l arguments do not affect how the match is performed, only the format of the output. Without either of these two arguments, sgrep outputs the entire contents of each matched line. With -n, the line number precedes each matched line. With -l, only the names of files containing a match are printed, without the line contents (so -l makes the most sense when there are multiple input files).
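grep separates the optional file-name prefix and the optional line number from the line itself with colons, for example "input1:3:" followed by the line when there are several files and -n is given. The sketch below formats one matched line into a caller-supplied buffer with snprintf so the format is easy to check; the name format_match and its parameters are hypothetical, and a real sgrep can print with printf directly.

```c
#include <stdio.h>

/* Formats one matched line into out (size bytes) as grep would print it.
 * show_name: more than one input file, so prefix "filename:".
 * show_number: -n was given, so prefix "lineno:".
 * line is copied as-is, including the '\n' kept by fgets.
 * Returns the value snprintf returns (length of the full text). */
int format_match(char *out, size_t size, const char *filename,
                 int show_name, int show_number, int lineno,
                 const char *line)
{
    if (show_name && show_number)
        return snprintf(out, size, "%s:%d:%s", filename, lineno, line);
    if (show_name)
        return snprintf(out, size, "%s:%s", filename, line);
    if (show_number)
        return snprintf(out, size, "%d:%s", lineno, line);
    return snprintf(out, size, "%s", line);
}
```

For -l, no line is formatted at all: print the file name followed by a newline the first time a line in that file matches, after which you may stop reading that file, since the name must not be repeated.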
You are encouraged to play around with grep, to see how it behaves on various inputs. The behavior of sgrep is a proper subset of UNIX grep. This means that sgrep produces exactly the same output as grep, for the subset of arguments and regular expressions supported by sgrep.
The complete set of required input files is in the online class directory, at
    http://www.csc.calpoly.edu/~gfisher/classes/357/programs/1/testing/inputs
The corresponding correct outputs are in
    http://www.csc.calpoly.edu/~gfisher/classes/357/programs/1/testing/expected-output
The program 1 testing plan has complete details of the input/output behavior your program must exhibit. The plan is in the file
    http://www.csc.calpoly.edu/~gfisher/classes/357/programs/1/testing/plan.html
All of these testing files will be available by Thursday 5 April.
The following C library functions may be particularly useful in your
implementation. You can read about these in K&R, Stevens, and the man pages.
Function  Description
printf    print to stdout
fgets     read a line of characters from a FILE* stream, including stdin
strlen    calculate the length of a string
strstr    locate the first occurrence of one string in another
strtok    find delimited tokens in a string
strcpy    copy strings
strcmp    compare strings
fopen     open a file
fclose    close a file
feof      test whether a given stream has reached end of file
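Several of these functions combine directly into Step 1 of the development plan (plain substring matches on one stream). The sketch below assumes a hypothetical function grep_stream and a 1024-character line limit. It tests the return value of fgets rather than calling feof before each read; the common "while (!feof(fp))" loop runs its body one extra time with the previous line still in the buffer.

```c
#include <stdio.h>
#include <string.h>

#define MAX_LINE 1024   /* assumption: longest line sgrep must handle */

/* Step 1 sketch: copy every line of in that contains pattern as a plain
 * substring (no regular-expression operators yet) to out.
 * Returns the number of matching lines. */
int grep_stream(FILE *in, FILE *out, const char *pattern)
{
    char line[MAX_LINE];
    int matches = 0;

    while (fgets(line, sizeof line, in) != NULL) {   /* NULL at end of file */
        if (strstr(line, pattern) != NULL) {
            fputs(line, out);   /* fgets keeps the '\n', so none is added */
            matches++;
        }
    }
    return matches;
}
```

In main, grep_stream(stdin, stdout, argv[1]) is enough for Step 1. A line longer than MAX_LINE - 1 characters comes back in pieces from successive fgets calls, so pick a limit that covers the test inputs.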
Your implementation can use any of the string processing functions described in the UNIX string(3C) library. However, your implementation canNOT use the functions provided in the regular expression libraries regex(3C), regcmp(3C), or regexp(5).
Your implementation of sgrep will have to use string variables, and functions that take string parameters. Declaring the string-valued parameters is easy -- just use 'char *' as their type.
To be useful in a program, string variables must be declared as character arrays of a specific size. This applies in particular to the pattern and line string variables you will declare. Section 1.9 of K&R has some useful examples for dealing with string variables, i.e., character arrays.
The specification of sgrep is written to preclude the use of dynamic
memory allocation, i.e., malloc, in the sgrep implementation.
We will discuss this issue further in the coming weeks.
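As an illustration of fixed-size character arrays paired with 'char *' parameters, and no malloc, here is a sketch of the -i option for plain substring matches. The names lower_copy and contains_ignore_case and the MAX_LINE and MAX_PATTERN sizes are hypothetical. Lower-casing both the pattern and the line before matching is one way to ignore case, and it leaves the special characters . * ^ $ [ ] unchanged.

```c
#include <ctype.h>
#include <string.h>

#define MAX_LINE 1024     /* assumed limits: fixed sizes mean no malloc */
#define MAX_PATTERN 256

/* Copies src into dest (an array of size chars, size >= 1),
 * lower-casing letters and truncating if src is too long. */
static void lower_copy(char *dest, const char *src, size_t size)
{
    size_t i;

    for (i = 0; i + 1 < size && src[i] != '\0'; i++)
        dest[i] = (char)tolower((unsigned char)src[i]);
    dest[i] = '\0';
}

/* -i sketch: substring match that ignores upper/lower case. */
int contains_ignore_case(const char *line, const char *pattern)
{
    char lline[MAX_LINE];          /* fixed-size arrays, no malloc */
    char lpattern[MAX_PATTERN];

    lower_copy(lline, line, sizeof lline);
    lower_copy(lpattern, pattern, sizeof lpattern);
    return strstr(lline, lpattern) != NULL;
}
```

An array such as lline is passed where a 'char *' parameter is expected, because the array name converts to a pointer to its first element; sizeof is applied in the caller, where the array size is still known.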
You must submit a single file named "sgrep.c" as the deliverable.
This file contains the implementation of the sgrep program that meets
the above specification.
The testing plan cited above has the precise point breakdown for this program. This plan has all required test cases that your program must pass. There are no extra "hidden" input files that will be used.
The handout on coding conventions specifies point deduction categories for violations of the conventions. Since Program 1 does not require a .h file as a deliverable, the conventions regarding .h files do not apply to Program 1. For this program, the top-level program comment and the comments for each function should appear in the .c file.
Throughout the quarter, your instructor will stress the utility of incremental development, for the purposes of scoring points on your programs. The idea is to build a working program in a step-by-step fashion, starting with the simple functionality, and incrementally adding harder functionality. If you get your program to work for some of the simple cases, but not all of the harder ones, you can still score a decent number of points on the assignment.
As a concrete example of incremental development for this assignment, the
following are steps you can use to implement sgrep. Included for each
step is the number of points (out of 100) that successful completion of the
step earns.
Step 1: simple string matches, input only from stdin, no patterns (15 points)
Step 2: read from one file given on command line (5 points)
Step 3: read from multiple files given on command line (10 points)
Step 4: -n option (5 points)
Step 5: -i option (5 points)
Step 6: -l option (5 points)
Step 7: patterns with '^' (5 points)
Step 8: patterns with '$' (8 points)
Step 9: patterns with '.' (8 points)
Step 10: patterns with '[...]' (10 points)
Step 11: patterns with '.' and '*' (12 points)
Step 12: patterns with various combinations of operators (8 points)
Step 13: error handling (4 points)
You do not have to follow these steps exactly, or in exactly this order. However, if you do, your development will likely go more smoothly, and you can earn some partial credit on the assignment if you cannot get everything to work.
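Steps 7 and 8 can be done before any general matcher exists, by treating the pattern as a literal string with an optional leading '^' and trailing '$' and comparing with strncmp. The sketch below is one such interim approach under that assumption; the name anchored_match is hypothetical, and the code would be replaced by a full matcher in Steps 9 through 12.

```c
#include <string.h>

/* Steps 7-8 sketch: pattern is a literal string, optionally starting
 * with '^' and/or ending with '$'. line may end with the '\n' kept by
 * fgets; that newline is ignored when checking '$'. */
int anchored_match(const char *pattern, const char *line)
{
    size_t plen = strlen(pattern);
    size_t llen = strlen(line);
    int at_start = 0, at_end = 0;

    if (llen > 0 && line[llen - 1] == '\n')
        llen--;                               /* ignore trailing newline */
    if (plen > 0 && pattern[0] == '^') {      /* anchor at line start */
        at_start = 1;
        pattern++;
        plen--;
    }
    if (plen > 0 && pattern[plen - 1] == '$') {  /* anchor at line end */
        at_end = 1;
        plen--;
    }
    if (plen > llen)
        return 0;
    if (at_start && at_end)
        return plen == llen && strncmp(line, pattern, plen) == 0;
    if (at_start)
        return strncmp(line, pattern, plen) == 0;
    if (at_end)
        return strncmp(line + llen - plen, pattern, plen) == 0;
    return strstr(line, pattern) != NULL;     /* unanchored literal */
}
```

Stripping the newline before the '$' check matters because fgets keeps it, so ",$" must compare against the last character before the '\n'.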
NO collaboration is allowed on this assignment. Everyone must do their own
individual work.
Submit your deliverable using the handin program on falcon/hornet. The specific command is
    handin gfisher prog1 sgrep.c
Run this command from the directory where your copy of sgrep.c is stored.
You can resubmit your files as many times as you like, up to the submission
deadline. Each new submission completely replaces the previously submitted
file(s). If you follow an incremental development strategy, you can submit as
many partially-working versions as you like, as each step is completed.