CSC 101 / CPE 101:
Fundamentals of Computer Science 1

Summer 2001
(Sections 01 & 02 only)

Lab # 8
Demonstrate either or both of your team's
two programs in Lab, on Monday of Week #9.

Goals for this week's lab:

Practice program design.
Practice using arrays.
Practice program testing and revision.
Have some fun.

Introduction to Ciphers

Are you intrigued by mysteries? Can you keep a secret? If you have access to private information, do you know how to pass that along to people who need to know it without risking its being discovered by people you don't want to have it?

Cryptography is the study of secret writing. It has been around for as long as people have had secrets: it thus has a long history but it also has modern relevance. It can be used for anything from military or industrial communication to a discussion among you and your siblings about what to give your mother for her birthday. No matter what level of sophistication is involved, the two basic techniques of cryptography are:

encryption: taking a message and converting it into a coded form that is less easily read (also called encoding), and
decryption: taking a coded message and converting it back into its original, readable form (also called decoding).

And the basic principle is this: if you encrypted a message, you should be able to decrypt it, and so should anyone else to whom you supply a "key" for doing so. Beyond that, if others encounter your encrypted message and know a few things about cryptographic methods, you want it to be relatively difficult for them to "break" your code, and the level of difficulty in doing so should increase with the level of the need for keeping the information secret.

While a full discussion of cryptographic techniques is well beyond the scope of this course, some forms of encryption are accessible to us at this point. One of the simplest sorts of encryption is a substitution cipher, where other symbols are substituted for the letters of the alphabet used in the message. To read the message, it is necessary to know which symbol is substituted for which letter. A handy source of symbols to substitute is the very same alphabet, but rearranged.

One of the earliest examples of encryption is the substitution cipher used by Julius Caesar during the Gallic Wars. (For this reason, it is known as the Caesar Cipher.) According to this scheme, letters of the alphabet are offset by a certain number of positions, with the end of the alphabet wrapping around to the beginning. That is:

Standard Plain Alphabet:		`abcdefghijklmnopqrstuvwxyz`
Encoded with a shift of 3:		`defghijklmnopqrstuvwxyzabc`

The Caesar cipher is a substitution cipher, which means that you replace each letter in the original text by the one in the same position in the ciphered alphabet. Thus, the original message, "Aren't we done yet?" would be encrypted as, "Duhq'w zh grqh bhw?"
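
The letter-by-letter replacement just described can be sketched with a little character arithmetic. This is only an illustration, not the starter code for the lab: the class name CaesarShiftSketch and the methods shift and encrypt are made-up names, and the sketch assumes that only the 26 unaccented English letters are shifted while everything else passes through unchanged.

```java
class CaesarShiftSketch {
    static final int ALPHABET_SIZE = 26;

    // Shifts one letter by 'offset' positions, wrapping around the alphabet.
    // Case is preserved; non-letters are returned unchanged.
    static char shift(char ch, int offset) {
        if (ch >= 'a' && ch <= 'z') {
            return (char) ('a' + Math.floorMod(ch - 'a' + offset, ALPHABET_SIZE));
        }
        if (ch >= 'A' && ch <= 'Z') {
            return (char) ('A' + Math.floorMod(ch - 'A' + offset, ALPHABET_SIZE));
        }
        return ch;
    }

    // Applies shift() to every character of the message.
    static String encrypt(String message, int offset) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < message.length(); i++) {
            result.append(shift(message.charAt(i), offset));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(encrypt("Aren't we done yet?", 3)); // Duhq'w zh grqh bhw?
    }
}
```

Math.floorMod is used instead of the % operator so that a negative offset still wraps to a valid letter position.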

Here are some other possibilities for shifts we might make to the standard plain alphabet, along with our sample message in plain text, and three more possible encodings of that original:

Plain Alphabet:	`abcdefghijklmnopqrstuvwxyz`	`Aren't we done yet?`
If plain is offset by 23:	`xyzabcdefghijklmnopqrstuvw`	`Xobk'q tb alkb vbq?`
If plain is offset by 15:	`pqrstuvwxyzabcdefghijklmno`	`Pgtc'i lt sdct nti?`
If plain is offset by 11:	`lmnopqrstuvwxyzabcdefghijk`	`Lcpy'e hp ozyp jpe?`

Notice that, if you encrypt a message using a Caesar Cipher with an offset of 3, you can decrypt it by using a Caesar Cipher with an offset of -3. You can also decrypt it by an offset of +23, which is computed from the number of letters in the system minus the original offset (here, 26-3). That is, shifting "Duhq'w zh grqh bhw?" with an offset from the original alphabet of 23 would convert it back to "Aren't we done yet?" Similarly, a message encrypted with an offset of 15 can be decrypted by using an offset of 26-15 = 11.

Plain Alphabet:	`abcdefghijklmnopqrstuvwxyz`	`Duhq'w zh grqh bhw?`
Offset by 23:	`xyzabcdefghijklmnopqrstuvw`	`Aren't we done yet?`
or
Plain Alphabet:	`abcdefghijklmnopqrstuvwxyz`	`Pgtc'i lt sdct nti?`
Offset by 11:	`lmnopqrstuvwxyzabcdefghijk`	`Aren't we done yet?`
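
The relationship between an offset and the offset that undoes it can be checked with a short sketch like the one below. The names InverseOffsetSketch, shiftAll, and inverseOffset are hypothetical and chosen only for this illustration; the sketch assumes a 26-letter alphabet and leaves non-letters alone.

```java
class InverseOffsetSketch {
    static final int ALPHABET_SIZE = 26;

    // Shifts every letter of 'text' by 'offset' positions, wrapping around.
    static String shiftAll(String text, int offset) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch >= 'a' && ch <= 'z') {
                ch = (char) ('a' + Math.floorMod(ch - 'a' + offset, ALPHABET_SIZE));
            } else if (ch >= 'A' && ch <= 'Z') {
                ch = (char) ('A' + Math.floorMod(ch - 'A' + offset, ALPHABET_SIZE));
            }
            out.append(ch);
        }
        return out.toString();
    }

    // Returns the non-negative offset that undoes 'offset' (e.g., 3 gives 23).
    static int inverseOffset(int offset) {
        return Math.floorMod(ALPHABET_SIZE - offset, ALPHABET_SIZE);
    }

    public static void main(String[] args) {
        String coded = shiftAll("Aren't we done yet?", 3);
        System.out.println(coded);                              // encrypted with 3
        System.out.println(shiftAll(coded, inverseOffset(3)));  // decrypted with 23
        System.out.println(shiftAll(coded, -3));                // decrypted with -3
    }
}
```

Both decryption calls should print the original message, since shifting by 23 and shifting by -3 land on the same letters.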

Try a few other simple ones yourself, by hand, to make sure you understand how this works (e.g., if you're a Clarke/Kubrick fan: confirm that, with an offset of 1, "HAL" becomes "IBM"). If working with a full alphabet is too cumbersome, just take any 4 or 5 consecutive letters, and try understanding how the offsets, coding, and decoding work for those before you expand the process to handle more letters.

A Simple Caesar Cipher

For this lab, we are going to develop a SimpleCaesar class to do this sort of encryption for us. Here is a summary of what you are expected to do:

Prompt for and read an encryption key.
Generate a character array, cipher[ ], to hold the encodings you will use for that key.
Prompt for and read an input String to be encrypted.
Encrypt the String you received.
Display a new String, with the encrypted message.
Determine whether there is another String to be encrypted.

Notes:

The major design steps have already been taken care of for you, and are provided in some starter-code (StartCaesar.java). Get a copy of this file, and rename it to be SimpleCaesar.java for your lab assignment.
The demo is written using a constant for the alphabet-size so that it can be easily converted to another size (e.g., for a non-English alphabet, to include certain punctuation marks, etc.). Keep that feature.
Unlike last week's work, this lab does not define classes that will be instantiated as objects. That means that the methods you write for this task (and the variables you use) are best declared static, so that they don't have to be called with reference to a particular object. (If that wasn't obvious to you from the start, make sure you understand why that comment is being made: look at the various headers and declarations in the code provided and understand why they have, or have not, been declared static. We're doing it this way because some of you are still struggling with objects, so this is a chance to focus on using a simple array on its own before you have to add objects back in, which, by the way, the homework assignment will require.)
The code provided does not contain full documentation (in particular, the header block of detail is missing). You should fully document the code that you write, as you work with and add to it.
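
As a rough illustration of the array-based steps and the notes above, here is one way the cipher[ ] array might be generated and used, with static methods and a named constant for the alphabet size. This is not the contents of StartCaesar.java: the class name CipherArraySketch and the methods buildCipher and encrypt are assumptions made for this sketch, and the prompting and input-reading steps are left out.

```java
class CipherArraySketch {
    // Named constant, so another alphabet size could be substituted later.
    static final int ALPHABET_SIZE = 26;

    // Builds the table: cipher[i] is the encoding of the i-th lowercase letter.
    static char[] buildCipher(int key) {
        char[] cipher = new char[ALPHABET_SIZE];
        for (int i = 0; i < ALPHABET_SIZE; i++) {
            cipher[i] = (char) ('a' + Math.floorMod(i + key, ALPHABET_SIZE));
        }
        return cipher;
    }

    // Encrypts by table lookup. Uppercase letters are looked up by position
    // and converted back to uppercase; other characters are copied as-is.
    static String encrypt(String message, char[] cipher) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < message.length(); i++) {
            char ch = message.charAt(i);
            if (ch >= 'a' && ch < 'a' + ALPHABET_SIZE) {
                out.append(cipher[ch - 'a']);
            } else if (ch >= 'A' && ch < 'A' + ALPHABET_SIZE) {
                out.append(Character.toUpperCase(cipher[ch - 'A']));
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        char[] cipher = buildCipher(15);
        System.out.println(new String(cipher));
        System.out.println(encrypt("Aren't we done yet?", cipher));
    }
}
```

Everything is static because no SimpleCaesar-style object is ever created; the array is built once per key and then reused for every String encrypted with that key.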

A supporting document contains more detail on what is expected and how you might approach each step of the overall process. I know it's a lot of reading, but if you already understand how to do any of the steps above, then you need only skim over the information in the numbered items; on the other hand, if you are stuck on a step above, then this document should give you some guidance on how to proceed.

When you have your lab completed, find another team that is willing to test it out for you for several String and offset combinations of their own choosing. Once you are sure it runs without error, signal to your instructor to check you off on this.

Because we have not yet covered sorting (and won't cover it in this course), we won't be able to develop a real decryption method. That is, we'll only decrypt messages for which we know the encryption method that was used. The "frequency table" part of the lab (next), however, will be useful when you have learned sorting and want to continue with this project (for a future course assignment, or just for fun on your own).

Ciphers and Letter Frequencies

As mentioned at the beginning of this lab, the discipline of cryptography has been around for as long as people have had secrets.

If you encrypt a String, then you should be able to decrypt it yourself, or to give both the encrypted String and the decoding "key" to someone else who should then be able to decrypt it as well. But what if you get a message without the key? Or forget the key for a message you encoded yourself?

Cryptanalysis, the study (and "cracking") of encoded messages, has been around ever since someone tried to stick his nose into someone else's business. That probably happened within 10 minutes of the invention of cryptography.

The field involves looking for clues that will help the cryptanalyst discover the original plaintext without having the key. For the Caesar cipher you implemented above, this process is trivial: there are only 26 possible keys, so it's possible (and practical) to just try them all.
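
Trying every key can be sketched in a few lines. The names BruteForceSketch, shiftAll, and allShifts are hypothetical, and the sketch assumes a 26-letter alphabet; a person (not the program) still has to read the candidates and pick the one that looks like English.

```java
class BruteForceSketch {
    static final int ALPHABET_SIZE = 26;

    // Shifts every letter of 'text' by 'offset' positions, wrapping around.
    static String shiftAll(String text, int offset) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch >= 'a' && ch <= 'z') {
                ch = (char) ('a' + Math.floorMod(ch - 'a' + offset, ALPHABET_SIZE));
            } else if (ch >= 'A' && ch <= 'Z') {
                ch = (char) ('A' + Math.floorMod(ch - 'A' + offset, ALPHABET_SIZE));
            }
            out.append(ch);
        }
        return out.toString();
    }

    // Returns one candidate per possible offset; candidates[k] is the
    // ciphertext shifted by k.
    static String[] allShifts(String ciphertext) {
        String[] candidates = new String[ALPHABET_SIZE];
        for (int key = 0; key < ALPHABET_SIZE; key++) {
            candidates[key] = shiftAll(ciphertext, key);
        }
        return candidates;
    }

    public static void main(String[] args) {
        String[] candidates = allShifts("Duhq'w zh grqh bhw?");
        for (int key = 0; key < candidates.length; key++) {
            System.out.println(key + ": " + candidates[key]);
        }
    }
}
```

For this ciphertext, the readable line appears at offset 23, which is the decrypting offset for a message encrypted with 3.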

For a general substitution cipher, however, the problem is more complicated. There are 26! (or 403291461126605635584000000) possible permutations of the English alphabet. Even at one key per second, it would take roughly 13 billion billion years to try them all.
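
If you would like to check that count yourself, the value is far too large for an int or a long, but java.math.BigInteger can handle it. The sketch below is only an illustration; the class name KeySpaceSketch and the method factorial are made up for it, and the year length is assumed to be 365 days.

```java
import java.math.BigInteger;

class KeySpaceSketch {
    // Computes n! exactly using arbitrary-precision integers.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        BigInteger keys = factorial(26);
        // Assumption: a 365-day year, i.e., 31,536,000 seconds.
        BigInteger secondsPerYear = BigInteger.valueOf(365L * 24 * 60 * 60);
        System.out.println("Possible keys: " + keys);
        System.out.println("Years at one key per second: " + keys.divide(secondsPerYear));
    }
}
```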

Notwithstanding that, general substitution ciphers are very easy to break. Why? Because the language itself offers lots of clues, so we don't have to try all possible permutations. A simple and powerful tool for breaking such ciphers is Frequency Analysis. If you count the number of times each letter occurs in the message, you will find that the letters do not occur with an even distribution. For example, in typical English text, 'e' is the most common letter, accounting for roughly one letter in eight.

In English, the order of frequency of occurrence is roughly:

e t a o n i r s h d l u c m p f y w g b v j k q x z

To break a substitution cipher, usually it is sufficient to count up the frequencies of letters in the ciphertext and guess keys that match up the most frequently found letters in the ciphertext with the most frequently found letters in the language. (If that's not quite enough, there are lots of additional tricks you can try as well. For example, certain letter combinations are more common at certain positions within a word as well.)
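
For the special case of a Caesar cipher, a single match is enough to suggest a key, and finding the single most frequent letter needs no sorting. The sketch below guesses the offset by assuming that the most common ciphertext letter stands for 'e'. It is a heuristic that can easily guess wrong on short or unusual messages, and the names GuessOffsetSketch, shiftAll, and guessOffset are hypothetical. It does not attempt the general substitution case.

```java
class GuessOffsetSketch {
    static final int ALPHABET_SIZE = 26;

    // Shifts every letter of 'text' by 'offset' positions, wrapping around.
    static String shiftAll(String text, int offset) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch >= 'a' && ch <= 'z') {
                ch = (char) ('a' + Math.floorMod(ch - 'a' + offset, ALPHABET_SIZE));
            } else if (ch >= 'A' && ch <= 'Z') {
                ch = (char) ('A' + Math.floorMod(ch - 'A' + offset, ALPHABET_SIZE));
            }
            out.append(ch);
        }
        return out.toString();
    }

    // Guesses the encryption offset, assuming the most frequent
    // ciphertext letter is the encoding of 'e'.
    static int guessOffset(String ciphertext) {
        int[] counts = new int[ALPHABET_SIZE];
        for (int i = 0; i < ciphertext.length(); i++) {
            char ch = ciphertext.charAt(i);
            if (ch >= 'a' && ch <= 'z') {
                counts[ch - 'a']++;
            } else if (ch >= 'A' && ch <= 'Z') {
                counts[ch - 'A']++;
            }
        }
        int best = 0; // index of the most frequent letter seen so far
        for (int i = 1; i < ALPHABET_SIZE; i++) {
            if (counts[i] > counts[best]) {
                best = i;
            }
        }
        return Math.floorMod(best - ('e' - 'a'), ALPHABET_SIZE);
    }

    public static void main(String[] args) {
        String coded = shiftAll("Meet me here when the seven tedious meetings end", 3);
        int key = guessOffset(coded);
        System.out.println("Guessed offset: " + key);
        System.out.println("Decrypted as:   " + shiftAll(coded, -key));
    }
}
```

The sample sentence was chosen because 'e' clearly dominates it; with ties or with a text where another letter is most common, the guess would simply be one of the 26 candidates to check by eye.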

This is one of the many reasons to keep your covert communications short: the longer the sample of ciphertext, the better the statistical correlation between the expected and actual letter distributions. Short messages don't give eavesdroppers much to work with, but the odds shift to the side of the code-breakers for longer ones.

To do the actual matching, you need various tools (including sorting algorithms, and ways of implementing or extending other Java features) that are not part of this course. For this lab, however, you will write a small program that creates a frequency table. You may later try to expand this to attempt key-recovery, but for now you'll just do the counting.

Frequency Tables

You need to write a program Freq.java which reads a text message, counts the number of times each letter appears in it, and prints out the counts at the end. For your input and output, you may once again want to start with StartCaesar.java. Your solution will also have similarities to example 6.4, LetterCodes.java, on pages 274-5 in the book, and available from the textbook authors' website. (Note, however, that it will not be identical to that.)

However you start, you need to make the program do the following:

Read its input and count characters until there is no more input to be read.
Use an integer array (e.g., indexed 0..25) to hold the counts for each letter.
Count frequencies of letters in the input:
- Non-letters (anything that is not 'a'..'z' or 'A'..'Z') are ignored.
- Uppercase and lowercase letters are counted together.

At the end of the input, your program should print out each letter, with its corresponding count. For example:

% java Freq
This is a test of the Letter-Counting program.

Letter frequencies:
        a: 2
        b: 0
        c: 1
        d: 0
        e: 4
        f: 1
        g: 2
        h: 2
        i: 3
        j: 0
        k: 0
        l: 1
        m: 1
        n: 2
        o: 3
        p: 1
        q: 0
        r: 3
        s: 3
        t: 7
        u: 1
        v: 0
        w: 0
        x: 0
        y: 0
        z: 0
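
One possible shape for the counting itself is sketched below. It is not the LetterCodes.java example from the book and not a required design: the class name FreqSketch and the method tally are made up for this illustration, and reading line by line with BufferedReader is just one assumed way of reading until there is no more input.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

class FreqSketch {
    static final int ALPHABET_SIZE = 26;

    // Adds the letters found in 'line' to 'counts' (index 0 is 'a', 25 is 'z').
    // Uppercase and lowercase are counted together; non-letters are ignored.
    static void tally(String line, int[] counts) {
        for (int i = 0; i < line.length(); i++) {
            char ch = line.charAt(i);
            if (ch >= 'a' && ch <= 'z') {
                counts[ch - 'a']++;
            } else if (ch >= 'A' && ch <= 'Z') {
                counts[ch - 'A']++;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = new int[ALPHABET_SIZE]; // all entries start at 0
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) { // null means end of input
            tally(line, counts);
        }
        System.out.println();
        System.out.println("Letter frequencies:");
        for (int i = 0; i < ALPHABET_SIZE; i++) {
            System.out.println("        " + (char) ('a' + i) + ": " + counts[i]);
        }
    }
}
```

The letter ranges are tested explicitly, rather than lowercasing every character first, so that accented or other non-English letters are ignored instead of being folded into the 26 counts.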

As usual, first make sure that you understand what you are trying to do, design it, then write it and test it (and redesign and rewrite as needed...).

Once you think you have it working, test your new program. Feed it some different pieces of English text and check the output to see whether the counts are right. Now try looking at the distributions of ciphertext generated by your encryption program from Part 1. Do the encrypted distributions look like English? How are they similar? How are they different?

Once you've done that, compare notes with at least one other team. Reflect on the design, the implementation, and the results of this program, and on new things you learned during this lab. Keep detailed notes in your course notebook; each team member should contribute one significant thought to a team summary page that you will turn in after your work has been checked by the instructor.

When you're done, demonstrate your working program(s) for your instructor. On the one page reflection sheet to turn in, remember to include the names of all your team members. Also, include the names of students from any other teams with whom you consulted, whether they provided you with information or you provided them with some, or both. Also, make sure that each team member has saved his or her own copy of all the work you did (the designs and the programs).

Requirements & Reminders

Each team will hand in one reflection document. Most teams will be asked to demonstrate their programs and/or display their .java code in lab: when asked, you should be prepared to demonstrate either SimpleCaesar or Freq, or both.
Whenever you log on to a computer system, clean up and log off / logout when you are done.
Talk with your classmates if you have questions: often they can help you. If, together, you are still stuck, then come and talk with me.
Remember, I have regularly-scheduled office hours on Tuesday at noon (well, about 10 minutes after noon I arrive...).

Copyright © 2000-01 by Carol Scheftic & Phil Nico. All rights reserved.
Requests to reuse information from this page should be directed to Carol Scheftic.
Page created 1 April 2001; last updated 21 May 2001
