CPE 308

Java and Data Structures Review



Overview

Do you know the reason for the arrangement of letters on a standard keyboard?

[Figure: QWERTY keyboard layout]



The Dvorak keyboard is argued to be a vastly more comfortable and efficient alternative to the standard "QWERTY" pattern because the most commonly occurring letters in English appear on its home row.


[Figure: Dvorak keyboard layout]


Problem Statement

We want a program that uses Sets to read a file of English text and count the number of characters that appear on each of the three rows of both keyboard styles. Consider only alphabetic characters, not numbers or punctuation. The name of the file will be provided on the command line. Follow the examples in the command line tutorial.
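As a rough illustration of the idea (not the program supplied in step 2), the sketch below counts how many letters of a text fall on one keyboard row by testing membership in a Set. The class name `RowCountSketch`, the method `countInRow`, and the choice to fold upper case to lower case are assumptions made for this example only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class RowCountSketch {
    // Letters on the QWERTY home row; the full program needs six such sets.
    static final Set<Character> QWERTY_HOME = new HashSet<Character>(
            Arrays.asList('a', 's', 'd', 'f', 'g', 'h', 'j', 'k', 'l'));

    // Counts the letters of text that belong to the given row.
    // Assumption: upper-case letters count the same as lower-case ones.
    static int countInRow(String text, Set<Character> row) {
        int count = 0;
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c) && row.contains(Character.toLowerCase(c))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RowCountSketch <file>");
            return;
        }
        // The file name comes from the command line, as the problem requires.
        String text = new String(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("QWERTY home row: " + countInRow(text, QWERTY_HOME));
    }
}
```

Digits and punctuation never match because `Character.isLetter` filters them out before the set lookup.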

Instructions

  1. Create a test case: compute by hand the results for QWERTY and Dvorak using this sample sentence: "pack my box with five dozen liquor jugs".
  2. Here is a Java program to meet the requirements above.  Create a NetBeans project and compile and execute this code.
  3. The output should appear something like this:

                   QWERTY          Dvorak
                   count    %      count    %
    Top row          200   40        100   20
    Home row         200   40        300   60
    Bottom row       100   20        100   20
    Total            500             500



  4. Test the program using the sample sentence and verify that the results match your manually calculated answers. Repair any errors as necessary. Note: The data file should be placed in the project home directory, not in the src directory.
  5. Once the program passes your tests, you can try it out on the complete text of "The Adventures of Huckleberry Finn" by Mark Twain. 

The solution provided in step 2 is poorly coded. Using NetBeans, follow the instructions below to “refactor” the program: improve the code without changing the functionality.


  1. The manner in which the sets are initialized is very cumbersome. It shouldn't take 52 “add()” statements to initialize the sets. A set can be initialized in its constructor like this:
    Character[] qwertyTopKeys = { 'q', 'w', 'e', 'r', 't', 'y', 'u', 'i', 'o', 'p'};
    HashSet<Character> qwertyTop =
    new HashSet<Character>(java.util.Arrays.asList(qwertyTopKeys));


  2. Notice the characters are stored in six different named sets. The disadvantage of this is that in order to determine which set a character belongs to, you have to have six different IF statements (lines 119 - 139). One improvement is to create an array of six elements, where each element is a set. Then you can write a loop that iterates six times and have just a single IF statement.
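    // Java cannot create a generic array directly, so create a raw HashSet[] and cast it.
    // (Casting an Object[] instead would throw a ClassCastException at run time.)
    HashSet<Character>[] keyboardRows = (HashSet<Character>[]) new HashSet[6];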
    Make this modification and reduce the six IF statements to just one.

  3. Similarly it seems silly to have six different variables to store the counts when it could be done with an array. Fix this.

  4. If you reflect on it, you might realize that an array is perhaps not the ideal data structure, because the elements can only be referenced by an integer. Using 0, 1, and 2 to represent the top, home, and bottom rows, for example, is purely arbitrary. A better solution would be to use a Map instead of an array. Then we could refer to entries in the map using meaningful words such as "qwertytop", "dvorakhome", etc.
    But then how would we iterate over the map? Use a for-each loop over the map's key set, which keySet() returns.

    Make this modification to your solution. Similarly, keep the counts in a map as well.

    Make sure your program still passes your tests. Clean up the formatting and add explanatory comments. Insert your name in the @author javadoc tag.  Print your finished solution and a sample execution and submit it in class on the assigned due date.
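One possible shape for the map-based version described in step 4 is sketched below. It is an illustration under stated assumptions, not the expected solution: the class `RowMapSketch`, the helper `setOf`, and the method names are hypothetical, and a `LinkedHashMap` is chosen only so that the rows print in insertion order.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class RowMapSketch {
    // Hypothetical helper: builds a set from the letters of a string.
    static Set<Character> setOf(String letters) {
        Set<Character> set = new HashSet<Character>();
        for (char c : letters.toCharArray()) {
            set.add(c);
        }
        return set;
    }

    // Maps a meaningful row name to the set of letters on that row.
    static Map<String, Set<Character>> buildRows() {
        Map<String, Set<Character>> rows = new LinkedHashMap<String, Set<Character>>();
        rows.put("qwertytop", setOf("qwertyuiop"));
        rows.put("qwertyhome", setOf("asdfghjkl"));
        rows.put("qwertybottom", setOf("zxcvbnm"));
        rows.put("dvoraktop", setOf("pyfgcrl"));
        rows.put("dvorakhome", setOf("aoeuidhtns"));
        rows.put("dvorakbottom", setOf("qjkxbmwvz"));
        return rows;
    }

    // Keeps the counts in a second map that uses the same keys as the rows.
    static Map<String, Integer> countByRow(String text, Map<String, Set<Character>> rows) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String name : rows.keySet()) {
            counts.put(name, 0);
        }
        for (char c : text.toLowerCase().toCharArray()) {
            if (!Character.isLetter(c)) {
                continue; // skip digits, spaces, and punctuation
            }
            // A single IF inside a for-each loop over the key set.
            for (String name : rows.keySet()) {
                if (rows.get(name).contains(c)) {
                    counts.put(name, counts.get(name) + 1);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countByRow("Hello, World", buildRows());
        for (String name : counts.keySet()) {
            System.out.println(name + ": " + counts.get(name));
        }
    }
}
```

Because each letter appears on exactly one row of each layout, every letter increments one QWERTY entry and one Dvorak entry, so the three counts for each layout sum to the same total.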



