The Flesch Readability Index
Introduction:
The Flesch readability index is a tool for estimating the
reading comprehension level necessary to understand a written
document. For a given document, the Flesch readability index is
an integer indicating how difficult the document is to
understand, with lower numbers indicating greater difficulty.
For example, the table below shows typical Flesch readability
index values for some common (and some not-so-common) reading
material:
Material           | Flesch Index
Comics             | 95
Consumer Ads       | 82
Sports Illustrated | 65
Time               | 57
New York Times     | 39
Auto Insurance     | 10
IRS Code           | -6
Flesch readability indexes are also often translated into the
educational level that is usually necessary to understand a
document:
Flesch Index | Educational Level
100+         | Early
>91          | 5th grade
>81          | 6th grade
>71          | 7th grade
>66          | 8th grade
>61          | 9th grade
>51          | High School
>31          | Some College
>0           | College Graduate
<= 0         | Law School Graduate
The Flesch readability index for a document is computed using 5
steps:
- Count the number of sentences in the document.
- Count the number of words in the document.
- Count the number of syllables in the document.
- Compute the index as:
  Flesch Index = 206.835 - 84.6 * (Syllables / Words) - 1.015 * (Words / Sentences)
- Round the index to the nearest integer.
The index 999 is returned in these special cases: when the input is empty, when the input contains only blanks, or when the input contains no words.
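As a minimal sketch of the last two steps and the special cases, assuming the three counts have already been computed elsewhere: the class name IndexSketch, the method name fleschIndex, and the constant NO_WORDS are hypothetical and not part of the required design. The use of Math.round (which rounds halves upward) is also an assumption, since the handout does not say how ties are rounded.

```java
/** Hypothetical helper; the names here are illustrative only. */
final class IndexSketch {
    /** Value the handout specifies for empty, blank, or wordless input. */
    static final int NO_WORDS = 999;

    /** Applies the formula to precomputed counts and rounds the result. */
    static int fleschIndex(int syllables, int words, int sentences) {
        // Guard the special cases first; this also avoids dividing by zero.
        if (words == 0 || sentences == 0) {
            return NO_WORDS;
        }
        // Cast to double so the ratios are not truncated by integer division.
        double raw = 206.835
                - 84.6 * ((double) syllables / words)
                - 1.015 * ((double) words / sentences);
        return (int) Math.round(raw);
    }

    public static void main(String[] args) {
        // Counts taken from the FAQ test case: 11 syllables, 6 words, 1 sentence.
        System.out.println(fleschIndex(11, 6, 1)); // prints 46
        System.out.println(fleschIndex(0, 0, 0));  // prints 999
    }
}
```

Keeping the arithmetic separate from the counting makes it easy to unit test the formula with hand-computed counts.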
Rules:
Sentence: periods, exclamation points, question marks, colons
and semicolons serve as sentence delimiters.
Words: each group of continuous non-blank characters with
beginning and ending punctuation and numbers removed counts as a word.
(Note: newline and tab characters should be treated as a blank.)
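One possible reading of the word rule, sketched under the assumption that every non-letter character counts as "punctuation or numbers" and that only the two ends of each group are trimmed. The class name WordSketch and the method name extractWords are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; one interpretation of the word rule. */
final class WordSketch {
    /** Splits on whitespace, trims non-letters from both ends, drops empties. */
    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        // "\\s+" treats runs of spaces, tabs, and newlines as one blank.
        for (String token : text.trim().split("\\s+")) {
            // Remove leading and trailing characters that are not letters.
            String word = token.replaceAll("^[^A-Za-z]+|[^A-Za-z]+$", "");
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(extractWords("234ABC 23abn45 ... 99")); // prints [ABC, abn]
    }
}
```

Characters inside a word (such as the apostrophe in "can't") are left alone, because the rule only mentions beginning and ending characters.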
Syllables: each vowel in a word is considered one syllable
subject to:
- words of three letters or shorter count as single syllables;
- -es, -ed and -e (except -le) endings are ignored;
- consecutive vowels count as one syllable.
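The syllable rules could be sketched as follows. This is one interpretation, not the instructor's solution: it assumes 'y' is a vowel (as the FAQ states), that word length is measured in characters after trimming, and that no minimum of one syllable is imposed on longer words. The names SyllableSketch and countSyllables are hypothetical.

```java
/** Hypothetical helper; one interpretation of the syllable rules. */
final class SyllableSketch {
    private static boolean isVowel(char c) {
        return "aeiouy".indexOf(c) >= 0;
    }

    static int countSyllables(String word) {
        // Rule 1: words of three letters or shorter are one syllable.
        if (word.length() <= 3) {
            return 1;
        }
        String w = word.toLowerCase();
        // Rule 2: ignore -es, -ed, and -e endings, but keep -le.
        if (w.endsWith("es") || w.endsWith("ed")) {
            w = w.substring(0, w.length() - 2);
        } else if (w.endsWith("e") && !w.endsWith("le")) {
            w = w.substring(0, w.length() - 1);
        }
        // Rule 3: each run of consecutive vowels counts once.
        int count = 0;
        boolean previousWasVowel = false;
        for (int i = 0; i < w.length(); i++) {
            boolean vowel = isVowel(w.charAt(i));
            if (vowel && !previousWasVowel) {
                count++;
            }
            previousWasVowel = vowel;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSyllables("table"));   // prints 2
        System.out.println(countSyllables("dresses")); // prints 1
    }
}
```

Under this reading the six words of the FAQ test case contribute 1, 1, 1, 1, 2, and 5 syllables, which matches the stated total of 11.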
With a simple method for computing the Flesch readability index
it is possible for authors to write and revise documents until
they are comprehensible to the target audience. For example, a
newspaper reporter might continue to revise an article until the
Flesch readability index is above 60.
At least one popular word processor is able to compute the
Flesch readability index. If you ask Microsoft Word to display the
readability statistics for a document, it will display a Flesch
reading ease score, which is the same as the readability index.
If you are interested in more information, see chapter 2
of Flesch's "How to Write Plain English" (interestingly
enough, Flesch was Austrian!)
The Etexts web site
provides an extensive library of text files containing public
domain literary works. Some examples, with their readability index
include:
IMPLEMENTATION CONSTRAINTS
The main class must be named FleschApp.
The main method must follow this specification:
/**
 * Launches Flesch Readability program using the desired
 * source of user input and output. The default is to accept user input
 * from System.in and display to System.out.
 *
 * @param args Any non-empty first argument will be interpreted as the name of
 *             a file in the same directory as the program that contains the
 *             user input data.
 *             Any non-empty second argument will be interpreted as the name of
 *             a file in the same directory as the program to which the output
 *             is to be written.
 * @throws FileNotFoundException if the file isn't found.
 * @throws IOException if an illegal IO operation is attempted.
 */
public static void main(String[] args)
        throws FileNotFoundException, IOException
There should be no application logic in the FleschApp class.
It simply processes the arguments and invokes other classes to do
the work.
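A rough sketch of the argument handling described in the specification, assuming the choice of input and output is the only thing the launcher decides. The class name ArgsSketch and the helper names openInput and openOutput are hypothetical; a real FleschApp.main would pass the resulting Reader and Writer to whatever class does the work.

```java
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

/** Hypothetical helpers that turn command-line arguments into streams. */
final class ArgsSketch {
    /** A non-empty first argument names the input file; otherwise use System.in. */
    static Reader openInput(String[] args) throws FileNotFoundException {
        if (args.length > 0 && !args[0].isEmpty()) {
            return new FileReader(args[0]);
        }
        return new InputStreamReader(System.in);
    }

    /** A non-empty second argument names the output file; otherwise use System.out. */
    static Writer openOutput(String[] args) throws IOException {
        if (args.length > 1 && !args[1].isEmpty()) {
            return new FileWriter(args[1]);
        }
        return new OutputStreamWriter(System.out);
    }
}
```

Because the helpers only select streams, they contain no application logic, and the rest of the program never needs to know where its input came from.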
IMPLEMENTATION TIPS
Your solution will be most flexible if you design
it so that it is passed a Reader and a Writer:
public MyProgram(Reader rdr, Writer wtr);
Then it can accept input from any subclass of Reader
and write to any kind of Writer.
Don't use BufferedReader; use Scanner.
Scanner has a constructor that accepts any Readable,
which allows one to scan from a Reader.
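Here's a simple way to read all the text from a
reader into a String:
// "\\A" matches only the start of input, so the whole text is one token.
Scanner scan = new Scanner(rdr).useDelimiter("\\A");
// hasNext() is false for empty input, where next() would throw.
String text = scan.hasNext() ? scan.next() : "";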
Create a PrintWriter from a writer like this:
PrintWriter display = new PrintWriter(wtr, true);
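Putting these tips together, a class built around a Reader and a Writer might look like the sketch below. The class name ReaderWriterSketch and the methods run and runOn are hypothetical, and run only echoes the length of the text as a stand-in for the real index computation.

```java
import java.io.PrintWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Scanner;

/** Hypothetical class showing the Reader/Writer wiring only. */
final class ReaderWriterSketch {
    private final Reader rdr;
    private final PrintWriter display;

    ReaderWriterSketch(Reader rdr, Writer wtr) {
        this.rdr = rdr;
        // The second argument enables auto-flush on println.
        this.display = new PrintWriter(wtr, true);
    }

    /** Reads all text, then writes its length (placeholder for the index). */
    void run() {
        Scanner scan = new Scanner(rdr).useDelimiter("\\A");
        // Guard against empty input, where next() would throw.
        String text = scan.hasNext() ? scan.next() : "";
        display.println(text.length());
    }

    /** Convenience for tests: feeds a String in and captures the output. */
    static String runOn(String input) {
        StringWriter out = new StringWriter();
        new ReaderWriterSketch(new StringReader(input), out).run();
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(runOn("happy days")); // prints 10
    }
}
```

The same class works unchanged with a FileReader, an InputStreamReader over System.in, or a StringReader in a JUnit test, which is the flexibility the tip is aiming for.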
DESIGN CONSTRAINTS
You must use an Object Oriented design with classes that model the
significant entities in the problem domain: document, sentence, and
word.
You should use the project as a chance to refresh your Java
programming skills and practice writing modular, maintainable code.
TESTING
You must provide your own JUnit tests that cause 100% branch
coverage. (Obviously, the tests must pass.)
SUBMISSION
Submit a zip file containing your source code and JUnit tests to the
Web-CAT grader. Make sure you provide a correct @author javadoc tag
in every file you submit. Make sure your classes are in the default package.
The grader will run your unit tests, run
the instructor's tests, and check for conformance to the class
coding standard.
Grading
Your submission will be graded automatically by Web-CAT.
There must be no compiler warnings.
The "Correctness/Testing" portion of the score must be at least 76/80.
You can earn 20 additional points for the "Style/Coding" portion of
the score by conforming to the class coding standard.
You are allowed 8 submissions to Web-CAT without
penalty. After that you lose 10% per submission.
Update 4/4: Prior to the last day of class your project must pass all the tests, regardless of whether it earns any points or not. If your program doesn't pass all the tests before this final deadline, you will be assigned a failing grade in the course.
Oracle of Java
You can run the instructor's solution on
the Oracle of Java.
Bug Bounty
If you find a defect in the instructor's solution, you earn 1% extra credit.
The first person to email the instructor with a reproducible defect earns the
bounty for that defect.
FAQ
Q: Are hyphens in words handled in any particular way? Or are we considering them in the same category as parentheses and the like?
A: Hyphens are considered punctuation.
Q: Are we assuming 'y' to be considered a consonant for all words? For example, what would be the syllable count for words such as "lynx" and "rhythm"?
A: 'Y' is a vowel.
Q: This is more pedantic, but using the rules for word suffixes, "tapes" and "dresses" would have the same syllable count. Is this something that can be ignored?
A: Yes, they have the same syllable count. Follow the rules.
Q: Can my classes be in a named package?
A: No, they must be in the default package.
Q: Is the input text "happy days" counted as one sentence?
A: Yes, even though it contains no sentence delimiters, it is one sentence (because it contains two words). "Happy days." is also one sentence.
Q: The Oracle of Java seems confused by this test case:
#$can't9* \"ain't,\" 234ABC 23abn45 @#$aba34dfs#$% @a@e@i@o@u@
What is the correct index?
A: It has 11 syllables, 6 words, and 1 sentence, so index is 46.
Update 2 Apr 2016: Version 1.2 of the Oracle of Java is now available and correctly handles this case.
Q: Does a sentence require at least 1 word in order for it to be counted as a sentence?
A: Yes.
Q: If we are reading from standard input, should the program terminate after calculating the index or should it continue to run until the user terminates the program?
A: Let me clarify that reading from standard input has NOTHING to do with your concern. The program should behave exactly the same regardless of where the input originates.
The real core of your question is: does the program accept multiple inputs?
The answer is no.
A single file can be provided on the command line, or it can read from System.in. The program reads from whatever input source is provided until that input is exhausted, and then it computes the index and terminates.
It doesn't make any sense for the program to "continue to run" after it has computed the index, because there is no more input for it to process. Yes, technically it could be in an executing state, but it wouldn't be doing anything. It would require an operating system directive to kill the process.
Q: I am really struggling with how to split the string on multiple punctuation characters. I am currently using String.split() like this, but am not having much luck:
String[] splitDoc = doc.split(".!?:;");
That does not split the doc string on all of those characters.
A: Correct. The argument to split() must be a "regular expression". It's a pattern used to find the delimiters, and your pattern is read as a sequence of pattern elements rather than as a set of alternative characters.
Regular expressions are a big topic, but you can learn enough to get started
by reading the first four sections of the Java Tutorial
on the subject.
Otherwise you'll have to find an alternate method, such as looking for each delimiter independently.
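As an illustration of the regular-expression approach, a character class such as [.!?:;] matches any single one of the listed delimiters. The sketch below uses it to count sentences, assuming that "contains a word" can be approximated by "contains at least one letter"; the names SentenceSketch and countSentences are hypothetical.

```java
/** Hypothetical helper; counts delimiter-separated pieces that hold a word. */
final class SentenceSketch {
    static int countSentences(String doc) {
        int count = 0;
        // Inside square brackets, '.' and '?' are literal characters.
        for (String piece : doc.split("[.!?:;]")) {
            // (?s) lets '.' match newlines; a piece needs at least one letter.
            if (piece.matches("(?s).*[A-Za-z].*")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSentences("happy days"));       // prints 1
        System.out.println(countSentences("One. Two! Three?")); // prints 3
    }
}
```

Pieces with no letters (for example the gaps in "...") are skipped, which is consistent with the FAQ answer that a sentence requires at least one word.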