Homework Zero

Technical Specification

Seth James Nielson

Comp 314 Spring 2007

Instructor's note:

This is a rough-draft technical specification for Homework Zero (h0) provided as an example for writing specifications for the projects in this course. It is not meant to be complete or exhaustive, though it's definitely overkill for a project as small as wc. Use it for inspiration along with the specification guidelines.


Homework 0 is the project name for a java program called "wc". This program is a command-line utility that behaves as a "drop-in" replacement to the *nix command "wc -w".

This specification is technical. It does not contain extensive information about usage, usage scenarios, or output. This information is provided in the functional specification. However, we do briefly overview the information for convenience and context.
Instructor's note:

Making references back to the functional specification is fine, but the reference should be cited. An overview that ties the two documents together is also helpful.

This project is embodied primarily in the Java utility wc. This utility accepts no arguments accept an optional filename. If no filename is provided, it reads data from stdin until that stream is terminated. In either case, it take the input and counts the number of strings that are separated by whitespace. The wc program can receive input through a file or the standard-input stream. It reports its results to standard-out unless there is an error. Errors are reported to standard-err.

This technical specification does not make any deviations from the requirements of the functional specification. To be absolutely explicit, we enumerate a number of points from the functional specification that we "import" into this document. They are listed here along with the section they belong to in the functional specification.

  1. whitespace definition (System Requirements and Usage)
  2. usage and input parameters (System Requirements and Usage, Usage Scenarios)
  3. output (Output)
Furthermore, the design embodied in this document should satisfy all proposed functional tests required by the functional specification. The remainder of this document details the following technical facets of this project:
  1. Test plan
    1. Functional
    2. Unit/Regression
    3. Code Coverage
  2. Algorithms
  3. Datastructures
  4. Development and Resource Schedule
  5. Technical Notes

Test Plan

Instructor's note:

Testing, in general, serves at least three purposes

  1. It provides greater assurance of reliablity and correctness
  2. It reduces the likelihood that changes introduce serious errors
  3. It provides a metric for program completion
The first two purposes are accomplished by unit/regression testing and code coverage analysis, while the second purpose is accomplished through functional (system level) testing.

Functional Test Plan

Analysis. The wordcount software is not very complex in that it is not interactive (reading from standard-in does not count) and it is also not multi-threaded. In fact, it explicitly excludes concurrency guarantees. We can have high confidence in the correctness of the software by an automated verification of known inputs producing expected outputs.

As the number of possible inputs is, for all practical discussion, unbounded, the input-space must be divided into equivalence classes. The input has at least three axis of parameterization.

  1. input size - the number of words in the input
  2. content - the word delimiters in the input and the min and max word size
  3. input mode - the method of input, either from file or standard-in

While unlikely, it is possible that some errors could occur through one input mode and not another. Therefore, all tests described below must be run in both input modes.

  1. an input that that contains all possible 1 character delimiters
  2. an input file that contains 2 and MAX_INT of the same character delimiters
  3. an input file that contains every possible 2 combination of delimiters
  4. an input file that includes characters that are not printing and not whitespace in the following forms:
    1. at the beginning of a sequence of printing characters
    2. at the end of a sequence of printing characters
    3. in the middle of a sequence of printing characters
    4. at the begnning, and middle, and end of a sequence of printing characters
  5. an input file that repeats the previous test, but with every possible two character combination of non-printing, non-whitespace characters.
  6. an input file that has no words
  7. an input file that has 1 word of 1 character length
  8. an input file that has 2 words of 1 character length
  9. an input file that has MAX_LONG words of 1 character length
  10. an input file that has 1 word of MAX_INT character length
  11. an input file that has 2 words of MAX_INT character length
  12. an input file that has MAX_LONG words of MAX_INT character length
All test files will be constructed by a separate program. We expect the probability of error in this construction program to be very small and to have a neglibable impact on the assurance of correctness on wordcount.

For completeness, the program will also be tested on the following documents

  1. The Declaration of Independence
  2. The Constitution of the United States
  3. The Emancipation Proclamation

Unit/Regression Tests

Our only class is the wc (described below) with two static methods. We anticipate that the functional tests described perviously will adequately test the main() method. The only remaining method that needs to be tested at the unit level is public static int count(String text).

public static int count(String text)

  1. Null Pointer Test: where text is null. Should throw a NullPointerException
  2. Empty Text Test: where text is "". Should return 0.
  3. 1 Word Test: where text has no whitespace characters, but has a length greater than 0. Should return 1.
  4. 0 Word Test: where text has no non-whitespace characters, but has a length greater than 0. Should return 0.
  5. 2 Words/No Terminating Whitespace: where text is of the form CWC where C ranges from 1-100 (picked 100 arbitrarily) non-whitespace characters, and W ranges from 1-100 whitespace characters. This tests that a terminating whitespace character is not needed for recognizing a word. The output should be 2 for every size of C and W.
  6. All Whitespace Characters: where text is of the form Cw1Cw2C..wnC where C is an arbitrary length non-whitespace sequence and w1 is the first character of the whitespace set, w2 is the second character, and so forth. This tests that each whitespace character can terminate a word. The output should be n+1.
  7. Max Short Test: where text is of the form C1wC2..C_MAX_Short where w is a whitespace character and Ci is the ith arbitrary sequence of non whitespace characters. The output should be MAX_SHORT.
These tests should be fast enough to also serve as a regression test for that method of the class. We will run regression tests every day using an automated regression test system. It should be noted that these tests do NOT cover many error conditions. The functional spec has no requirements for dealing with overflow. This is a serious deficiency in the specification.
TODO: Work out with the customer how overflow should be handled and reported. When we get a better understanding of the error conditions of this utility, we will add a section to this document called Errors and Error Reporting
Instructor's note:

In any real program, you will have tests that take significant amounts of time to run. These tests are not useful for regression testing as regression tests should complete quickly. For example, you might run regression tests before a check in. If your tests take hours (or days) to complete, that can make regression testing before check-in a real pain. Longer tests make better functional tests, or shake-down tests.

Code Coverage Analysis

Code coverage will be analyzed using the open-source tool EMMA available from source forge.

Any code that, by release, is not covered by a unit test will require re-examination. New tests will be constructed so that by the release deadline, all code is covered by unit tests.

Note that this test is not currently planned to be automated.

Algorithms

This project is relatively simple in that it encapsulates only one major algorithm: the word counting algorithm.
Instructor's note:

For every major algorithm, you should have a subsection detailing it. There are no subsections here because there is only one algorithm.

The algorithm we have chosen for word counting is very basic. It involves iterating through each character. When the first non-whitespace character is encountered, a inWord flag is set to true. When the first whitespace character is encountered when the inWord flag is true, the flag is disabled and the word count increased. The process is repeated until the input is exhausted. When the input is exhausted, if inWord is true, then the word count is increased one final time.

Psuedo Code
  1. WORD-COUNT(text)
  2.     inWord := false
  3.     count := 0
  4.     for c in text
  5.         if !inWord and !isWhitespace(c)
  6.             inWord := true
  7.         if inWord and isWhitespace(c)
  8.             inWord := false
  9.             count := count + 1
  10.     if inWord
  11.         count := count + 1
  12.     return count

Examples

In the following examples

Example #1
    X
    
Line 3Algorithm initialized inWord = false, count = 0
Line 4For loop starts c = 'X', inWord = false, count = 0
Lines 5,6 If statement true, inWord-> true c = 'X', inWord = true, count=0
Line 4 Input is exhausted, for loop ends inWord = true, count=0
Lines 10,11 If statement true, count->count+1 inWord = true, count=1
Line 12 returns 1 inWord = true, count=1

Example #2

    MONKEY
    
Line 3 Algorithm initialized inWord = false, count = 0
Line 4 For loop starts c = 'M', inWord = false, count = 0
Lines 5,6 If statement true, inWord-> true c = 'M', inWord = true, count=0
Line 4 Loop iterates over all characters c = {'O'..'Y'}, inWord = true, count=0
Line 4 Input is exhausted, for loop ends inWord = true, count=0
Lines 10,11 If statement true, count->count+1 inWord = true, count=1
Line 12 returns 1 inWord = true, count=1

Example #3

    ##MONKEY
    
Line 3 Algorithm initialized inWord = false, count = 0
Line 4 For loop starts, iterates over ## inWord = false, count = 0
Lines5,7 c triggers no guards c={##[0]..##[n]}, inWord = false, count = 0
Line 4 For loop iterates to first non-whitespace character c='M', inWord = false, count = 0
Lines 5,6 If statement true, inWord-> true c='M', inWord = true, count=0
Line 4 Loop iterates over remaining characters c={'O'..'Y'}, inWord = true, count=0
Line 4 Input is exhausted, for loop ends inWord = true, count=0
Lines 10,11 If statement true, count->count+1 inWord = true, count=1
Line 12 returns 1 inWord = true, count=1

Example #4

    ##MONKEY##
    
Line 3 Algorithm initialized inWord = false, count = 0
Line 4 For loop starts inWord = false, count = 0
Lines5,7 c triggers no guards c={##[0]..##[n]}, inWord = false, count = 0
Line 4 For loop iterates to first non-whitespace character c = 'M', inWord = false, count = 0
Lines 5,6 If statement true, inWord-> true c='M', inWord = true, count=0
Line 4 Loop iterates over remaining characters c={'O'..'Y'}, inWord = true, count=0
Line 4 Loop iterates to ## c=##[0], inWord = true, count=0
Line 7 If statement true, inWord->false, count->count+1 c=##[0], inWord = false, count=1
Line 4 Loop iterates over remaining characters of ## c={##[1]..##[n-1]}, inWord = false, count=1
Line 4 Input is exhausted, for loop ends inWord = true, count=0
Line 12 returns 1 inWord = true, count=1

Example #5

    MONKEY#SEE
    
Line 3Algorithm initialized inWord = false, count = 0
Line 4For loop starts c = 'M', inWord = false, count = 0
Lines 5,6 If statement true, inWord->truec = 'M', inWord = true, count = 0
Line 4 Loop iterates over remaining characters c = {'O'..'Y'}, inWord = true, count=0
Line 4 Loop iterates to # c = #, inWord = true, count=0
Lines 7, 8 If statement true, inWord->false, count->count+1 c = #, inWord = false, count=1
Line 4 Loop iterates c = 'S', inWord = false, count = 1
Lines 5,6 If statement true, inWord->true c = 'S', inWord = true, count = 1
Line 4 Loop iterates over remaining characters c = {'E', 'E'}, inWord = true, count=1
Line 4 Input is exhausted, for loop ends inWord = true, count=1
Lines 10,11 If statement true, count->count+1 inWord = true, count=2
Line 12 returns 1 inWord = true, count=2

Time and Space

The functional specification makes no requirements on time and space. Our algorithm examines every input character once and checks if that character is in the whitespace set. But the whitespace set size is fixed (and a small fixed value at that). Therefore, the running time is O(n) in the size of the input text.

The memory requirements are a small constant size memory space for the whitespace set and a memory space for the input text. The most naive algorithm will load the entire text into memory requireing O(n) space in the size of the input. However, the algorithm does not require all the text in memory at once, and could (moving to the other extreme), load a single byte from memory. Before final production, some tests must be run to find a good trade off for how much to load from disk into memory. It should be noted that right now, the maximum size is limited by the data type used for word counting, etc. (see the technical notes at the end of this document).
Instructor's note:

No document is ever finished in the say that code is never really finished. There are always "bugs" or parts that are not completed or are too ambiguous. TODO's seen here are just like TODO's in code. Most of the ones in this iteration of the document are about getting more information from the "customer" to fill in ambiguous parts of the functional spec. But most certainly there could be TODO's for a host of other maladies.


TODO: Determine "sweet spot" fixed memory size for loading text from disk into memory.

Minor Algorithms

As part of the word counting algorithm, we need to check if a character is in the whitespace set. As detailed in the section on Data Structures, whitespace characters will be stored in an immutable Java Collection's Set. For checking if a character is a member of the whitespace set, the normal Set method contains() will be used.

Data Structures

All of the wc functionality will be encapsulated in a single class of the same name. This class requires no state and it will not be instantiatable. To meet the functional requirements, it provides a main method as an entry point to a stand along program.

While the functional spec did not require that wc be usable from other classes, it is a natural addition to the software. Functionality would be extracted from the main method anyway for better code readability and maintainability. Making the word-counting function public is not "dangerous" because the class is effectively an immutable singleton. While it is arguable that this additional API adds to the testing complexity of the final product, it also makes the wc more easily unit-tested.


TODO: Check with the customer to make sure that the addition of an API to the word counting algorithm is acceptable.

There is no guarantee that Java's whitespace is the same as the whitespace definition in the functional specifications. Therefore, we will explicitly create an immutable set containing all of the specified whitespace.

These components are described in detail in the following JavaDocs.

Development and Resource Schedule

Following the Comp 314 development model, this project will be completed in the three major phases: spec/prototype, beta, and release. Major milestones are as follows:
PhaseMilestoneDue DatePrimary Resource
Spec/Prototype9-JAN-2007
Determine non-printing characters 9-JAN-2007 Seth Nielson
Write functional tests 9-JAN-2007 Seth Nielson
Verify wc -w compatibility; document inconsistencies 9-JAN-2007 Seth Nielson
Design datastructures API and write unit test plan9-JAN-2007 Seth Nielson
Complete specification 9-JAN-2007 Seth Nielson
Beta9-JAN-2007
Write unit tests 9-JAN-2007 Seth Nielson
Write code to count words in a file 9-JAN-2007 Seth Nielson
Write code to count words from standard-in 9-JAN-2007 Seth Nielson
Write error-handling code 9-JAN-2007 Seth Nielson
Update technical spec 9-JAN-2007 Seth Nielson
Release9-JAN-2007
Run functional tests 9-JAN-2007 Seth Nielson
Code coverage tests 9-JAN-2007 Seth Nielson
Update comments and internal documentation 9-JAN-2007 Seth Nielson
Write code for malformed executions and usage message 9-JAN-2007 Seth Nielson
Write README and eratta 9-JAN-2007 Seth Nielson
Update technical spec 9-JAN-2007 Seth Nielson

Technical Notes

As with all specifications, the Homework 0 functional specification leaves some technical issues unspecified. Following the "do something reasonable and document it" policy, the following technical points are presented:
  1. Word count limited to counting 32767 words (count is stored as a Java short)
  2. Word length has no official limit, but will only be tested to a length of 32767 characters.
  3. Invalid executions will print an error and a usage message
  4. No special checking of overflow is required