|
Instructor's note:
This is a rough-draft technical specification for Homework Zero (h0) provided as an example for writing specifications for the projects in this course. It is not meant to be complete or exhaustive, though it's definitely overkill for a project as small as wc. Use it for inspiration along with the specification guidelines. |
This specification is technical. It does not contain extensive information about usage, usage scenarios, or output. This information is provided in the functional specification. However, we do briefly overview the information for convenience and context.
|
Instructor's note:
Making references back to the functional specification is fine, but the reference should be cited. An overview that ties the two documents together is also helpful. |
This project is embodied primarily in the Java utility wc. This utility accepts no arguments accept an optional filename. If no filename is provided, it reads data from stdin until that stream is terminated. In either case, it take the input and counts the number of strings that are separated by whitespace. The wc program can receive input through a file or the standard-input stream. It reports its results to standard-out unless there is an error. Errors are reported to standard-err.
This technical specification does not make any deviations from the requirements of the functional specification. To be absolutely explicit, we enumerate a number of points from the functional specification that we "import" into this document. They are listed here along with the section they belong to in the functional specification.
|
Instructor's note:
Testing, in general, serves at least three purposes
|
Analysis. The wordcount software is not very complex in that it is not interactive (reading from standard-in does not count) and it is also not multi-threaded. In fact, it explicitly excludes concurrency guarantees. We can have high confidence in the correctness of the software by an automated verification of known inputs producing expected outputs.
As the number of possible inputs is, for all practical discussion, unbounded, the input-space must be divided into equivalence classes. The input has at least three axis of parameterization.
While unlikely, it is possible that some errors could occur through one input mode and not another. Therefore, all tests described below must be run in both input modes.
For completeness, the program will also be tested on the following documents
public static int count(String text)
|
Instructor's note:
In any real program, you will have tests that take significant amounts of time to run. These tests are not useful for regression testing as regression tests should complete quickly. For example, you might run regression tests before a check in. If your tests take hours (or days) to complete, that can make regression testing before check-in a real pain. Longer tests make better functional tests, or shake-down tests. |
Code coverage will be analyzed using the open-source tool EMMA available from source forge.
Any code that, by release, is not covered by a unit test will require re-examination. New tests will be constructed so that by the release deadline, all code is covered by unit tests.
Note that this test is not currently planned to be automated.
|
Instructor's note:
For every major algorithm, you should have a subsection detailing it. There are no subsections here because there is only one algorithm. |
The algorithm we have chosen for word counting is very basic. It involves iterating through each character. When the first non-whitespace character is encountered, a inWord flag is set to true. When the first whitespace character is encountered when the inWord flag is true, the flag is disabled and the word count increased. The process is repeated until the input is exhausted. When the input is exhausted, if inWord is true, then the word count is increased one final time.
Psuedo Code
|
Examples
In the following examples
Example #1
X |
| Line 3 | Algorithm initialized | inWord = false, count = 0 | |
| Line 4 | For loop starts | c = 'X', inWord = false, count = 0 | |
| Lines 5,6 | If statement true, inWord-> true | c = 'X', inWord = true, count=0 | |
| Line 4 | Input is exhausted, for loop ends | inWord = true, count=0 | |
| Lines 10,11 | If statement true, count->count+1 | inWord = true, count=1 | |
| Line 12 | returns 1 | inWord = true, count=1 |
MONKEY |
| Line 3 | Algorithm initialized | inWord = false, count = 0 | |
| Line 4 | For loop starts | c = 'M', inWord = false, count = 0 | |
| Lines 5,6 | If statement true, inWord-> true | c = 'M', inWord = true, count=0 | |
| Line 4 | Loop iterates over all characters | c = {'O'..'Y'}, inWord = true, count=0 | |
| Line 4 | Input is exhausted, for loop ends | inWord = true, count=0 | |
| Lines 10,11 | If statement true, count->count+1 | inWord = true, count=1 | |
| Line 12 | returns 1 | inWord = true, count=1 |
##MONKEY |
| Line 3 | Algorithm initialized | inWord = false, count = 0 | |
| Line 4 | For loop starts, iterates over ## | inWord = false, count = 0 | |
| Lines5,7 | c triggers no guards | c={##[0]..##[n]}, inWord = false, count = 0 | |
| Line 4 | For loop iterates to first non-whitespace character | c='M', inWord = false, count = 0 | |
| Lines 5,6 | If statement true, inWord-> true | c='M', inWord = true, count=0 | |
| Line 4 | Loop iterates over remaining characters | c={'O'..'Y'}, inWord = true, count=0 | |
| Line 4 | Input is exhausted, for loop ends | inWord = true, count=0 | |
| Lines 10,11 | If statement true, count->count+1 | inWord = true, count=1 | |
| Line 12 | returns 1 | inWord = true, count=1 |
##MONKEY## |
| Line 3 | Algorithm initialized | inWord = false, count = 0 | |
| Line 4 | For loop starts | inWord = false, count = 0 | |
| Lines5,7 | c triggers no guards | c={##[0]..##[n]}, inWord = false, count = 0 | |
| Line 4 | For loop iterates to first non-whitespace character | c = 'M', inWord = false, count = 0 | |
| Lines 5,6 | If statement true, inWord-> true | c='M', inWord = true, count=0 | |
| Line 4 | Loop iterates over remaining characters | c={'O'..'Y'}, inWord = true, count=0 | |
| Line 4 | Loop iterates to ## | c=##[0], inWord = true, count=0 | |
| Line 7 | If statement true, inWord->false, count->count+1 | c=##[0], inWord = false, count=1 | |
| Line 4 | Loop iterates over remaining characters of ## | c={##[1]..##[n-1]}, inWord = false, count=1 | |
| Line 4 | Input is exhausted, for loop ends | inWord = true, count=0 | |
| Line 12 | returns 1 | inWord = true, count=1 |
MONKEY#SEE |
| Line 3 | Algorithm initialized | inWord = false, count = 0 |
| Line 4 | For loop starts | c = 'M', inWord = false, count = 0 |
| Lines 5,6 | If statement true, inWord->true | c = 'M', inWord = true, count = 0 |
| Line 4 | Loop iterates over remaining characters | c = {'O'..'Y'}, inWord = true, count=0 |
| Line 4 | Loop iterates to # | c = #, inWord = true, count=0 |
| Lines 7, 8 | If statement true, inWord->false, count->count+1 | c = #, inWord = false, count=1 |
| Line 4 | Loop iterates | c = 'S', inWord = false, count = 1 |
| Lines 5,6 | If statement true, inWord->true | c = 'S', inWord = true, count = 1 |
| Line 4 | Loop iterates over remaining characters | c = {'E', 'E'}, inWord = true, count=1 |
| Line 4 | Input is exhausted, for loop ends | inWord = true, count=1 |
| Lines 10,11 | If statement true, count->count+1 | inWord = true, count=2 |
| Line 12 | returns 1 | inWord = true, count=2 |
Time and Space
The functional specification makes no requirements on time and space. Our algorithm examines every input character once and checks if that character is in the whitespace set. But the whitespace set size is fixed (and a small fixed value at that). Therefore, the running time is O(n) in the size of the input text.
The memory requirements are a small constant size memory space for the whitespace set and a memory space for the input text. The most naive algorithm will load the entire text into memory requireing O(n) space in the size of the input. However, the algorithm does not require all the text in memory at once, and could (moving to the other extreme), load a single byte from memory. Before final production, some tests must be run to find a good trade off for how much to load from disk into memory. It should be noted that right now, the maximum size is limited by the data type used for word counting, etc. (see the technical notes at the end of this document).
|
Instructor's note:
No document is ever finished in the say that code is never really finished. There are always "bugs" or parts that are not completed or are too ambiguous. TODO's seen here are just like TODO's in code. Most of the ones in this iteration of the document are about getting more information from the "customer" to fill in ambiguous parts of the functional spec. But most certainly there could be TODO's for a host of other maladies. |
Minor Algorithms
As part of the word counting algorithm, we need to check if a character is in the whitespace set. As detailed in the section on Data Structures, whitespace characters will be stored in an immutable Java Collection's Set. For checking if a character is a member of the whitespace set, the normal Set method contains() will be used.
While the functional spec did not require that wc be usable from other classes, it is a natural addition to the software. Functionality would be extracted from the main method anyway for better code readability and maintainability. Making the word-counting function public is not "dangerous" because the class is effectively an immutable singleton. While it is arguable that this additional API adds to the testing complexity of the final product, it also makes the wc more easily unit-tested.
There is no guarantee that Java's whitespace is the same as the whitespace definition in the functional specifications. Therefore, we will explicitly create an immutable set containing all of the specified whitespace.
These components are described in detail in the following JavaDocs.
| Phase | Milestone | Due Date | Primary Resource |
|---|---|---|---|
| Spec/Prototype | 9-JAN-2007 | ||
| Determine non-printing characters | 9-JAN-2007 | Seth Nielson | |
| Write functional tests | 9-JAN-2007 | Seth Nielson | |
| Verify wc -w compatibility; document inconsistencies | 9-JAN-2007 | Seth Nielson | |
| Design datastructures API and write unit test plan9-JAN-2007 | Seth Nielson | ||
| Complete specification | 9-JAN-2007 | Seth Nielson | |
| Beta | 9-JAN-2007 | ||
| Write unit tests | 9-JAN-2007 | Seth Nielson | |
| Write code to count words in a file | 9-JAN-2007 | Seth Nielson | |
| Write code to count words from standard-in | 9-JAN-2007 | Seth Nielson | |
| Write error-handling code | 9-JAN-2007 | Seth Nielson | |
| Update technical spec | 9-JAN-2007 | Seth Nielson | |
| Release | 9-JAN-2007 | ||
| Run functional tests | 9-JAN-2007 | Seth Nielson | |
| Code coverage tests | 9-JAN-2007 | Seth Nielson | |
| Update comments and internal documentation | 9-JAN-2007 | Seth Nielson | |
| Write code for malformed executions and usage message | 9-JAN-2007 | Seth Nielson | |
| Write README and eratta | 9-JAN-2007 | Seth Nielson | |
| Update technical spec | 9-JAN-2007 | Seth Nielson |