A basic principle of data processing teaches the folly of trying to maintain independent files in synchronism. It is far better to combine them into one file with each record containing all the information both files held concerning a given key.
Yet our practice in programming documentation violates our own teaching. We typically attempt to maintain a machine-readable form of a program and an independent set of human-readable documents, consisting of prose and flow charts.
The results in fact confirm our teachings about the folly of separate files. Program documentation is notoriously poor, and its maintenance is worse. Changes made in the program do not promptly, accurately, and invariably appear in the paper.
The solution, I think, is to merge the files, to incorporate the documentation in the source program. This is at once a powerful incentive toward proper maintenance, and an insurance that the documentation will always be handy to the program user. Such programs are called self-documenting.
29.7.1 Keeping Records--Documentation. Even when the code is designed so that changes can be carried out efficiently, the design principles and design decisions are often not recorded in a form that is useful to future maintainers. Documentation is the aspect of software engineering most neglected by both academic researchers and practitioners. It is common to hear a programmer saying that the code is its own documentation; even highly respected language researchers take this position, arguing that if you use their latest language, the structure will be explicit and obvious.
When documentation is written, it is usually poorly organized, incomplete, and imprecise. Often the coverage is random; a programmer or manager decides that a particular idea is clever and writes a memo about it while other topics, equally important, are ignored. In other situations, where documentation is a contractual requirement, a technical writer, who does not understand the system, is hired to write the documentation. The resulting documentation is ignored by maintenance programmers because it is not accurate. Some projects keep two sets of books; there is the official documentation, written as required for the contract, and the real documentation written informally when specific issues arise.
Documentation that seems clear and adequate to its author is often about as clear as mud to the programmer who must maintain the code 6 months or 6 years later. Even when the information is present, the maintenance programmer doesn't know where to look for it. It is almost as common to find the same topic is covered twice, but that the statements in the documentation are inconsistent with each other and the code.
Documentation is not an "attractive" research topic. Last year, I suggested to the leader of an Esprit project who was looking for a topic for a conference, that he focus on documentation. His answer was that it would not be interesting. I objected, saying that there were many interesting aspects to this topic. His response was that the problem was not that the discussions wouldn't be interesting, the topic wouldn't sound interesting and would not attract an audience.
For the past five or six years my own research, and that of many of my students and close colleagues, has focused on the problems of documentation. We have shown how mathematical methods can be used to provide clear, concise, and systematic documentation of program design. We have invented and illustrated new mathematical notation that is much more suited to use in documentation, but no less formal. The reaction of academics and practitioners to this work has been insight-provoking. Both sides fail to recognize documentation as the subject of our work. Academics keep pointing out that we are neglecting "proof obligations"; industrial reviewers classify our work as "verification" which they (often correctly) consider too difficult and theoretical. Neither group can see documentation as an easier, and in some sense more important topic than verification. To them, documentation is that "blah blah" that you have to write. In fact, unless we can solve the documentation problem, the verification work will be a waste of time.
In talking to people developing commercial software we find that documentation is neglected because it won't speed up the next release. Again, programmers and managers are so driven by the most imminent deadline, that they cannot plan for the software's old age. If we recognize that software aging is inevitable and expensive, that the first or next release of the program is not the end of its development, that the long-term costs are going to exceed the short term profit, we will start taking documentation more seriously.
When we start taking documentation more seriously, we will see that just as in other kinds of engineering documentation, software documentation must be based on mathematics. Each document will be a representation of one or more mathematical relations. The only practical way to record the information needed in a proper documentation will be to use formally defined notation.
29.8.2 Retroactive Documentation. A major step in slowing the aging of older software, and often rejuvenating it, is to upgrade the quality of the documentation. Often, documentation is neglected by the maintenance programmers because in their haste to correct problems reported by customers or to introduce new features demanded by the market. When they do document their work, it is often by means of a memo that is not integrated into the previously existing documentation, but simply added to it. If the software is really valuable, the resulting unstructured documentation can, and should, be replaced by carefully structured documentation that has been reviewed to be complete and correct. Often, when such a project is suggested, programmers (who are rarely enthusiastic about any form of documentation) scoff at the suggestion as impractical. Their interests are short-term interests, and their work satisfaction comes from running programs. Nonetheless, there are situations where it is in the owner's best interest to insist that the product be documented in a form that can serve as a reference for future maintenance programmers.
29.9.3 If It's Not Documented, It's Not Done. If a product is not documented as it is designed, using documentation as a design medium, we will save a little today, but pay far more in the future. It is far harder to recreate design documentation than to create it as we go along. Documentation that has been created after the design is done, and the product shipped, is usually not very accurate. Further, such documentation was not available when (and if) the design was reviewed before coding. As a result, even if the documentation is as good as it would have been, it has cost more and been worth less.
The Maintenance Adjustment Factor (MAF), Equation 2.10, is used to adjust the effective maintenance size to account for software understanding and unfamiliarity effects, as with reuse. COCOMO II uses the Software Understanding (SU) and Programmer Unfamiliarity (UNFM) factors from its reuse model to model the effects of well or poorly structured/understandable software on maintenance effort.
Table 2.5 Rating Scale for Software Understanding Increment (SU)
|Very low cohesion, high coupling, spaghetti code.
|No match between program and application world-views.
|Obscure code; documentation missing, obscure, or obsolete.
|Moderately low cohesion, high coupling.
|Some correlation between program and application.
|Some code commentary and headers; some useful documentation.
|Reasonably well-structured; some weak areas.
|Moderate correlation between program and application.
|Moderate level of code commentary, headers, documentation.
|High cohesion, low coupling.
|Good correlation between program and application.
|Good code commentary and headers; useful documentation; some weak areas
|Strong modularity, information hiding in data/control structures.
|Clear match between program and application world-views.
|Self-descriptive code; documentation up-to-date, well organized, with design rationale.
I used to pass by a large computer system with the feeling that it represented the summed-up knowledge of human beings. It reassured me to think of all those programs as a kind of library in which our understanding of the world was recorded in intricate and exquisite detail. I managed to hold onto this comforting belief even in the face of 20 years in the programming business, where I learned from the beginning what a hard time we programmers have in maintaining our own code, let alone understanding programs written and modified over years by untold numbers of other programmers. Programmers come and go; the core group that once understood the issues has written its code and moved on; new programmers have come, left their bit of understanding in the code and moved on in turn. Eventually, no one individual or group knows the full range of the problem behind the program, the solutions we chose, the ones we rejected and why.
Over time, the only representation of the original knowledge becomes the code itself, which by now is something we can run but not exactly understand. It has become a process, something we can operate but no longer rethink deeply. Even if you have the source code in front of you, there are limits to what a human reader can absorb from thousands of lines of text designed primarily to function, not to convey meaning.
The underlying problem is obvious. Get past the flashy graphics and fancy user interfaces, and you frequently descend into a nightmarish realm of twisted spaghetti-like code that might better belong in a Salvador Dali painting. One recurring type of software security bug, buffer overflows, dates back to the dawn of computing. ... A plethora of patches will never be a substitute for true quality software.
A BIG BALL OF MUD is haphazardly structured, sprawling, sloppy, duct-tape and bailing wire, spaghetti code jungle. We've all seen them. These systems show unmistakable signs of unregulated growth, and repeated, expedient repair. Information is shared promiscuously among distant elements of the system, often to the point where nearly all the important information becomes global or duplicated. The overall structure of the system may never have been well defined. If it was, it may have eroded beyond recognition.
What does this muddy code look like to the programmers in the trenches who must confront it? Data structures may be haphazardly constructed, or even next to non-existent. Everything talks to everything else. Every shred of important state data may be global. Where state information is compartmentalized, it may be passed promiscuously about though Byzantine back channels that circumvent the system's original structure.
Variable and function names might be uninformative, or even misleading. Functions themselves may make extensive use of global variables, as well as long lists of poorly defined parameters. The function themselves are lengthy and convoluted, and perform several unrelated tasks. Code is duplicated. The flow of control is hard to understand, and difficult to follow. The programmer's intent is next to impossible to discern. The code is simply unreadable, and borders on indecipherable. The code exhibits the unmistakable signs of patch after patch at the hands of multiple maintainers, each of whom barely understood the consequences of what he or she was doing. Did we mention documentation? What documentation?
In examining the tasks of software development versus software maintenance, most of the tasks are the same -- except for the additional maintenance task of "understanding the existing product." This task consumes roughly 30 percent of total maintenance time and is the dominant maintenance activity. Thus it is possible to claim that maintenance is a more difficult task than development.
We resolve to keep all program design documentation complete, precise, and up to date.