Source code documentation is a fundamental engineering practice critical to efficient software development. Regardless of the intent of its author, all source code is eventually reused, either directly, or just through the basic need to understand it. In either case, the source code documentation acts as a specification of behavior for other engineers. Without documentation, they are forced to get the information they need by making dangerous assumptions, scrutinizing the implementation, or interrogating the author. These alternatives are unacceptable. Although some developers believe that source code "self-documents", there is a great deal of information about code behavior that simply cannot be expressed in source code, but requires the power and flexibility of natural language to state. Consequently, source code documentation is an irreplaceable necessity, as well as an important discipline to increase development efficiency and quality.
In my opinion, there is nothing in the programming field more despicable than an uncommented program. A programmer can be forgiven many sins and flights of fancy, including those listed in the sections below; however no programmer, no matter how wise, no matter how experienced, no matter how hard-pressed for time, no matter how well-intentioned, should be forgiven an uncommented and undocumented program.
Of course, it is important to note that comments are not an end unto themselves. As Kernighan and Plauger point out in their excellent book, The Elements of Programming Style, good comments cannot substitute for bad code. However, it is not clear that good code can substitute for comments. That is, I do not agree that it is unnecessary for comments to accompany "good" code. The code obviously tells us what the program is doing, but the comments are often necessary for us to understand why the programmer has used those particular instructions.
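A small illustration of that distinction, as a sketch in Python (the function name `mean` and the choice of example are assumptions made here for illustration, not taken from the text): the code shows what is computed, while the comment records why one summation routine was preferred over the more obvious one.

```python
import math


def mean(values):
    """Return the arithmetic mean of a non-empty sequence of floats."""
    # Why fsum rather than the built-in sum: fsum tracks the partial sums
    # exactly, so small values are not lost to rounding as the running
    # total grows. The code alone says only *that* fsum is called.
    return math.fsum(values) / len(values)
```

Without the comment, a later maintainer could reasonably "simplify" the call to `sum(values)` and quietly lose precision.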
Common sense also leads us to the recognition of the characteristics of programs that make the programs maintainable. Above all, we look for programs that exhibit logical simplicity -- failing that, at least clarity. The earmarks of simplicity and clarity include modularity (true functional modularity, not arbitrary segmentation) and a hierarchical control structure, restrictions on each module's access to data, structured data forms, the use of structured control forms, and generous and accurate annotation.
Much has been said of the technical members of this set in earlier pages. Of good annotation, there are several features that must be included. First, the header information of each procedure should provide a concise statement of the procedure's external specifications, including a description of input and output data. Each section of the procedure should be introduced by comments identifying the section's relation to the external characteristics. Finally, comments within each section should relate groups of statements to the program's documented description. This last is automatically achieved by using design language statements as source code comments.
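The three features just listed might look as follows in practice. This is only a sketch in Python under assumed names (`word_frequencies` is hypothetical): the header states the external specification, each section comment ties the section back to that specification, and the comments inside the loop are the design-language statements carried over as they were written.

```python
def word_frequencies(text):
    """Count how often each word occurs in a text.

    Input:  text, a string; words are separated by whitespace.
    Output: a dict mapping each lower-cased word to its number of
            occurrences.
    """
    # Section: normalize the input, because the specification treats
    # "The" and "the" as the same word.
    words = text.lower().split()

    # Section: build the output mapping described in the header.
    counts = {}
    # for each word in the text
    for word in words:
        # add one to that word's count, starting from zero
        counts[word] = counts.get(word, 0) + 1

    return counts
```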
Software must be understandable to two different types of entities for two different purposes. First, compilers or interpreters must be able to translate source code into machine instructions. Second, people need to understand the software so they can further develop, maintain and utilize the application. The average developer overemphasizes capability and function while undervaluing the human understanding that effects improved development and continued utilization. There should be a description in clear view within the programming medium.
As I gradually improved my in-code documentation, I realized that English is a natural language, but computer languages, regardless of how well we use them, are still "code." Communication via natural language is a relatively quick and efficient process. Not so with computer languages: They must be "decoded" for efficient human understanding.
People who read my code? Wait a moment: did I say "read my code?" Now that's a remarkable way to approach software - not to debug, analyze, program, or develop, but simply to read. The act of reading allows me to approach my code as a work of software art: I strive to make the overall design, algorithm, structure, documentation and style as simple, elegant, thorough and effective as practical. Yes, this takes time, but when I'm rushed, I usually dash off the wrong implementation of the wrong design, and the darn project takes twice as long as it would have had I done it right in the first place. A disciplined, focused approach clarifies my thinking and improves my implementation. In keeping with a reasonable attempt at excellence, I proofread my applications.
My goal is to find a balance, describing all salient program features comprehensively but concisely. I explain each software component's purpose, the algorithm used, arguments, inputs, outputs, even the reason for #include-ing a particular header file. I document each section of each function so that the overall program flow is readily understandable.
My article seems to have generated quite a bit of controversy. The article's text received only praise. This implies that the goal of "understandable code" is well-nigh universal. How best to achieve it seems to be a matter of highly polarized opinion. Even for the wealth of comments that I customarily provide, some readers chided me for not having enough! Other readers believe that all comments are superfluous and cause trouble by their very existence. I've seen horrendous maintenance problems incurred with this approach. Some readers believe that Design by Contract, coupled with lengthy function and variable names, provides all the necessary documentation.
My experience is that well-developed modular designs, coupled with good system documentation, descriptive identifier names and a natural-language narrative, result in code that's a pleasure to work with and efficient to maintain.
The essence of pretty code? One can infer much about its structure from a glance, without completely reading it. I call this visual parsing: discerning the flow and relative importance of code from its shape.
Program documentation has been propelled into importance by sheer necessity. However, it still suffers from glowing tributes but inept implementations. One of the basic elements of good program documentation is an effective program listing.
A program is in some sense a permanent object in that it can have a long lifetime. For the future reader, comments in a program should be truly substantive. They should say something. They should assist the program reader.
The professional thinks of a comment as a way to proceed from one point (a given state of knowledge) to another (understanding what is written in the program). The comment is a bridge. The professional assumes something about the reader of the program--the reader being, of course, someone else. It is fair to assume that the reader knows the language in which the program is written. The reader's difficulty is to modify the program at hand.
These observations lead to some specific recommendations. First, regarding the idea of comments as a bridge--Extensive introductory program comments are entirely in order. These comments set the stage for reading the program. They may contain an outline of the solution adopted by the programmer, summarize its input and output, give a directory of key variable names, or describe an algorithm that may not be known to the reader. Such comments provide a direct bridge from the problem to the program. They do not intrude on the reading of the program itself because they appear at the beginning of the program and can be read or not as the reader desires.
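An introductory program comment of the kind recommended here could be sketched as below. The program, its purpose, and every name in it are hypothetical, chosen only to show the parts the passage lists: an outline of the solution, a summary of input and output, and a directory of key variable names.

```python
"""Temperature report (hypothetical example program).

Problem:   summarize a list of daily temperature readings.
Solution:  one pass over the readings, tracking the minimum, the
           maximum and the running total; the mean is total / count.
Input:     a non-empty list of numbers (degrees Celsius).
Output:    a (lowest, highest, mean) tuple.

Key names:
    readings  the input list
    lowest    smallest reading seen so far
    highest   largest reading seen so far
    total     running sum of the readings
"""


def summarize(readings):
    lowest = highest = readings[0]
    total = 0.0
    for value in readings:
        lowest = min(lowest, value)
        highest = max(highest, value)
        total += value
    return lowest, highest, total / len(readings)
```

Because the prologue sits entirely above the code, a reader who already knows the problem can skip it and go straight to `summarize`.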
A second recommendation has to do with procedures and other major units of the program--Introductory module comments are also in order. Comments following a procedure header explaining the general nature of the procedure are not only in order but may be necessary. Keeping in mind the bridge aspect, we need not describe the calling environment. The professional assumes that the reader has read the program to the degree that the procedure calls are understood--but maybe not the procedure itself. As such, the procedure header comments should be short and help the reader understand the next level of detail in the program.
Third, the professional should spend the most energy on the code itself. This means--Avoid embedded (in-line) comments within the body of the module itself. It is my view that such comments can readily intrude upon the meaning of a program. Ideally, the code should speak for itself and require few supporting comments.
Documentation that is structured and contained within the program is able to immediately satisfy the changing information demands of the maintainer. Those needs are determined in part by the subtask on which he is currently working. For solving nontrivial error correction or modification problems, the maintainer must have a detailed understanding of the program. To locate a section of code, knowledge of the program's structure is required. Knowing how an instruction sequence relates to other parts of the program is important for altering and testing software. The documentor can inform the unknowledgeable programmer in each subtask demand by varying the message content and effectively using the visual space.
Information may be conveyed to the maintainer in several ways. One is an abstract summary of the module at the beginning of the routine. Another is through the titles and headings of processing sections positioned in the instruction sequence. The third is in phrases and short sentences to the right of the code. They describe the processing steps and relate them to other parts of the program. The descriptions are organized into an outline that reflects the processing divisions of the routine.
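The three channels described above can be combined in one routine. The following is a sketch only, in Python, with a hypothetical function `parse_config`: an abstract summary at the top, numbered headings in the instruction sequence, and short remarks to the right of the code.

```python
def parse_config(lines):
    """Summary: turn 'key = value' lines into a dict.

    Blank lines and lines starting with '#' are skipped; a key that
    appears more than once keeps its last value.
    """
    settings = {}

    for raw in lines:
        # 1. Skip lines that carry no setting
        line = raw.strip()                      # drop surrounding whitespace
        if not line or line.startswith("#"):    # blank line or comment line
            continue

        # 2. Split the line and record the setting
        key, _, value = line.partition("=")     # split at the first '=' only
        settings[key.strip()] = value.strip()   # last value wins (see summary)

    return settings
```

The numbered headings form the outline of the routine, and each right-hand remark either describes its step or points back to the summary.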
The size and complexity of the module determine whether the information will be used. Small routines may need only comments to the right of the code. A more complete description is required for large programs.
The type of documentation that has just been read has a bearing on the processing of code. Documentation formats act as advance organizers of thought. Each type primes the maintainer for a different response to the instructions encountered. Messages that are consistent with the structure of the program aid recognition and recall. ... Programs are documented to enhance the maintainer's performance.
Program comments within and between modules and procedures usually convey information about the program, such as the functionality, design decisions, assumptions, declarations, algorithms, nature of input and output data, and reminder notes. Considering that the program source code may be the only way of obtaining information about a program, it is important that the programmers should accurately record useful information about these facets of the program and update them as the system changes. Common types of comments used are prologue comments and in-line comments. Prologue comments precede a program or module and describe goals. In-line comments, within the program code, describe how these goals are achieved.
The comments provide information that the understander can use to build a mental representation of the target program. For example, in Brooks' top-down model, comments - which act as beacons - help the programmer not only form hypotheses, but refine them to closer representations of the program. Thus, theoretically there is a strong case for commenting programs. The importance of comments is further strengthened by evidence that the lack of good comments in programs constitutes one of the main problems that programmers encounter when maintaining programs. It has to be pointed out that comments in programs can be useful only if they provide additional information. In other words, it is the quality of the comment that is important, not its presence or absence.
Though programmers are often encouraged to comment their source code more thoroughly, there has been very little scientific investigation into what kinds of situations actually cause programmers to do so. I conducted a statistical study of the CVS repositories of nine Open Source projects, and made four major findings. First, the rate at which programmers comment varies widely from project to project and programmer to programmer; even the same programmer will comment at different rates on different projects. Second, programmers tend to comment larger modifications to source code more thoroughly. Third, more programmers modifying the same file does not, in general, mean more commenting. Finally, programmers tend to comment more when they are modifying code that is thoroughly commented to begin with. I then determined through an experiment with programmers that there is a causal link behind my last finding; that is, the more thoroughly a source code file is commented, the more thoroughly programmers will comment when they make modifications to it.
Here are some attributes of great code: