On documenting code
November 2, 2016
When I write code, I try to make it clear what it does without needing comments; I try to use meaningful function and variable names, not make functions too big or too small, try not to write anything too ‘clever’, etc. Recently I’ve began to feel that’s not really enough. I feel that I can make improvements to make it clearer to others what is going on in my code, and most importantly, to me when I have to fix a bug six months later.
Let’s take a look at a couple of popular open source projects and see how they document their code. Just to make sure nobody gets offended I should note that these are just random examples, I’m not picking on anyone or any project in particular. I’ve specifically looked for shorter functions as examples, as it’s easier to see and explain what’s going on.
Example 1
In the Ruby community it’s popular for code to be self-documenting and to use tests as documentation, as the language itself is quite clean and testing is ingrained into the community. Ruby is the main language I’ve used throughout my career, so my way of writing and documenting code is pretty similar to this.
def checksums
checksums = {}
lines(versions_path).each do |line|
name, _, checksum = line.split(" ", 3)
checksums[name] = checksum
end
checksums
end
This is from Bundler,
the most popular dependency management tool for Ruby. This returns a hash where
the value is some sort of checksum, and the key, called name
, is - I assume -
the name of a Ruby Gem/library. I’m also assuming versions_path
is a
filename, and lines
returns a subset of lines from that file. I don’t know
what the 2nd entry of the line contains, and why it is ignored.
Without knowing what the class as a whole does (for which there also isn’t any documentation), you can’t know exactly what this does. Even looking around the code that uses this, it’s not exactly clear what it does and what edge cases there are. This isn’t meant to be a public API, but that’s not really the point.
This could be improved with a short sentence explaining what this method does, how it should be used, what class variables it uses, what it returns and why.
Example 2
def attribute_names
@attribute_names ||= if !abstract_class? && table_exists?
attribute_types.keys
else
[]
end
end
This is from
Ruby on Rails,
specifically the ORM ActiveRecord. Just looking at the code, it’s about on-par
with the previous example. In Rails it’s standard for functions suffixed with a
question mark to return booleans, so it’s pretty clear what abstract_class?
and table_exists?
return. attribute_types
isn’t so clear - it must act
like a hash seeing how it’s used, but it’s not so clear what it contains or
what this would return without knowing the domain-logic (in ActiveRecord
‘attribute’ means a database column). There isn’t anything too
complicated going on, but it’s not entirely clear without looking around the
class.
# Returns an array of column names as strings if it's not an abstract class and
# table exists. Otherwise it returns an empty array.
#
# class Person < ActiveRecord::Base
# end
#
# Person.attribute_names
# # => ["id", "created_at", "updated_at", "name", "age"]
def attribute_names
...
end
But this has documentation! The comment explains what it does and the edge cases. The example enforces what it does and makes it clear exactly what it returns. It doesn’t explain the why however, i.e. why does this return an empty array for an abstract class? I should note that this is a public API, and Ruby on Rails uses the code to automatically generate documentation.
Example 3
// IsStarred checks if a repository is starred by authenticated user.
//
// GitHub API docs: https://developer.github.com/v3/activity/starring/#check-if-you-are-starring-a-repository
func (s *ActivityService) IsStarred(owner, repo string) (bool, *Response, error) {
u := fmt.Sprintf("user/starred/%v/%v", owner, repo)
req, err := s.client.NewRequest("GET", u, nil)
if err != nil {
return false, nil, err
}
resp, err := s.client.Do(req, nil)
starred, err := parseBoolResponse(err)
return starred, resp, err
}
This is Go, it’s from
go-github,
a client for the GitHub API. I feel that Go has a bit of an advantage over Ruby
is terms of legibility, it has types in arguments for starters and the built-in
go fmt
tool really help enforce coding standards. On the other hands it is
popular to use single character variable names (u
here is the path of a URL)
and the multiple return values in this example aren’t entirely clear (although the
language does support named return values).
The documentation of this could be improved, for example explaining clearly
what each return value is and why (I assume *Response
is returned so you can
log the HTTP response in case of an error, but is there any other use?). It’s
also not clear what sort of errors could be returned without digging into
NewRequest
and parseBoolResponse
.
Example 4
now_str(Now) ->
{{Year, Month, Day},{Hour, Minute, Second}} = Now,
Formatted = io_lib:format("~4.10.0B-~2.10.0B-~2.10.0BT~2.10.0B:~2.10.0B:~2.10.0BZ", [Year, Month, Day, Hour, Minute, Second]),
lists:flatten(Formatted).
This is some of the Erlang code for WatchSumo.
It’s a simple utility function which takes an Erlang timestamp (from
calendar:universal_time()
) and returns it in ISO 8601 format (e.g.
1994-11-05T13:15:30Z
).
It’s pretty clear it converts some sort of timestamp, but unless you a master of
Erlang format strings, the only way you’ll know what format it returns is by
running it. Even if the function had a clearer name like
calendar_universal_time_to_iso8601
, I’d still have to Google “iso8601” to
know what format it returns. In this case, the best way to document it would be with
an example.
More examples
Here are some projects which are examples of good documentation:
(If you have any more good examples, send them in, and I’ll update the list)
Takeaways
So what can we learn from this? Self-documenting code usually explains the “what”, but it doesn’t explain the “why”. Usually I add a comment explaining why something happens if it feels a bit weird, but I don’t think that’s enough. What I feel weird probably doesn’t match what someone else feel is weird, and these examples show that.
Here are some guidelines I want to follow in relation to commenting code:
- The class/module/file should be commented explaining at a high level what it does and why.
- The function should be commented explaining briefly what it does, but mainly why. Complicated logic and public APIs should go into more detail.
- Arguments should be commented explaining what they are and what they are used for (especially when using a weakly typed language).
- Return values should be commented explaining what they are and how they can be used. Take care when returning values from other functions to ensure they are clear (like errors in the third example).
- Any variables used or changed outside the scope of the function should be commented (but prefer writing functional code and avoid global state).
- Examples of usage should be included where it will help make it clearer what the code is for, especially public APIs.
- Link to other documentation with a “see also” where you will just be repeating what is in other comments (without too much indirection).
- Keep your personal opinions/rants/politics out of comments (see XEE).
Be careful though, religiously writing comments just for the sake of writing comments is just as bad. Comments should be added to make code clearer. If with comments it’s still not clear, maybe you just to rewrite the code.
Further reading
- Hacker News comments on Brad Touesnard’s article linked above (2014)
- Code Tells You How, Comments Tell You Why by Jeff Atwood (2006)