On documenting code

November 2, 2016

When I write code, I try to make it clear what it does without needing comments; I try to use meaningful function and variable names, not make functions too big or too small, try not to write anything too ‘clever’, etc. Recently I’ve began to feel that’s not really enough. I feel that I can make improvements to make it clearer to others what is going on in my code, and most importantly, to me when I have to fix a bug six months later.

Let’s take a look at a couple of popular open source projects and see how they document their code. Just to make sure nobody gets offended I should note that these are just random examples, I’m not picking on anyone or any project in particular. I’ve specifically looked for shorter functions as examples, as it’s easier to see and explain what’s going on.

Example 1

In the Ruby community it’s popular for code to be self-documenting and to use tests as documentation, as the language itself is quite clean and testing is ingrained into the community. Ruby is the main language I’ve used throughout my career, so my way of writing and documenting code is pretty similar to this.

def checksums
  checksums = {}

  lines(versions_path).each do |line|
    name, _, checksum = line.split(" ", 3)
    checksums[name] = checksum
  end

  checksums
end

This is from Bundler, the most popular dependency management tool for Ruby. This returns a hash where the value is some sort of checksum, and the key, called name, is - I assume - the name of a Ruby Gem/library. I’m also assuming versions_path is a filename, and lines returns a subset of lines from that file. I don’t know what the 2nd entry of the line contains, and why it is ignored.

Without knowing what the class as a whole does (for which there also isn’t any documentation), you can’t know exactly what this does. Even looking around the code that uses this, it’s not exactly clear what it does and what edge cases there are. This isn’t meant to be a public API, but that’s not really the point.

This could be improved with a short sentence explaining what this method does, how it should be used, what class variables it uses, what it returns and why.

Example 2

def attribute_names
  @attribute_names ||= if !abstract_class? && table_exists?
    attribute_types.keys
  else
    []
  end
end

This is from Ruby on Rails, specifically the ORM ActiveRecord. Just looking at the code, it’s about on-par with the previous example. In Rails it’s standard for functions suffixed with a question mark to return booleans, so it’s pretty clear what abstract_class? and table_exists? return. attribute_types isn’t so clear - it must act like a hash seeing how it’s used, but it’s not so clear what it contains or what this would return without knowing the domain-logic (in ActiveRecord ‘attribute’ means a database column). There isn’t anything too complicated going on, but it’s not entirely clear without looking around the class.

# Returns an array of column names as strings if it's not an abstract class and
# table exists. Otherwise it returns an empty array.
#
#   class Person < ActiveRecord::Base
#   end
#
#   Person.attribute_names
#   # => ["id", "created_at", "updated_at", "name", "age"]
def attribute_names
...
end

But this has documentation! The comment explains what it does and the edge cases. The example enforces what it does and makes it clear exactly what it returns. It doesn’t explain the why however, i.e. why does this return an empty array for an abstract class? I should note that this is a public API, and Ruby on Rails uses the code to automatically generate documentation.

Example 3

// IsStarred checks if a repository is starred by authenticated user.
//
// GitHub API docs: https://developer.github.com/v3/activity/starring/#check-if-you-are-starring-a-repository
func (s *ActivityService) IsStarred(owner, repo string) (bool, *Response, error) {
  u := fmt.Sprintf("user/starred/%v/%v", owner, repo)
  req, err := s.client.NewRequest("GET", u, nil)
  if err != nil {
    return false, nil, err
  }
  resp, err := s.client.Do(req, nil)
  starred, err := parseBoolResponse(err)
  return starred, resp, err
}

This is Go, it’s from go-github, a client for the GitHub API. I feel that Go has a bit of an advantage over Ruby is terms of legibility, it has types in arguments for starters and the built-in go fmt tool really help enforce coding standards. On the other hands it is popular to use single character variable names (u here is the path of a URL) and the multiple return values in this example aren’t entirely clear (although the language does support named return values).

The documentation of this could be improved, for example explaining clearly what each return value is and why (I assume *Response is returned so you can log the HTTP response in case of an error, but is there any other use?). It’s also not clear what sort of errors could be returned without digging into NewRequest and parseBoolResponse.

Example 4

now_str(Now) ->
    {{Year, Month, Day},{Hour, Minute, Second}} = Now,
    Formatted = io_lib:format("~4.10.0B-~2.10.0B-~2.10.0BT~2.10.0B:~2.10.0B:~2.10.0BZ", [Year, Month, Day, Hour, Minute, Second]),
    lists:flatten(Formatted).

This is some of the Erlang code for WatchSumo. It’s a simple utility function which takes an Erlang timestamp (from calendar:universal_time()) and returns it in ISO 8601 format (e.g. 1994-11-05T13:15:30Z).

It’s pretty clear it converts some sort of timestamp, but unless you a master of Erlang format strings, the only way you’ll know what format it returns is by running it. Even if the function had a clearer name like calendar_universal_time_to_iso8601, I’d still have to Google “iso8601” to know what format it returns. In this case, the best way to document it would be with an example.

More examples

Here are some projects which are examples of good documentation:

Minoca
TCL

(If you have any more good examples, send them in, and I’ll update the list)

Takeaways

So what can we learn from this? Self-documenting code usually explains the “what”, but it doesn’t explain the “why”. Usually I add a comment explaining why something happens if it feels a bit weird, but I don’t think that’s enough. What I feel weird probably doesn’t match what someone else feel is weird, and these examples show that.

Here are some guidelines I want to follow in relation to commenting code:

The class/module/file should be commented explaining at a high level what it does and why.
The function should be commented explaining briefly what it does, but mainly why. Complicated logic and public APIs should go into more detail.
Arguments should be commented explaining what they are and what they are used for (especially when using a weakly typed language).
Return values should be commented explaining what they are and how they can be used. Take care when returning values from other functions to ensure they are clear (like errors in the third example).
Any variables used or changed outside the scope of the function should be commented (but prefer writing functional code and avoid global state).
Examples of usage should be included where it will help make it clearer what the code is for, especially public APIs.
Link to other documentation with a “see also” where you will just be repeating what is in other comments (without too much indirection).
Keep your personal opinions/rants/politics out of comments (see XEE).

Be careful though, religiously writing comments just for the sake of writing comments is just as bad. Comments should be added to make code clearer. If with comments it’s still not clear, maybe you just to rewrite the code.