The seven deadly sins of software documentation

In life science, learning to code is becoming as important as generating the data we have to manage, with a variety of online courses making it easier than ever to learn how to write computer programs. However, while learning the basics, it’s important to make sure that you pick up good habits along the way.

After chatting to EI’s Alice Minotto, who was complaining about a lack of proper documentation (written text that explains how to use a piece of software), we asked her to shed some light on what makes a piece of software readable - and what to avoid if you don’t want the rest of the coding world to wonder what on earth your program is for in the first place (and be annoyed, slightly, for an afternoon at least).

Alice works on CyVerse UK, making bioinformatics tools and resources available to researchers in the UK and beyond. Her role involves integrating various programs across multiple platforms, so good documentation makes her life much easier.

1. Make it understandable.

Not helpful: “Our software relies on *weirdword*, *wizardry* and *randomcharword*. You will also have to configure *moremagichere* to run *neverheardofthis*.”

More helpful: “Our software relies on *weirdword_with_URL* to <purpose_of_weirdword>, *wizardry_with_URL* as a database and *randomcharword_with_URL* to <provide_some_functionality>. You may want to configure *more_magic_here_with_URL*, if you wish to run *neverheardofthis_with_URL* to <why you may want to use neverheardofthis>.”

No one knows everything. An average person who reads your documentation should be able to understand at least 80% of the words.

2. Assume it's not obvious.

When you have expertise in something, it’s easy to forget that others might not know what to do as automatically as you do.

Even if something feels like it should be obvious, assume that it is not, especially if you have been writing software for ten years. The person reading it might have been coding for only a few months, might not use English as their first language, or might not have the confidence to ask questions about your software.

Not helpful: “... so now you just have to install/configure/define this.”

More helpful: “... so now you have to install/configure/define this as follows …”

3. I do not have psychic abilities.

Aside from Mystic Meg, none of us were born a mind reader.

If, during an upgrade, you change the way the program is installed, works, or is configured, then let me know; no small messages or emails to your coworkers, no telepathy. Put a BIG warning!
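One way to put up that BIG warning is in the program itself, so anyone still using the old way sees it the moment they run the code. Below is a minimal sketch in Python, assuming a made-up function `load_config` whose parameter `config_file` was renamed to `path`; the names and the version number are invented for illustration.

```python
import warnings


def load_config(path=None, config_file=None):
    """Return the path of the configuration file to use.

    'config_file' is a hypothetical old parameter name, replaced by 'path'.
    """
    if config_file is not None:
        # Tell the user loudly what changed and what to do about it.
        warnings.warn(
            "'config_file' was renamed to 'path' in version 2.0 (hypothetical) "
            "and will be removed in a future release. "
            "Use load_config(path=...) instead.",
            FutureWarning,
            stacklevel=2,
        )
        path = config_file
    if path is None:
        raise ValueError("No configuration file given: pass load_config(path=...).")
    return path
```

`FutureWarning` is used here because Python shows it to end users by default; the same change should still be announced in the release notes and the documentation.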

There was one occasion on which I went to a website and there wasn’t any information about what the software was actually used for at all.

4. Detail the error.

I run something and there’s a problem. The error message reads:

“Weird error.”

(True story).

How on earth am I supposed to know who or what to ask about this error? Make your error messages easily readable, include the information needed to raise an issue with the developers, and try to put a link to relevant documentation in the error.

More helpful: “Error: Something happened that we didn’t expect. You are seeing this problem because of x. Please refer to documentation y to resolve this issue. If you still have problems, please let us know by raising an issue here: Link to z”.
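In code, a message along those lines might look like the sketch below. It is only an illustration: the function `check_fasta_header`, the exception `InputFileError` and both URLs are made up, and the check itself (a FASTA header should start with ‘>’) is just a convenient example.

```python
class InputFileError(Exception):
    """Raised when an input file cannot be used."""


# Placeholder links: replace them with your project's real pages.
DOCS_URL = "https://example.org/docs/input-files"
ISSUES_URL = "https://example.org/issues"


def check_fasta_header(first_line, filename):
    """Check that the first line of a file looks like a FASTA header."""
    if not first_line.startswith(">"):
        # Say what went wrong, why, where to read more and where to report it.
        raise InputFileError(
            f"Error: could not read '{filename}'. "
            f"The first line should start with '>' but starts with {first_line[:10]!r}. "
            f"See {DOCS_URL} for the expected format. "
            f"If the problem persists, please raise an issue at {ISSUES_URL}."
        )
    # Return the header text without the leading '>' and trailing whitespace.
    return first_line[1:].strip()
```

The message names the file, states the cause, and points to both the documentation and the issue tracker, so the reader knows what to try next and who to ask.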

5. Maintain your software.

Not helpful: after hours and hours spent trying to understand what I’m doing wrong, by searching on the internet I find a bug report in a very obscure place that means I’m not doing anything wrong and I can’t solve my problem because the maintenance team has ignored it.

The bug report is on the official project, it’s two years old, and it hasn’t been assigned to anyone to fix by any particular time.

More helpful: if there’s an issue, sort it out (which might be hard if you’re the only developer on an open source project, but there are ways to make it easier for someone to sort it out - such as pull requests), or if that’s not possible, make it obvious that there’s an issue in the first place.

6. Make your code descriptive.

Not helpful: ‘tfrs’ is not an acceptable name for a variable (a named container that labels and stores data in memory). It’s something a computer science professor would probably explain in the very first lecture!

Infatti è buona norma chiamare una variabile con un nome che ci faccia già identificare ciò che rappresenta, non è il momento di risparmiare sui caratteri! Non solo sarà di aiuto ai vostri colleghi, ma tra un mese, o un anno, quando vorrete riutilizzare il vostro programma, ‘area_rettangolo’ vi sarà molto più chiaro rispetto ad ‘ar’.

Un altro immenso aiuto da dare agli altri (e a voi stessi) è commentare il vostro codice spiegandone le funzioni e l’utilizzo. I commenti sono righe di testo ignorate dal computer, ma incredibilmente utili per gli utenti. Chiaro, a meno che siano state scritte in un linguaggio non convenzionale, nessuno vuole imparare cinese, polacco, swahili o klingon soltanto per leggere il vostro codice!

That was probably hard to understand if you don’t speak Italian.

Essentially, stick to using variable names that people will understand and that actually describe what the variable is for (which will make your life easier, too, months and years down the line). Clearly, ‘area_rectangle’ makes a lot more sense than ‘ar’.

It also helps to leave comments that describe what is going on in your code; the computer ignores them, but they are incredibly useful for users (and probably for you too).

You should also think about writing these in a conventionally spoken international language (probably English at the moment) if it is going to be used by lots of people internationally.
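To make the ‘ar’ versus ‘area_rectangle’ point concrete, here is a small sketch in Python. The function and parameter names are just examples, and the units are an assumption made for illustration.

```python
# Hard to read: what are 'ar', 'a' and 'b'?
def ar(a, b):
    return a * b


# Easier to read: the names and the comments say what the code is for.
def area_rectangle(width_cm, height_cm):
    """Return the area of a rectangle in square centimetres."""
    # Area of a rectangle = width x height.
    return width_cm * height_cm


print(area_rectangle(width_cm=3, height_cm=4))  # prints 12
```

Both functions do exactly the same thing; only the second one tells you so without making you read the body.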

7. If you can, avoid using GitHub issues and pull requests as documentation.

It is annoying when these are used instead of proper documentation, when in reality they are just a list of issues and possible solutions. Why have you organised your data and your code so well, but then randomly ‘documented’ it across different websites with no recognisable logic?

More helpful: proper dedicated areas for code documentation, like readthedocs.io or GitHub README.md markdown files, or automatic documentation generation from code comments with tools such as Doxygen, Sphinx, and Javadoc. Making people waste time finding information in cluttered lists rather than structured, searchable information is not going to win you any favours.
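As a rough idea of what documentation generated from code comments starts from, here is a sketch of a Python function with a docstring written in the field-list style that Sphinx can read. The function `gc_content` is a made-up example, not taken from any particular project.

```python
def gc_content(sequence):
    """Return the fraction of G and C bases in a DNA sequence.

    :param sequence: DNA sequence as a string, e.g. ``"ATGC"`` (case-insensitive).
    :returns: A float between 0 and 1, or 0.0 for an empty sequence.
    """
    if not sequence:
        return 0.0
    sequence = sequence.upper()
    # Count G and C bases and divide by the total sequence length.
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# The same text is available to tools and to users at run time.
print(gc_content.__doc__)
```

Because the explanation lives next to the code, it is more likely to be updated when the code changes, and users can also read it with Python’s built-in `help(gc_content)`.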

Essentially, poor documentation means that it takes people a lot longer than should be necessary to read, work out and implement software. It’s like reading a scientific paper without the methods section and being expected to reproduce the experiments and results.

Documenting software properly is absolutely just as important as writing the scripts in the first place.

Oh, and 80% of these issues happened in the last week.
