1111 rules for avoiding misery in bioinformatics

or a bunch of stupid things I've done

or an excuse to spout my opinion and act smug

Created by Pieter Moris

Use version control

Preferably git


  • Safety
  • History
  • Organization


Beware git GUIs

Use the command line as much as possible for interacting with git and only rely on a GUI for visualisation of history and file diffs.

Reason: GUIs sometimes hide steps, use slightly different naming conventions or try to be smart and automate certain steps, which can cause a lot of confusion, anxiety and/or horror.

Or they just choke completely after a git rebase -i.

Read more in the primer. Important aspects include:

  • How to generate SSH keys and how to use them in other git-aware tools.
  • Use a private commit e-mail address and don't accidentally overwrite it.
  • git clone is just a way to download a repository to your computer.

These are not the same!


GitHub recommends HTTPS, but SSH allows you to use your SSH key instead of requiring login credentials for each remote action.

Side note for those that use GitKraken:

  • GitKraken account != GitHub account != git username
  • The GitKraken profile overwrites your git e-mail; check with git config --list.

Be careful when using Excel or equivalents

Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics (Zeeberg et al., 2004. https://doi.org/10.1186/1471-2105-5-80)
Gene name errors are widespread in the scientific literature (Ziemann et al., 2016. https://doi.org/10.1186/s13059-016-1044-7)

Other lessons to learn: encapsulate fields ("entry-filled with, separators you use to separate; columns") and mind character encoding (almost always opt for UTF-8).

Similar issues can crop up when importing data into R or pandas dataframes, so be explicit and specify the encoding!
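As a minimal sketch of both points, Python's standard csv module can quote every field and take an explicit encoding. The file name and the rows below are made up for illustration.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical rows: gene symbols that spreadsheets like to turn into dates,
# plus descriptions that contain the separators themselves.
rows = [
    ["gene", "description"],
    ["SEPT2", "septin 2, cytoskeletal GTPase"],
    ["MARCH1", "membrane associated ring-CH-type finger 1; E3 ligase"],
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "genes.csv"

    # Quote every field and state the encoding explicitly when writing.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle, quoting=csv.QUOTE_ALL).writerows(rows)

    # Read it back with the same explicit encoding; every value stays a string.
    with open(path, newline="", encoding="utf-8") as handle:
        restored = list(csv.reader(handle))

print(restored[1])
```

In pandas, read_csv accepts encoding= and dtype=str for the same purpose.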

Familiarise yourself with hotkeys and shortcuts ⌨

Seeing other people navigate their computer sometimes triggers a deep silent rage within me.


Notorious examples include:

  • Not using tab-completion whenever possible (and mistyping file paths as a consequence).
  • Re-typing commands in a terminal instead of using the history (↑ key or the history search function ctrl + r).
  • Using the mouse and GUI buttons to run code in notebooks or IDEs.

Spend 10 minutes now to save many more in the future.

Especially important in your terminal, file manager and IDE or text editor.

Additional examples and resources

  • Your terminal has dozens of neat shortcuts, like moving the cursor to the start/end of a line, deleting whole words, etc.
  • R and Jupyter notebooks have hotkeys to create cells, switch between cell types, run the current/above/below/all cells
  • In RStudio select lines and use ctrl + return to run just them (or all lines).
  • Use automatic docstring generators. You are writing docstrings, aren't you?
  • Decide on a code formatter + linter and use them consistently. Most editors can run them for you (e.g. VS Code + black and flake8).

Check out the primer for more resources on this topic.

Be careful when copying code from the web ✂📋

								pieter  🐍 base  ~  sudo apt install –y cowsay
								[sudo] password for pieter:
								Reading package lists... Done
								Building dependency tree
								Reading state information... Done
								E: Unable to locate package –y

— (em dash) VS – (en dash) VS - (hyphen ~= hyphen-minus)

							  pieter  🐍 base  ~  echo “something profound”
							“something profound”
							 pieter  🐍 base  ~  echo "something else"
							something else

curly or smart and straight or vertical quotation marks

“ ” VS " "

‘ ’ VS ' '
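One way to catch these before they reach the shell is to normalise pasted text first. The sketch below is an illustration only: the helper name sanitise_command is made up, and the mapping covers just the characters shown above.

```python
# Map typographic characters to the ASCII ones a shell expects.
TYPOGRAPHIC_TO_ASCII = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
})


def sanitise_command(command: str) -> str:
    """Return `command` with smart quotes and dashes replaced by ASCII."""
    return command.translate(TYPOGRAPHIC_TO_ASCII)


pasted = "sudo apt install \u2013y cowsay"
print(sanitise_command(pasted))
```

A dash on a web page sometimes stands for a double hyphen (as in a long option), so the result still deserves a look before you press enter.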

Terminal tips

  • Increase the length of your bash history:
    echo "HISTSIZE=100000" >> ~/.bashrc and echo "HISTFILESIZE=100000" >> ~/.bashrc (the in-memory and on-disk limits, respectively), then use ctrl + r to search through it.

  • Create aliases and save short scripts:
    echo 'alias vpn="sudo openconnect vpn.uantwerpen.be"' >> ~/.bash_aliases

Be careful when using rm.

  • cd /so/very/deep/in/a/path && rm -rf dir
  • Place dangerous flags after the path: rm /path/to/dir -r (GNU rm accepts options after the path). Accidentally pressing enter too soon is then less likely to cause harm.
  • Tab completion helps!
  • touch -- -i in important directories: a later rm * expands to include the file named -i, which rm reads as the interactive flag.
  • Use https://launchpad.net/safe-rm
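  • Don't use trailing slashes or dot-slash starts in your filepaths, because a single typo or an unset variable turns them into the root directory:
    rm -rf ./*   # use * instead: dropping the dot leaves rm -rf /*
    rm -rf $FOO/ # use $FOO instead: an unset FOO leaves rm -rf /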
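The same caution applies when deleting from a script. Below is a rough sketch of a guard around shutil.rmtree. The function name guarded_rmtree and the min_depth threshold are assumptions chosen for illustration, not an established recipe.

```python
import shutil
import tempfile
from pathlib import Path


def guarded_rmtree(target: str, min_depth: int = 3) -> None:
    """Delete a directory tree, refusing obviously dangerous targets.

    `min_depth` is an arbitrary illustrative threshold: the resolved path
    must have at least this many components (/home/user/x has 4).
    """
    # An empty string is what an unset variable usually turns into.
    if not target.strip():
        raise ValueError("refusing to delete: empty path")
    path = Path(target).resolve()
    if path == Path.home() or len(path.parts) < min_depth:
        raise ValueError(f"refusing to delete {path}")
    shutil.rmtree(path)


# Demonstration inside a throwaway directory.
base = Path(tempfile.mkdtemp())
scratch = base / "scratch"
scratch.mkdir()
guarded_rmtree(str(scratch))
print(scratch.exists())
shutil.rmtree(base)
```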

What is the PATH?!‽‽‽

List of directories where the shell searches for executable files.

$ echo $PATH

echo 'export PATH="$PATH:/home/pieter/miniconda3/envs/tools/bin"' >> ~/.bashrc

You can use the following commands to find out where a command is located:

							 $ which python # only works if binary is included in $PATH
							 $ conda activate tools
							 $ which python

							 $ whereis python # searches hard-coded paths regardless of $PATH
							python: /usr/bin/python3.6 /usr/bin/python3.5 /usr/bin/python3.6m-config /usr/bin/python3.6m /usr/bin/python3.5m /usr/bin/python3.6-config /usr/bin/python2.7 /usr/bin/python /usr/lib/python3.6 /usr/lib/python3.5 /usr/lib/python2.7 /usr/lib/python3.7 /etc/python3.6 /etc/python3.5 /etc/python2.7 /etc/python /usr/local/lib/python3.6 /usr/local/lib/python3.5 /usr/local/lib/python2.7 /usr/include/python3.6 /usr/include/python3.6m /usr/include/python2.7 /usr/share/python /home/pieter/miniconda3/envs/tools/bin/python3.7 /home/pieter/miniconda3/envs/tools/bin/python3.7m /home/pieter/miniconda3/envs/tools/bin/python3.7m-config /home/pieter/miniconda3/envs/tools/bin/python3.7-config /home/pieter/miniconda3/envs/tools/bin/python /usr/share/man/man1/python.1.gz

							 $ echo 'alias vpn="sudo openconnect vpn.uantwerpen.be"' >> ~/.bash_aliases
							 $ type vpn # searches environment including $PATH and aliases (or "command -v")
							vpn is aliased to `sudo openconnect vpn.uantwerpen.be'
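The lookup itself is simple enough to sketch in a few lines of Python. The helper search_path below is a made-up name for illustration; shutil.which is the standard library equivalent of which and returns only the first match.

```python
import os
import shutil


def search_path(command):
    """Return every executable named `command` on PATH, in lookup order."""
    hits = []
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, command)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            hits.append(candidate)
    return hits


print(shutil.which("python3"))  # first match only, like `which python3`
print(search_path("python3"))   # all matches; earlier PATH entries win
```

Like which, this only sees files on PATH. Aliases and shell functions live inside the shell itself, which is why type can report them and which cannot.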

cf. version managers


Windows Command Prompt and PowerShell are not the same as a Unix shell, although they have their uses.

Install the Windows Subsystem for Linux (WSL) on Windows 10 https://docs.microsoft.com/en-us/windows/wsl/install-win10, rather than running a virtual machine (for most scenarios).

Caveat: things like Python are not automatically shared between Windows and the Subsystem.

Even GUI apps work if you install an X server!

WSL can even be linked to your IDE, e.g. VS Code, as if it were a remote server: https://medium.com/@janelgbrandon/a-guide-for-using-wsl-for-development-d135670313a6 & https://code.visualstudio.com/docs/remote/wsl

For the full write-up, see the primer.


Learn how to exit vim

Press the X button
$ killall -9 vim


Learn (at least) a few Bash/PowerShell tools

grep (and by extension regular expressions) can help when your IDE decides to be an idiot.


Spend some time on a bash primer

								 $ rsync --help | grep "-z"
								Usage: grep [OPTION]... PATTERN [FILE]...
								Try 'grep --help' for more information.

								 $ rsync -h | grep "\-z"
								-z, --compress              compress file data during the transfer

								 $ rsync -h | grep '\-z'
								-z, --compress              compress file data during the transfer

								 $ rsync --help | grep -- -z
								-z, --compress              compress file data during the transfer
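The double dash in the last command is a general convention for "no more options after this point", and Python's argparse follows it too. The toy parser below (minigrep is a made-up name) is only an analogy for what grep does with its own arguments.

```python
import argparse

parser = argparse.ArgumentParser(prog="minigrep")
parser.add_argument("-i", "--ignore-case", action="store_true")
parser.add_argument("pattern")

# Without "--", "-z" is read as an (unknown) option and parsing fails.
try:
    parser.parse_args(["-z"])
    failed = False
except SystemExit:
    failed = True

# With "--", everything after it is treated as a positional argument.
args = parser.parse_args(["--", "-z"])
print(failed, args.pattern)
```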


Things worth your time

SSH tips 🔑

Use ~/.ssh/config

								Host calcua
								HostName login-leibniz.uantwerpen.be
								#Port 22000
								User vsc20380
								IdentityFile ~/.ssh/id_rsa

								$ ssh calcua
								$ rsync -avhzP *.tar.gz calcua:/scratch/antwerpen/203/vsc20380/pmc/

When working on a remote server over SSH (or PuTTY), you can keep jobs alive after disconnecting:

  • After already starting a process: ^Z followed by bg %1 and disown (-h) %1
  • nohup bash my_amazing_script.sh > output.log 2>&1 & (cf. redirection)
  • screen / tmux

In each case the process needs to be immune to SIGHUP, the signal it receives when the terminal closes.

If you cannot connect to a server using SSH keys (and ssh -vvv is cryptic), check your file permissions.

  • .ssh directory: 700 (drwx------)
  • public key (.pub file): 644 (-rw-r--r--)
  • private key (id_rsa): 600 (-rw-------)
  • home directory should not be writeable by the group or others, at most 755 (drwxr-xr-x)
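These modes can also be checked from Python with the stat module. The sketch below uses a throwaway file as a stand-in for a private key; the helper names permission_bits and too_open are made up for illustration.

```python
import os
import stat
import tempfile


def permission_bits(path: str) -> str:
    """Return the permission bits of `path` as an octal string, e.g. '600'."""
    return format(stat.S_IMODE(os.stat(path).st_mode), "o")


def too_open(path: str, allowed: int) -> bool:
    """True if `path` grants any permission bit outside `allowed`."""
    return bool(stat.S_IMODE(os.stat(path).st_mode) & ~allowed)


# Demonstration on a temporary file, not on a real key.
with tempfile.TemporaryDirectory() as tmp:
    fake_key = os.path.join(tmp, "id_rsa")
    open(fake_key, "w").close()
    os.chmod(fake_key, 0o644)
    before = (permission_bits(fake_key), too_open(fake_key, 0o600))
    os.chmod(fake_key, 0o600)
    after = (permission_bits(fake_key), too_open(fake_key, 0o600))

print(before, after)
```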

Windows users: be aware that when you create keys in WSL, the keys won't automatically be picked up by other Windows programs (PuTTY, git GUIs, etc.).


When you have more than a few print() statements in your Python code, it's time to use a proper debugger.

							import pdb # or import ipdb as pdb

							# insert this line where you want a breakpoint
							pdb.set_trace()

Or use a solution built into your IDE e.g. PyCharm.
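Since Python 3.7 the built-in breakpoint() does the same without an import. A small sketch follows; the normalise function and its debug switch are invented for the example.

```python
def normalise(counts, debug=False):
    """Scale a list of read counts so that they sum to 1."""
    total = sum(counts)
    if debug:
        # Drops into pdb at this point, like `import pdb; pdb.set_trace()`.
        breakpoint()
    return [count / total for count in counts]


print(normalise([2, 2, 4]))
```

Inside the debugger, n steps to the next line, s steps into a call, p total prints a variable and c continues. Setting the environment variable PYTHONBREAKPOINT=0 turns every breakpoint() call into a no-op.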

Managing Python


Use conda

  • Actually, use miniconda to avoid bloat.
  • Lives side by side with your system-wide Python install, but puts its own directory first in PATH to become the default (check with which python).
  • Use conda environments to keep projects and specific package versions separated and documented cf. version control.
  • Use virtualenv when deploying code on a remote server or Docker container.

Check the documentation for a list of common commands: creating/listing/exporting environments, installing/removing packages, etc. https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html

conda and pip are separate things

Use conda install by default.

If a package is missing, check out different channels e.g. conda install --channel conda-forge (bioconda is also nice).

If that fails, first install pip inside the conda environment (conda install pip) and only then pip install the package.

cf. What is PATH?

Never use sudo conda/pip install!

You should always be able to install packages for just your user. This is more secure and ensures all packages can see one another.

pip is not installed by default!

							 $ conda create -n newenv python=3.7
							 $ which python
							 $ which pip

							 $ conda create -n newerenv
							 $ conda activate newerenv
							 $ which python
							 $ which pip
							# due to custom PATH append statement inside my .bashrc
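When in doubt, you can also ask the interpreter itself where it lives. A minimal sketch; interpreter_report is a made-up helper name.

```python
import os
import sys


def interpreter_report() -> dict:
    """Collect where the running Python lives and which conda env is active."""
    return {
        "executable": sys.executable,  # path of the interpreter running this code
        "prefix": sys.prefix,          # root of the environment it belongs to
        # Set by `conda activate`; None when no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in interpreter_report().items():
    print(f"{key}: {value}")
```

Running python -m pip install instead of a bare pip install makes sure that pip belongs to the interpreter reported here.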

Install nb_conda for managing environments inside notebooks


Add nb_conda to whatever environment you want to launch Jupyter notebooks from (I use the base environment).

Install ipykernel / r-irkernel into any environment you want to use inside notebooks.

For Ruby use rbenv.

For Node.js use nvm.

Use pathlib to manage I/O.
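A short sketch of what that looks like in practice. The directory layout and file name are placeholders, and a temporary directory stands in for the project root.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)  # stand-in for a project root
    raw_file = project / "data" / "raw" / "input.dat"

    # Create missing parent directories, then write and read with an explicit encoding.
    raw_file.parent.mkdir(parents=True, exist_ok=True)
    raw_file.write_text("ACGT\n", encoding="utf-8")
    content = raw_file.read_text(encoding="utf-8")

    # Path objects know their own parts, so no manual string splitting.
    name, suffix = raw_file.name, raw_file.suffix
    relative = raw_file.relative_to(project).as_posix()
    found = [path.name for path in project.rglob("*.dat")]

print(name, suffix, relative, found, content.strip())
```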


								> library(here) # builds paths relative to the project root
								> here("data","raw","input.dat")
								[1] "/home/pieter/projects/dot-project-directory/data/raw/input.dat"

Don't preserve your R workspace (.RData) between sessions, since previously stored variables can silently mess up your results.


For package management, check out renv (or packrat).

Don't install packages via your system package manager, only use install.packages().

conda can also manage R environments, but might not act nicely with R_LIBS (https://waoverholt.com/conda-and-R/)

renv: if things go awry, install the package into the base environment, purge the renv files (`renv` dir, `renv.lock` and a line in `.Rprofile`) and start over.

The default R installation directory might not be writeable for a user, so a user library should be created (rather than installing packages as an admin).


							Installing package into ‘/usr/local/lib/R/site-library’
							(as ‘lib’ is unspecified)
							Warning in install.packages("random") :
							'lib = "/usr/local/lib/R/site-library"' is not writable

							Would you like to use a personal library instead?  (y/n) y

							Would you like to create a personal library
							to install packages into?  (y/n) y

You can also set this in a file:

							echo 'R_LIBS_USER=~/lib/R/library' >> .Renviron

Use interactive coding notebooks responsibly 📓

Notebooks (Jupyter and R) are great for:

  • Exploration
  • Prototyping
  • Sharing results, visualisation and adding remarks

But notebooks suffer from:

  • Hidden states (see example)
  • Out-of-order execution
  • Poor code sharing and reuse

Use them appropriately: for exploration, not for reproducible pipelines.
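Hidden state is easiest to see in a toy example. In the sketch below each commented step stands for a notebook cell; the names and numbers are invented.

```python
threshold = 0.05  # defined in an early cell


def significant_hidden(p_values):
    # Silently depends on whatever `threshold` currently is.
    return [p for p in p_values if p < threshold]


def significant_explicit(p_values, threshold):
    # Everything the result depends on is passed in.
    return [p for p in p_values if p < threshold]


p_values = [0.001, 0.03, 0.2]
first_run = significant_hidden(p_values)

threshold = 0.01  # a later cell is edited and re-run...
second_run = significant_hidden(p_values)  # ...and the same call now differs

print(first_run, second_run, significant_explicit(p_values, 0.05))
```

The explicit version gives the same answer no matter which cells ran before it, which is also what makes it easy to move into a separate file.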

Write functions in separate files or create a local package (see primer).

As a side note, everyone who uses Jupyter Notebooks should check out its successor JupyterLab and Binder-like cloud solutions. (https://www.dataschool.io/cloud-services-for-jupyter-notebook/)

							    ,@@@@@@@@@@,,@@@@@@@%  .#&@@@&&.,@@@@@@@@@@,      %@@@@@@%*   ,@@@%     .#&@@@&&.  *&@@@@&(  ,@@@@@@@%  %@@@@@,     ,@@,
								,@@,    ,@@,      ,@@/   ./.    ,@@,          %@%   ,&@# .&@&@@(   .@@/   ./. #@&.  .,/  ,@@,       %@%  *&@&.  ,@@,
								,@@,    ,@@&%%%%. .&@@/,        ,@@,          %@%   ,&@# %@& /@@,  .&@@/,     (@@&%(*.   ,@@&%%%%.  %@%    &@#  ,@@,
								,@@,    ,@@/,,,,    ./#&@@@(    ,@@,          %@@@@@@%* /@@,  #@&.   ./#&@@@(   *(%&@@&. ,@@/,,,,   %@%    &@#  .&&.
								,@@,    ,@@,      ./,   .&@#    ,@@,          %@%      ,@@@@@@@@@% ./.   .&@# /*.   /@@. ,@@,       %@%  *&@&.   ,,
								,@@,    ,@@@@@@@% .#&@@@@&/     ,@@,          %@%     .&@#     ,@@/.#&@@@@&/   /%&@@@@.  ,@@@@@@@%  %@@@@@.     ,@@,

Try to incorporate unit testing into your workflow.

Even something as simple as an assert statement checking the number of columns after reading a .csv file goes a long way.
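A minimal sketch of such a check using only the standard library. The function name read_table and the sample data are made up, and io.StringIO stands in for an open file.

```python
import csv
import io


def read_table(handle, expected_columns):
    """Read a CSV file object and check its shape before returning the rows."""
    rows = list(csv.reader(handle))
    assert rows, "file is empty"
    for line_number, row in enumerate(rows, start=1):
        assert len(row) == expected_columns, (
            f"line {line_number}: expected {expected_columns} columns, got {len(row)}"
        )
    return rows


good = io.StringIO("sample,gene,count\ns1,TP53,10\n")
print(len(read_table(good, expected_columns=3)))
```

A truncated or mis-delimited file now fails loudly at load time instead of producing odd results three steps later.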

For more information, check out this talk aimed specifically at data scientists (as opposed to software engineers): https://www.youtube.com/watch?v=0ysyWk-ox-8

When sharing files between Windows and Unix systems, use the dos2unix / unix2dos command to convert line breaks ("\r\n" <=> "\n").
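If those tools are not installed, the conversion is a one-liner in Python. This is a sketch working on bytes, so that Python's own newline translation stays out of the way; the function names are made up.

```python
def to_unix(data: bytes) -> bytes:
    """Convert Windows (CRLF) line breaks to Unix (LF)."""
    return data.replace(b"\r\n", b"\n")


def to_dos(data: bytes) -> bytes:
    """Convert line breaks to Windows (CRLF) without doubling existing ones."""
    return to_unix(data).replace(b"\n", b"\r\n")


windows_text = b">seq1\r\nACGT\r\n"
print(to_unix(windows_text))
```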

Structure your projects and your code

Go through Wout's presentations and/or the primer.

Add docstrings to your functions.

Create functions.

Be consistent in code style (cf. use a formatter).
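As an illustration of the first two points, here is a small function with a NumPy-style docstring. The style is one common choice among several, and gc_content is just an example.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence.

    Parameters
    ----------
    sequence : str
        DNA or RNA sequence; case is ignored.

    Returns
    -------
    float
        Value between 0 and 1. An empty sequence gives 0.0.
    """
    if not sequence:
        return 0.0
    upper = sequence.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


print(gc_content("ATGC"))
```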

Decide on a sane project directory structure. (e.g. https://drivendata.github.io/cookiecutter-data-science/, https://community.rstudio.com/t/data-science-project-template-for-r/3230, https://www.r-bloggers.com/structuring-r-projects/, https://nicercode.github.io/blog/2013-04-05-projects/)

Document all one-off commands, download sources and general metadata in READMEs in the relevant directories.

For repeated pipelines, use a workflow manager. Make is fine, but more feature-rich variants exist, e.g. Snakemake and Nextflow (discussion here: https://www.biostars.org/p/258436/).

Python imports are weird.

My preferred approach is to use a locally installed package.

Store all your functions in a src directory and install it with pip install -e . using a setup.py in the project root directory.

More reading material on this:

  • https://dev.to/codemouse92/dead-simple-python-project-structure-and-imports-38c6
  • https://towardsdatascience.com/whats-init-for-me-d70a312da583
  • https://towardsdatascience.com/building-package-for-machine-learning-project-in-python-3fc16f541693
  • https://realpython.com/absolute-vs-relative-python-imports/
  • https://chrisyeh96.github.io/2017/08/08/definitive-guide-python-imports.html
  • https://alex.dzyoba.com/blog/python-import/
  • https://hackernoon.com/pip-install-abra-cadabra-or-python-packages-for-beginners-33a989834975
  • https://stackoverflow.com/questions/22840671/what-is-the-difference-between-importing-python-sub-modules-from-numpy-matplotl
  • https://stackoverflow.com/questions/16475129/clean-name-space-and-init-py
  • https://stackoverflow.com/questions/13093665/python-import-statement-semantics
  • https://stackoverflow.com/questions/19989179/modules-expose-imported-packages

Just use a password manager already

And Zotero for managing your literature

That's all folks.