git
Why?
How?
Use the command line as much as possible for interacting with git and only rely on a GUI for visualisation of history and file diffs.
Reason: GUIs sometimes hide steps, use slightly different naming conventions or try to be smart and automate certain steps, which can cause a lot of confusion, anxiety and/or horror.
Or they just choke completely after a git rebase -i.
Read more in the primer; important aspects include:
git clone is just a way to download a repository to your computer. GitHub recommends HTTPS, but SSH lets you use your SSH key instead of entering login credentials for each remote action.
git config --list
Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics (Zeeberg et al., 2004. https://doi.org/10.1186/1471-2105-5-80)
Gene name errors are widespread in the scientific literature (Ziemann et al., 2016. https://doi.org/10.1186/s13059-016-1044-7)
Other lessons to learn: enclose fields in quotes ("entry-filled with, separators you use to separate; columns") and mind character encoding (almost always opt for UTF-8).
Similar issues can crop up when importing data into R or pandas dataframes, so be explicit and specify the encoding!
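A minimal Python sketch of both lessons using only the standard csv module: every field is quoted on write, and the encoding is stated explicitly on both write and read. The function names save_table / load_table and the sample rows are our own illustration; pandas exposes the same knob as the encoding= parameter of read_csv.

```python
import csv
import tempfile
from pathlib import Path


def save_table(rows, path, encoding="utf-8"):
    """Write rows to a CSV file with every field quoted and an explicit encoding."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding=encoding) as handle:
        # QUOTE_ALL wraps every field in double quotes, so commas, semicolons
        # or quote characters inside a field cannot be mistaken for separators.
        csv.writer(handle, quoting=csv.QUOTE_ALL).writerows(rows)


def load_table(path, encoding="utf-8"):
    """Read the CSV back, decoding with the same explicit encoding."""
    with open(path, newline="", encoding=encoding) as handle:
        return list(csv.reader(handle))


if __name__ == "__main__":
    rows = [
        ["gene", "note"],
        ["SEPT2", "entry-filled with, separators you use to separate; columns"],
        ["MARCH1", 'naïve "quoted" text'],
    ]
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "genes.csv"
        save_table(rows, target)
        print(target.read_text(encoding="utf-8"))
        assert load_table(target) == rows
```

Quoting everything costs a few bytes per field but removes any guessing about which separators are data.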
Seeing other people navigate their computer sometimes triggers a deep silent rage within me.
Use the ↑ key or the history search function (ctrl + r). Spend 10 minutes now to save many more in the future.
Especially important in your terminal, file manager and IDE or text editor.
ctrl + return to run just the selected lines (or all lines).
black and flake8 (a code formatter and a linter for Python).
Check out the primer for more resources on this topic.
pieter 🐍 base ~ sudo apt install –y cowsay
[sudo] password for pieter:
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package –y
— (em dash) VS – (en dash) VS - (hyphen ~= hyphen-minus)
pieter 🐍 base ~ echo “something profound”
“something profound”
pieter 🐍 base ~ echo "something else"
something else
curly or smart and straight or vertical quotation marks:
“ ” VS " "
‘ ’ VS ' '
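To catch these look-alike characters before they reach a terminal, a small Python sketch can flag every non-ASCII character in a command and map the common typographic ones back to ASCII. The mapping table and the names sanitize_command / find_suspicious are our own; turning both dashes into a single hyphen is an assumption (a dash that replaced a double hyphen, as in --help, still needs a manual fix).

```python
# Map typographic look-alikes to their plain ASCII counterparts.
TYPOGRAPHIC_TO_ASCII = str.maketrans({
    "\u2014": "-",   # em dash (assumption: treated as a single hyphen)
    "\u2013": "-",   # en dash
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
})


def sanitize_command(text):
    """Replace smart quotes and dashes that sneak in when copying from slides or word processors."""
    return text.translate(TYPOGRAPHIC_TO_ASCII)


def find_suspicious(text):
    """Return (index, character) pairs for every non-ASCII character, to inspect before running."""
    return [(i, ch) for i, ch in enumerate(text) if not ch.isascii()]


if __name__ == "__main__":
    pasted = "sudo apt install –y cowsay"
    print(find_suspicious(pasted))
    print(sanitize_command(pasted))
```

For example, sanitize_command("sudo apt install –y cowsay") returns "sudo apt install -y cowsay".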
echo "HISTFILESIZE=100000" >> ~/.bashrc (and the same for HISTSIZE, which limits how much of that history each session keeps in memory) and use ctrl + r to search through it.
echo 'alias vpn="sudo openconnect vpn.uantwerpen.be"' >> ~/.bash_aliases
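rm
cd /so/very/deep/in/a/path && rm -rf dir (the && ensures rm only runs if the cd succeeded).
rm /path/to/dir -r. Accidentally pressing enter too soon is less likely to cause harm.
touch -- -i in important directories: an accidental rm * then expands to include the file named -i, which rm reads as its interactive flag, so it asks for confirmation.
rm -rf ./* # *;
rm -rf $FOO/ # $FOO
If $FOO is empty or unset, that last command expands to rm -rf /.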
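In bash, rm -rf "${FOO:?}"/ aborts when FOO is unset or empty. When the deletion happens from a Python script instead, the same guard can be written explicitly. This is a sketch under our own assumptions: guarded_rmtree is a hypothetical helper, and refusing only the empty string, / and your home directory is a minimal set of checks, not an exhaustive safety net.

```python
import shutil
import tempfile
from pathlib import Path


def guarded_rmtree(path_string):
    """Delete a directory tree, but refuse obviously dangerous targets.

    Hypothetical helper: these checks are a sketch, not an exhaustive safety net.
    """
    if not path_string or not path_string.strip():
        # The `rm -rf $FOO/` trap: with an empty FOO the shell would see `rm -rf /`.
        raise ValueError("refusing to delete: empty path")
    target = Path(path_string).expanduser().resolve()
    protected = {Path("/"), Path.home().resolve()}
    if target in protected:
        raise ValueError(f"refusing to delete protected directory: {target}")
    if not target.is_dir():
        raise NotADirectoryError(f"not a directory: {target}")
    shutil.rmtree(target)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        victim = Path(tmp) / "old_results"
        victim.mkdir()
        guarded_rmtree(str(victim))
        print(victim.exists())  # False
    FOO = ""  # stands in for an unset or empty variable
    try:
        guarded_rmtree(FOO)
    except ValueError as err:
        print(err)
```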
List of directories where the shell searches for executable files.
$ echo $PATH
/home/pieter/.nvm/versions/node/v10.5.0/bin:/home/pieter/.rbenv/shims:/home/pieter/.rbenv/bin:/opt/pycharm/bin:/home/pieter/miniconda3/bin:/home/pieter/miniconda3/condabin:/home/pieter/bin:/home/pieter/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/usr/lib/jvm/java-10-oracle/bin:/usr/lib/jvm/java-10-oracle/db/bin:/usr/lib/jvm/java-11-openjdk-amd64/bin:/home/pieter/miniconda3/envs/tools/bin
echo 'export PATH="$PATH:/home/pieter/miniconda3/envs/tools/bin"' >> ~/.bashrc
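The order of the entries is what matters: the shell walks $PATH from left to right and runs the first match, so a directory appended at the end (as above) only wins if nothing earlier provides the same command. A rough Python sketch of that lookup; path_entries and all_matches are our own names, and shutil.which is the standard-library equivalent of which.

```python
import os
import shutil
import tempfile
from pathlib import Path


def path_entries(path=None):
    """Split a PATH-style string (default: the current $PATH) into directories, in search order."""
    path = os.environ.get("PATH", "") if path is None else path
    return [entry for entry in path.split(os.pathsep) if entry]


def all_matches(command, path=None):
    """Roughly `which -a`: every executable named `command`, in PATH order (the first one wins)."""
    hits = []
    for directory in path_entries(path):
        candidate = Path(directory) / command
        if candidate.is_file() and os.access(candidate, os.X_OK):
            hits.append(str(candidate))
    return hits


if __name__ == "__main__":
    # Two directories that both provide a (fake) `tool`: whichever comes first shadows the other.
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        for directory in (first, second):
            executable = Path(directory) / "tool"
            executable.write_text("#!/bin/sh\necho hello\n")
            executable.chmod(0o755)
        search_path = first + os.pathsep + second
        print(all_matches("tool", search_path))        # both copies, in search order
        print(shutil.which("tool", path=search_path))  # only the first one
```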
You can use the following commands to find out where a command is located:
$ which python # only works if binary is included in $PATH
/home/pieter/miniconda3/bin/python
$ conda activate tools
$ which python
/home/pieter/miniconda3/envs/tools/bin/python
$ whereis python # searches standard system locations (plus $PATH and $MANPATH), also listing man pages
python: /usr/bin/python3.6 /usr/bin/python3.5 /usr/bin/python3.6m-config /usr/bin/python3.6m /usr/bin/python3.5m /usr/bin/python3.6-config /usr/bin/python2.7 /usr/bin/python /usr/lib/python3.6 /usr/lib/python3.5 /usr/lib/python2.7 /usr/lib/python3.7 /etc/python3.6 /etc/python3.5 /etc/python2.7 /etc/python /usr/local/lib/python3.6 /usr/local/lib/python3.5 /usr/local/lib/python2.7 /usr/include/python3.6 /usr/include/python3.6m /usr/include/python2.7 /usr/share/python /home/pieter/miniconda3/envs/tools/bin/python3.7 /home/pieter/miniconda3/envs/tools/bin/python3.7m /home/pieter/miniconda3/envs/tools/bin/python3.7m-config /home/pieter/miniconda3/envs/tools/bin/python3.7-config /home/pieter/miniconda3/envs/tools/bin/python /usr/share/man/man1/python.1.gz
$ echo 'alias vpn="sudo openconnect vpn.uantwerpen.be"' >> ~/.bash_aliases
$ source ~/.bash_aliases
$ type vpn # searches environment including $PATH and aliases (or "command -v")
vpn is aliased to `sudo openconnect vpn.uantwerpen.be'
cf. version managers
Windows Command Prompt and PowerShell are not the same as a Unix shell, although they have their uses.
Install the Windows Subsystem for Linux on Windows 10 https://docs.microsoft.com/en-us/windows/wsl/install-win10, rather than running a virtual machine (for most scenarios).
Caveat: things like Python are not automatically shared between Windows and the Subsystem.
Even GUI apps work if you install an X server!
WSL can even be linked to your IDE, e.g. VS Code, as if it were a remote server: https://medium.com/@janelgbrandon/a-guide-for-using-wsl-for-development-d135670313a6 & https://code.visualstudio.com/docs/remote/wsl
For the full write-up, see the primer.
https://askubuntu.com/questions/1051525/windows-subsystem-for-linux-wsl-what-cant-i-do-with-the-ubuntu-application-f
Press the X button
$ killall -9 vim
<esc>:wq<enter>
Shift-Z-Z
https://stackoverflow.blog/2017/05/23/stack-overflow-helping-one-million-developers-exit-vim/
grep (and by extension regular expressions) can help when your IDE decides to be an idiot.
$ rsync --help | grep "-z"
Usage: grep [OPTION]... PATTERN [FILE]...
Try 'grep --help' for more information.
$ rsync -h | grep "\-z"
-z, --compress compress file data during the transfer
$ rsync -h | grep '\-z'
-z, --compress compress file data during the transfer
$ rsync --help | grep -- -z
-z, --compress compress file data during the transfer
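The failures above come from grep parsing "-z" as an option, not from the regular expression itself. Inside a script, Python's re module avoids that parsing issue because the pattern is just a function argument. A small grep-like sketch; the grep function and the made-up help lines (in the style of rsync -h) are illustrations, and fixed=True mimics grep -F by escaping the pattern.

```python
import re

# Made-up help lines in the style of `rsync -h`, for illustration only.
SAMPLE_HELP = """\
 -v, --verbose               increase verbosity
 -z, --compress              compress file data during the transfer
     --compress-level=NUM    explicitly set compression level
 -h, --human-readable        output numbers in a human-readable format
"""


def grep(pattern, text, fixed=False):
    """Return the lines of `text` matching `pattern`, like `grep PATTERN` (or `grep -F` if fixed)."""
    regex = re.compile(re.escape(pattern) if fixed else pattern)
    return [line for line in text.splitlines() if regex.search(line)]


if __name__ == "__main__":
    print(grep("-z", SAMPLE_HELP))                # the leading dash needs no escaping here
    print(grep(r"^\s+--", SAMPLE_HELP))           # options that only have a long form
    print(grep("--compress", SAMPLE_HELP, fixed=True))
```

Note that without fixed=True a "." in the pattern matches any character, which is a common source of surprising matches.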
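foo >outfile1 2>&1 >outfile2 (see SO): redirections are processed left to right, so stderr ends up in outfile1 while stdout goes to outfile2.
grep / sed / awk: https://regexone.com/
~/.ssh/config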
Host calcua
HostName login-leibniz.uantwerpen.be
#Port 22000
User vsc20380
IdentityFile ~/.ssh/id_rsa
$ ssh calcua
$ rsync -avhzP *.tar.gz calcua:/scratch/antwerpen/203/vsc20380/pmc/
^Z followed by bg %1 and disown (-h) %1
nohup bash my_amazing_script.sh > output.log 2>&1 & (cf. redirection)
screen / tmux
Process needs to be immune to SIGHUP.
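"Immune to SIGHUP" concretely means the hang-up signal, sent when the terminal or SSH session closes, is ignored instead of terminating the process; nohup starts its command with SIGHUP set to be ignored. A minimal Python sketch of the same idea (Unix only; ignore_hangups is our own name) sets the handler to SIG_IGN and then sends itself a SIGHUP to show it survives.

```python
import os
import signal
import time


def ignore_hangups():
    """Make the current process ignore SIGHUP, as nohup does for the command it starts."""
    # SIG_IGN: SIGHUP is discarded instead of terminating the process.
    return signal.signal(signal.SIGHUP, signal.SIG_IGN)  # returns the previous handler


if __name__ == "__main__":
    previous = ignore_hangups()
    os.kill(os.getpid(), signal.SIGHUP)  # simulate the SSH session going away
    time.sleep(0.1)
    print("still running after SIGHUP")
    if previous is not None:  # None means the old handler was not set from Python
        signal.signal(signal.SIGHUP, previous)
```

Doing this inside your own script is rarely necessary; nohup, screen or tmux handle it from the outside.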
If you cannot connect to a server using SSH keys (and ssh -vvv is cryptic), check your file permissions.
drwx------
-rw-r--r--
-rw-------
drwxr-xr-x
Windows users: be aware that when you create keys in WSL, the keys won't automatically be picked up by other Windows programs (PuTTY, git GUIs, etc.)
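The permission strings above are the ls -l format, which Python's stat.filemode produces directly, so a quick check can be scripted. Which file should carry which mode is our assumption, based on common OpenSSH advice (private material readable only by you, the public key may be world-readable); permission_report and EXPECTED_MODES are hypothetical names. sshd may also refuse keys if your home directory is writable by group or others, which this sketch does not check.

```python
import stat
from pathlib import Path

# Assumption: which file each permission string belongs to, following common OpenSSH advice.
EXPECTED_MODES = {
    ".ssh": "drwx------",                  # chmod 700 ~/.ssh
    ".ssh/id_rsa": "-rw-------",           # chmod 600 ~/.ssh/id_rsa (private key)
    ".ssh/id_rsa.pub": "-rw-r--r--",       # chmod 644 ~/.ssh/id_rsa.pub (public key)
    ".ssh/authorized_keys": "-rw-------",  # chmod 600 on the server side
}


def permission_report(home=None):
    """Compare `ls -l`-style modes under `home` (default: your home directory) with EXPECTED_MODES.

    Returns {relative_path: (actual, expected, ok)}; files that do not exist are skipped.
    """
    home = Path.home() if home is None else Path(home)
    report = {}
    for relative, expected in EXPECTED_MODES.items():
        path = home / relative
        if path.exists():
            actual = stat.filemode(path.stat().st_mode)
            report[relative] = (actual, expected, actual == expected)
    return report


if __name__ == "__main__":
    for relative, (actual, expected, ok) in permission_report().items():
        print(f"{'ok ' if ok else 'FIX'} ~/{relative}: {actual} (expected {expected})")
```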
Still relying on print() statements in your Python code? It's time to use a proper debugger.
import pdb # or import ipdb as pdb
# insert this line where you want a breakpoint
pdb.set_trace()
Or use a solution built into your IDE e.g. PyCharm.
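Since Python 3.7 there is also the built-in breakpoint(), which calls sys.breakpointhook() and starts pdb by default; the PYTHONBREAKPOINT environment variable can switch it to another debugger or turn it off (PYTHONBREAKPOINT=0 python script.py) without editing the code. A small sketch with a conditional breakpoint; mean_of_positives is a hypothetical example function, and setting the variable from inside the script is only done here so the demo runs unattended.

```python
import os


def mean_of_positives(values):
    """Average of the strictly positive values (hypothetical example function)."""
    positives = [v for v in values if v > 0]
    if not positives:
        # Conditional breakpoint: only stop when something looks wrong.
        # breakpoint() calls sys.breakpointhook(), which starts pdb by default.
        breakpoint()
        return 0.0
    return sum(positives) / len(positives)


if __name__ == "__main__":
    # PYTHONBREAKPOINT=0 turns every breakpoint() into a no-op;
    # PYTHONBREAKPOINT=ipdb.set_trace would start ipdb instead (if it is installed).
    os.environ["PYTHONBREAKPOINT"] = "0"
    print(mean_of_positives([-1, -2]))  # 0.0, no debugger prompt
    print(mean_of_positives([1, 2, 3]))  # 2.0
```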
conda
miniconda to avoid bloat.
which python
conda environments to keep projects and specific package versions separated and documented, cf. version control.
virtualenv when deploying code on a remote server or docker container.
Check the documentation for a list of common commands: creating/listing/exporting environments, installing/removing packages, etc. https://docs.conda.io/projects/conda/en/latest/user-guide/getting-started.html
conda and pip are separate things.
Use conda install by default.
If a package is missing, check out different channels, e.g. conda install --channel conda-forge <package> (bioconda is also nice).
If that fails, first install pip inside the conda environment.
cf. What is PATH?
Don't sudo conda/pip install! You should always be able to install packages for just your user. This is more secure and ensures all packages can see one another.
pip is not installed by default in a new conda environment unless you also install python (or pip itself)!
$ conda create -n newenv python=3.7
......
$ which python
/home/pieter/miniconda3/envs/newenv/bin/python
$ which pip
/home/pieter/miniconda3/envs/newenv/bin/pip
$ conda create -n newerenv
......
$ conda activate newerenv
$ which python
/usr/bin/python
$ which pip
/home/pieter/miniconda3/envs/tools/bin/pip
# due to custom PATH append statement inside my .bashrc
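The mismatch in the second transcript (python from one place, pip from another environment) can be detected from within Python. A sketch under our own assumptions: pip_on_path_matches_python is a hypothetical helper that treats "inside sys.prefix" as "belongs to this environment", and it assumes a Unix-like filesystem. Independently of any check, python -m pip install <package> always uses the pip belonging to the interpreter you invoke.

```python
import os
import shutil
import sys


def pip_on_path_matches_python(path=None):
    """Does the first `pip` found on PATH belong to the environment of the running python?

    `path` defaults to $PATH (as in shutil.which). Returns (pip_path, matches).
    """
    pip_path = shutil.which("pip", path=path)
    if pip_path is None:
        # e.g. a fresh `conda create -n newerenv` without python or pip in it
        return None, False
    env_root = os.path.realpath(sys.prefix)
    pip_real = os.path.realpath(pip_path)
    return pip_path, os.path.commonpath([env_root, pip_real]) == env_root


if __name__ == "__main__":
    print("python:", sys.executable)
    print("pip:", pip_on_path_matches_python())
```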
nb_conda for managing environments inside notebooks
Add nb_conda to whatever environment you want to launch Jupyter notebooks from (I use the base environment).
Install ipykernel / r-irkernel into any environment you want to use inside notebooks.
For Ruby use rbenv.
For Node.js use nvm.
pathlib to manage I/O.
Don't use setwd()!
Use the here package and create paths relative to the top level of your project.
> here("data","raw","input.dat")
[1] "/home/pieter/projects/dot-project-directory/data/raw/input.dat"
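A rough pathlib analogue of here() for Python projects: walk up from the current directory until a marker file identifies the project root, then build paths from there. The function names and the marker list (.git, .here, setup.py) are our own assumptions; adjust the markers to whatever reliably sits at the top of your project.

```python
from pathlib import Path


def find_project_root(start=None, markers=(".git", ".here", "setup.py")):
    """Walk up from `start` (default: cwd) until a directory containing one of `markers` is found."""
    current = Path(start or Path.cwd()).resolve()
    for directory in (current, *current.parents):
        if any((directory / marker).exists() for marker in markers):
            return directory
    raise FileNotFoundError(f"no project root marker {markers} above {current}")


def here(*parts, start=None):
    """Absolute path relative to the project root, e.g. here("data", "raw", "input.dat")."""
    return find_project_root(start).joinpath(*parts)


if __name__ == "__main__":
    try:
        print(here("data", "raw", "input.dat"))
    except FileNotFoundError as err:
        print(err)
```

Unlike changing the working directory, this gives the same path no matter which subdirectory a script or notebook is started from.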
Don't preserve your workspace, since previously stored variables can silently mess up your results.
renv (or packrat).
Don't install packages via your system package manager, only use install.packages().
conda can also manage R environments, but might not act nicely with R_LIBS (https://waoverholt.com/conda-and-R/).
renv: if things go awry, install the package into the base environment, purge the renv files (`renv` dir, `renv.lock` and a line in `.Rprofile`) and start over.
The default R installation directory might not be writeable for a user, so a user library should be created (rather than installing packages as an admin).
install.packages('random')
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
Warning in install.packages("random") :
'lib = "/usr/local/lib/R/site-library"' is not writable
Would you like to use a personal library instead? (y/n) y
Would you like to create a personal library
~/R/pc-linux-gnu-library/3.2
to install packages into? (y/n) y
You can also set this in a file (R only uses the directory if it already exists):
echo 'R_LIBS_USER="~/lib/R/library"' >> ~/.Renviron
Use notebooks appropriately: for exploration, not for reproducing results.
Write functions in separate files or create a local package (see primer).
knitr: https://yihui.name/en/2018/09/notebook-war/#summary
As a side note, everyone who uses Jupyter Notebooks should check out its successor JupyterLab and Binder-like cloud solutions. (https://www.dataschool.io/cloud-services-for-jupyter-notebook/)
TEST PASSED!
Try to incorporate unit testing into your workflow.
Even something as simple as an assert statement checking the number of columns after reading a .csv file goes a long way.
For more information, check out this talk aimed specifically at data scientists (as opposed to software engineers): https://www.youtube.com/watch?v=0ysyWk-ox-8
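A minimal sketch of that column check with the standard csv module; read_table and the sample data are our own illustration (with pandas, assert df.shape[1] == expected right after pd.read_csv does the same job). Keep in mind that assert statements are skipped when Python runs with -O, so checks that must always run are safer as explicit raise statements.

```python
import csv
import io


def read_table(handle, expected_columns):
    """Read CSV rows from an open text handle and stop immediately if the shape is off."""
    rows = list(csv.reader(handle))
    assert rows, "empty file"
    header, body = rows[0], rows[1:]
    assert len(header) == expected_columns, (
        f"expected {expected_columns} columns, got {len(header)}: {header}"
    )
    for row_number, row in enumerate(body, start=1):
        assert len(row) == expected_columns, f"data row {row_number} has {len(row)} fields: {row}"
    return header, body


if __name__ == "__main__":
    sample = io.StringIO("gene,count\nSEPT2,10\nMARCH1,3\n")
    header, body = read_table(sample, expected_columns=2)
    print(header, len(body))  # ['gene', 'count'] 2
```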
When sharing files between Windows and Unix systems, use the dos2unix / unix2dos commands to convert line breaks ("\r\n" <=> "\n").
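If those tools are not available, the line-break part of the conversion is easy to do from Python on raw bytes, which avoids any encoding or newline translation. This sketch only handles CRLF and LF (dos2unix itself does more, e.g. byte-order marks); the function names are our own.

```python
from pathlib import Path


def to_unix(data: bytes) -> bytes:
    """Line-break part of dos2unix: turn CRLF into LF."""
    return data.replace(b"\r\n", b"\n")


def to_dos(data: bytes) -> bytes:
    """Line-break part of unix2dos: turn LF into CRLF without doubling existing CRLFs."""
    return to_unix(data).replace(b"\n", b"\r\n")


def convert_file(path, to_windows=False):
    """Rewrite a file in place; working on bytes leaves the text encoding untouched."""
    target = Path(path)
    data = target.read_bytes()
    target.write_bytes(to_dos(data) if to_windows else to_unix(data))


if __name__ == "__main__":
    print(to_unix(b"a\r\nb\r\n"))  # b'a\nb\n'
    print(to_dos(b"a\nb\n"))       # b'a\r\nb\r\n'
```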
Go through Wout's presentations and/or the primer.
Add docstrings to your functions.
Create functions.
Be consistent in code style (cf. use a formatter).
Decide on a sane project directory structure. (e.g. https://drivendata.github.io/cookiecutter-data-science/, https://community.rstudio.com/t/data-science-project-template-for-r/3230, https://www.r-bloggers.com/structuring-r-projects/, https://nicercode.github.io/blog/2013-04-05-projects/)
For all one-off commands, download sources and general meta-data, create READMEs in the relevant directories.
For repeated pipelines, use a workflow manager (Make is fine, but more feature-rich variants exist, e.g. snakemake and nextflow; discussion here: https://www.biostars.org/p/258436/).
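The core rule that lets Make skip work is simple: rebuild a target only if it is missing or older than one of its inputs. A sketch of just that rule in Python, with hypothetical names needs_rebuild and build; snakemake and nextflow add much more (dependency graphs, parallelism, cluster execution), so this is not how those tools are implemented.

```python
import os
import tempfile
from pathlib import Path


def needs_rebuild(target, inputs):
    """Make's rule: rebuild if the target is missing or older than any of its inputs."""
    target = Path(target)
    if not target.exists():
        return True
    target_time = target.stat().st_mtime
    return any(Path(source).stat().st_mtime > target_time for source in inputs)


def build(target, inputs, recipe):
    """Run `recipe(target, inputs)` only when needs_rebuild() says so; return True if it ran."""
    if needs_rebuild(target, inputs):
        recipe(Path(target), [Path(source) for source in inputs])
        return True
    return False


def concatenate(target, inputs):
    """Example recipe: write the concatenated inputs to the target."""
    target.write_text("".join(source.read_text() for source in inputs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw.txt"
        raw.write_text("data\n")
        clean = Path(tmp) / "clean.txt"
        print(build(clean, [raw], concatenate))  # True: target missing
        print(build(clean, [raw], concatenate))  # False: up to date
        later = clean.stat().st_mtime + 10
        os.utime(raw, (later, later))            # pretend the input was edited afterwards
        print(build(clean, [raw], concatenate))  # True: input newer than target
```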
Python imports are weird.
My preferred approach is to use a locally installed package.
Store all your functions in a src directory and install them in editable mode with pip install -e . using a setup.py in the project root directory.
More reading material on this: