Most scientists these days are coming to realize how imperative it is for them to be able to use computers. By this I don’t just mean being able to turn one on, use the internet, and run basic programs like Microsoft Word and Excel. No, I’m talking about using the command line and writing simple programs in languages like C++, Python, and Perl. These skills are desirable not only because they make you a more marketable job candidate; for many of us, they’re absolutely necessary just to access, manipulate, and analyze our data.
This is because new technologies, such as next-generation sequencing platforms like the Illumina HiSeq machine, are generating huge amounts of data. Sequencing DNA in one lane on that machine will generate millions of sequencing reads, each of which contains ~100 base pairs of DNA. Although these are relatively short reads (genes, especially in eukaryotes, are hundreds or thousands of base pairs in length), the sheer number of them can become overwhelming when you start thinking about data analysis. The files that contain these data would be simply unreasonable to deal with in programs like Excel or Word.
Thus, biologists like me are being thrust into the world of computer science. It can be frightening to get started, and it can take a long time to get really comfortable with programming. For over a year I have been writing a C++ model that simulates a population under different types of selection, and it still seems like every time I sit down to write a new block of code I end up learning something new!
But a couple of weeks ago I sent my first batch of samples off for the in-depth next-generation sequencing that I mentioned before, and I realized that my piecemeal knowledge of C++ wasn’t going to cut it when I got my data back: I needed to become familiar with the command line and Linux. So I picked up a wonderful book by Drs. Haddock and Dunn called Practical Computing for Biologists. I’m about halfway through the book, and I already feel infinitely more confident with the command line, the Unix shell, and Python.
The book is written in an incredibly straightforward manner. Many computer programming books are full of jargon and can be very confusing; Practical Computing for Biologists is not one of them. It doesn’t go into unnecessary detail about how to do esoteric programming tasks. Instead, the book uses biologically relevant tasks, like searching a file full of data from various sampling sites for every data point from a particular place or date. The book has an accompanying website where a bunch of example files are available for download. The first few chapters show you how to set up your computer and how to do basic tasks using the command line and text editors. The book then moves on to using Python and writing programs, and from there to more complicated programming tasks. I highly recommend this book for anyone who wants or needs to learn basic computer programming skills. It is a wonderful starting point, and soon your desk may begin to look like mine!
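To give a flavor of the kind of task I mean, here is a minimal Python sketch of my own (it is not taken from the book) that pulls every data point from a particular site or date out of a comma-separated file. The column names, the site labels, the sample values, and the function name `filter_records` are all made up for illustration; a real data file would need its own column names.

```python
import csv
import io

# Hypothetical sample data: the columns and values are invented for illustration.
SAMPLE = """site,date,temperature_c
SiteA,2011-06-01,12.4
SiteB,2011-06-01,10.9
SiteA,2011-06-02,12.9
"""


def filter_records(lines, site=None, date=None):
    """Return the rows (as dicts) that match the requested site and/or date.

    `lines` is any iterable of text lines, such as an open file.
    A criterion left as None is simply not applied.
    """
    matches = []
    for row in csv.DictReader(lines):
        if site is not None and row["site"] != site:
            continue
        if date is not None and row["date"] != date:
            continue
        matches.append(row)
    return matches


if __name__ == "__main__":
    # Print every measurement taken at the (hypothetical) site "SiteA".
    for row in filter_records(io.StringIO(SAMPLE), site="SiteA"):
        print(row["site"], row["date"], row["temperature_c"])
```

With a real file, I would expect to swap `io.StringIO(SAMPLE)` for something like `open("my_data.csv", newline="")`, where the file name is again just a placeholder.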