Getting started in Bioinformatics: A step-by-step guide.
A guide and advice on how to get started, or how to transition into Bioinformatics for people with biology or programming backgrounds.
Part 1: Why do you want to become a Bioinformatician?
Let’s start with a question: “What made me (you) interested in Bioinformatics?”. If you know the answer to this question, it will be much easier for you to progress, and you will be able to set a clear goal based on it.
It is really important to understand why you want to work in Bioinformatics and what you actually want to do in it, as in any other field. Otherwise, it will be really hard to make any kind of progress. You will keep asking the age-old question: “What is Bioinformatics, and what can I do with it?”. Any answers you get from people will likely be too complex for a beginner and, most importantly, probably not helpful at all. You should first come up with a clear goal that focuses on a problem you want to research, one that you can tackle and solve.
Many of us did not, and still do not, have an answer to this question, and that makes it harder to get started. Don’t worry: after reading this article you should have a very good idea of the free resources available to you, and so be able to create a plan of action. Maria Nattestad takes an academic approach, and has an amazing video and a YouTube channel here. If you are doing a degree at the moment and are interested in transitioning into Bioinformatics, make sure to watch her videos.
My approach is different: my background is in Computer Science, and I have been mostly self-taught. In this article, I will show you what you need to get started, whether or not you are completing a degree. We will look at some clearly defined steps and freely available materials that will help you get started in Bioinformatics and set a path of study. It is also how I started my transition from a generic programming career into Bioinformatics.
Part 2: What happens if you try solving Bioinformatics problems without a good understanding of programming fundamentals?
The most universal programming language, Python, is what we will focus on. It’s an easy choice because of the sheer number of libraries (modules/packages) available, the simplicity of the language, and the large community, all of which make Python a great general-purpose language. It allows you to write almost any type of program.
While I was going through the bioinformatics specialization course, I saw a lot of people writing 120 lines of code with 4-7 functions to solve a very basic problem: a problem that can, and should, be solved in 3-5 lines if you have a very good understanding of the tool you are using. In our case, that tool is Python and its data structures.
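To make the point about short solutions concrete, here is a minimal sketch of a typical first exercise: counting how often each nucleotide appears in a DNA string. The function name `count_nucleotides` and the sample sequence are made up for illustration and are not taken from the course. It is one possible way to do it, not the only one.

```python
from collections import Counter


def count_nucleotides(dna):
    """Return how many times each of A, C, G and T occurs in a DNA string."""
    counts = Counter(dna.upper())  # Counter tallies every character in one pass
    # Missing keys in a Counter count as 0, so all four bases are always reported
    return {base: counts[base] for base in "ACGT"}


# Hypothetical sample sequence, used only to show the call
print(count_nucleotides("AGCTTTTCATTCTGACTGCA"))
```

The whole task fits in a few lines because `Counter`, a dictionary subclass from the standard library, already does the counting. Knowing that such a data structure exists is exactly the kind of fundamental knowledge this part is about.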
Trying to use a tool to solve a problem, without understanding how to use that tool, usually results in a loss of motivation. Some (most?) problems simply can’t be solved without a deep understanding of the fundamental concepts of the tool you are trying to use. Without that understanding, you will just keep changing your code until it works (which will not work for more complex algorithms) or find someone to fix or solve it for you. In both cases you will still fail to understand the solution, because you lack the fundamental understanding of the data structures and logic used in it.
You will also keep writing very bad code that kinda works, but is slow and hard to fix or change in the future. Another thing you should avoid is trying to solve programming problems by copying code from the internet or from video tutorials without understanding why that code works (or doesn’t).
Part 3: Master the fundamentals of programming!
I strongly recommend learning Python from two sources and solidifying that knowledge by solving programming challenges on a website like HackerRank. What do I mean by “fundamentals”? It is a minimal set of (programming) things you have to understand and practice before attempting to tackle and solve any problem.
Luckily enough, the list is not long at all. Here it is (in a specific order):
- Variables/Data Types (integer, float, string, byte, char, boolean, etc.)
- Logic/Branching (if, else, not, or, is, >, <, ==, !=, etc.)
- Loops (for, while)
- Functions (returning a value, passing a value)
- All base data structures (arrays, lists, maps, dictionaries, etc.)
Understanding just some of that will already allow you to write a small program to process genome data. After you feel confident with everything listed above, you can start working on more complex algorithms to search for patterns in genome data.
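As a rough sketch of how the fundamentals listed above combine, the small program below uses variables, branching, loops, functions, a list and a dictionary to summarize a DNA string. The function names `gc_content` and `find_pattern` and the sample sequence are hypothetical, chosen only for this example.

```python
def gc_content(dna):
    """Return the fraction of bases that are G or C (0.0 for an empty string)."""
    if not dna:  # branching: avoid dividing by zero
        return 0.0
    gc = 0  # variable holding an integer counter
    for base in dna:  # loop over every character
        if base == "G" or base == "C":
            gc += 1
    return gc / len(dna)


def find_pattern(dna, pattern):
    """Return the 0-based start positions of pattern in dna, overlaps included."""
    positions = []  # list that collects the matches
    if not pattern:
        return positions
    for i in range(len(dna) - len(pattern) + 1):
        if dna[i:i + len(pattern)] == pattern:  # compare a slice with the pattern
            positions.append(i)
    return positions


# Hypothetical sample sequence, used only to show the calls
sequence = "GATATATGCATATACTT"
summary = {  # dictionary mapping a label to each result
    "length": len(sequence),
    "gc_content": gc_content(sequence),
    "ATAT_positions": find_pattern(sequence, "ATAT"),
}
print(summary)
```

Both functions are written out step by step on purpose so that each fundamental is visible. Once these basics feel comfortable, shorter versions using built-in string methods are worth exploring.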
You have to start learning how to learn. It is like learning how to drive a car: you learn all the fundamental rules (signs, road types, traffic light logic, operating a car) and you can apply these rules to drive anywhere, even in places and countries you have never driven in before.
So this is why mastering fundamentals of biology and/or programming is a step you have to take. There is no way around it.
Work on a small project, where you compile all the things you have learned.
Part 4: What do I need to get started?
Before getting into the plan of action, it is important to mention these prerequisites:
- A computer.
- A good code editor that helps you (I recommend VSCode/VSCodium for simplicity. I have a VSCode for bioinformatics setup article here).
- A course and a book.
- A supportive community.
When you need help, a supportive community can go a long way toward furthering your understanding. To make the best of any community you join, first make sure you are willing to invest your time in learning the basics and fundamentals. If you ask very basic questions in a community chat/forum (instead of spending the time it takes to learn these things yourself), or copy code from the internet and ask the community to fix it for you, it may come across as lazy, and in most cases people will not even try to help you.
No one wants to be an enabler of laziness. So make sure to lay the groundwork for your learning experience yourself, and when you get stuck somewhere along the way, reach out and connect with others. Next time, you will be able to help the person who gets stuck where you were.
Having these things is essential to your success. You provide these necessities, and I provide you with all the necessary resources and my personal recommendations.
Part 5: A plan of action.
So here we are, at the most important step. Below is a list of resources to get you started. It will give you a very strong background in bioinformatics and programming, a clear understanding of where to go next, and it will help you to choose your first project.
You only need to focus on 3 things:
- Bioinformatics course
– Free Introduction course: https://www.coursera.org/learn/bioinformatics
– Rosalind bioinformatics challenges: http://rosalind.info/
- Python Book/Video Series
– A free Python book in PDF and some other formats: https://www.py4e.com/book
– Corey Schafer’s Beginners Python course: https://ibit.ly/n0ql
- Python exercise platform
– HackerRank: https://www.hackerrank.com/domains/python
Let’s talk about all the above points. That free course is probably the best course to get you started with Bioinformatics. It is 4 weeks long, and it covers both biology and programming. That is why there is no biology book or course on the list. The Rosalind website comes from the same group of people who developed that free course, and the two are connected: the free course covers some of Rosalind’s challenges. To complete Rosalind, you will have to take the larger 7-module course. More on that in Part 6 of this article. A Rosalind profile full of solved bioinformatics challenges, for which you get game-like achievement badges, will be a strong part of your resume.
As I mentioned before, you should learn Python from two sources at the same time. Reading a book and watching videos on the same topic (the same part of Python) will have a much better effect than using just one of them. Corey’s videos are legendary and easy to follow. He covers most of the Python programming language. The free book, linked above, also has comprehensive Python coverage.
Finally, after each chapter of the book and each video, make sure you apply what you have learned by solving challenges on the HackerRank website. HackerRank also has a social profile with game-like achievements, which you can add to your resume.
Alright! Now you are ready to take action. No matter what level your programming or biology knowledge is at, completing that first course is a must. Upon completion, you will have a very good idea about the field, and you will have a lot of pointers and suggestions (the course provides them) about where to go next and what bioinformatics problems are available for you to work on. You will be ready to take your next step, and the full 7-module course from the same team is a good one. Additional statistics and algorithms courses and books are also advised; we will discuss them in the following section.
Part 6: Additional resources
Our plan of action, as outlined above, is a strong entry point into the field of bioinformatics. While it will provide you with all the necessary information and a good understanding of the field, you will need to take the next step. If you want to be able to write good, fast, and clean code, you will need to invest your time in reading books about object-oriented programming, algorithms, data structures, etc.
The full 7-module Bioinformatics course does a very good job of explaining biology concepts, so you might be able to follow it without an extra biology course or book. You might also want to take a course in statistics, graph theory, and algorithms. I have listed other free resources you can look at after you have completed our main plan of action. Make sure you join a few related communities; BioStar is an amazing forum. You can also join our Telegram/Matrix chats to discuss and get help with bioinformatics.
- Free Introduction course: https://www.coursera.org/learn/bioinformatics
- Full, 7 course specialization: https://www.coursera.org/specializations/bioinformatics
- Rosalind’s bioinformatics challenges (part of both courses above): http://rosalind.info/
- My Bioinformatics YouTube channel: https://www.youtube.com/c/rebelCoderBio/videos
- Free Biology course: https://www.edx.org/course/introduction-to-biology-the-secret-of-life-3
Extra free courses:
- A free Python book in PDF and some other formats: https://www.py4e.com/book
- Corey Schafer’s Beginners Python course: https://www.youtube.com/watch?v=YYXdXT2l-Gg&list=PL-osiE80TeTskrapNbzXhwoFUiLCjGgY7
- Corey Schafer’s OOP Python videos: https://www.youtube.com/watch?v=ZDa-Z5JzLYM&list=PL-osiE80TeTsqhIuOqKhwlXsIBIdSeYtc
- Amazing bioinformatics forum: https://www.biostars.org/
- Bioinformatics Telegram/Matrix community (you can join either as they are linked): https://t.me/biocodex / Matrix
- HackerRank: https://www.hackerrank.com/domains/python
Important Python books to take your Python skills to the next level:
- Effective Python: 90 Specific Ways to Write Better Python. (Buy: US / UK)
- Python Algorithms: Mastering Basic Algorithms in the Python Language. (Buy: US / UK)
- Bioinformatics Algorithms: Design and Implementation in Python. (Buy: US / UK)
- Gene Analysis. (Buy: US / UK)
Books in this section are not free. I provided Amazon links, but you can probably find them second-hand on eBay. I have done that many times myself, and I managed to find books that cost $50-$100 for as little as $5-$10 with delivery. Most sellers on eBay will ship all over the world.
I hope this article helps you get started in the amazing field of bioinformatics, choose a direction, and start solving important problems.