Want to get into programming or bioinformatics but not sure where to start?
Over the years, a lot of people have asked me exactly the same question:
“I have a basic knowledge of [programming language]. I am looking for some sample/starter projects to work on in [insert a field]. Please suggest”.
The “insert a field” part can be anything. In my case, it has been C++, Python, OpenGL, game engines, embedded systems, and now bioinformatics.
Below are my thoughts on why it is so hard to recommend a project, especially in bioinformatics.
You need to know and understand why you want to do “this”; otherwise, it will be very hard, or even impossible, to progress and achieve anything. If a person got into bioinformatics or any other subject, they probably got into it for a specific reason. Let’s say you watched a video or read an article about Michael Levin’s research, where he talks about cell signaling and cellular information transfer for tissue/organ regeneration. It got you very interested and excited to look into it further. Studying organisms that have the ability to regenerate (genes turned on) and finding the genes that allow for it sounds amazing, right?
You think it would be amazing to help people grow back limbs, eyes, or other body parts, and that, according to Levin’s latest research, it is possible. That can be a good reason to get into biology, genomics, and data science/programming. If you think that biology or programming is just cool, and maybe a well-paid career, it might be extremely challenging to achieve anything, as you will constantly be stuck in the “I have a basic knowledge of X, please recommend a project in Y” scenario. I see people fail again and again when they try to pick up a new skill without a good reason. Having a supportive community and mentor-friends is super important too.
Perhaps you are a biologist and want to learn some data science and programming so you can use and write software that helps you make sense of your biology/lab data. Of course, you could instead pass this data to a computer scientist and wait for them to provide the results. But what if the results are wrong because the computer scientist does not fully understand the fundamentals of biology? At the very least, you would have to write documentation to go along with your laboratory data.
If you have a good understanding of biology and want to add programming to your skill set, it is important to know that there are already amazing free and open-source tools to help biologists. Do you really need to learn how to program in your situation? Of course programming can be a lot of fun, but knowing the basics of some programming language is not the same as having a good understanding of computer science, data structures, and memory management, or simply knowing how to write good, efficient, well-structured code that is easy to maintain. Huge classes inheriting from each other can get out of hand very quickly if you don’t have code-structuring skills.
Knowing the basics of, say, Python is not the same as having a good understanding of Python and how to write good code. Especially in biology, the difference between okay code and well-optimized code, written with a good understanding of what you are doing, can be the difference between a run that takes 10 seconds and one that takes 10 days. A good example is searching for motifs in a DNA/RNA string while allowing point mutations (mismatches). You have to know which algorithms exist and how and when to use them. If you write a few naive ‘for loops’ in Python, that solution will not scale well to large datasets.
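To make the motif example concrete, here is a minimal sketch, assuming the task is to find every position where a k-letter pattern occurs in an uppercase A/C/G/T string with at most d mismatches (point mutations). The first function is the naive sliding-window comparison. The second builds a k-mer index of the text once and then looks up every string within d mismatches of the pattern (its “d-neighborhood”). All function names are hypothetical, and the indexed version is just one option among several (suffix arrays and BWT-based aligners are others), not the definitive approach: it tends to pay off when many patterns are searched against the same text with a small d, because the neighborhood grows quickly as d increases.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def approx_positions_naive(text: str, pattern: str, d: int) -> list[int]:
    """Compare the pattern with every window of the text: O(len(text) * len(pattern)) per query."""
    k = len(pattern)
    return [i for i in range(len(text) - k + 1) if hamming(text[i:i + k], pattern) <= d]


def neighbors(pattern: str, d: int, alphabet: str = "ACGT") -> set[str]:
    """All strings over `alphabet` within Hamming distance d of `pattern`."""
    if d == 0:
        return {pattern}
    if len(pattern) == 1:
        return set(alphabet)
    result = set()
    suffix = pattern[1:]
    for suffix_neighbor in neighbors(suffix, d, alphabet):
        if hamming(suffix, suffix_neighbor) < d:
            # A mismatch is still allowed, so any first letter works.
            result.update(letter + suffix_neighbor for letter in alphabet)
        else:
            # The mismatch budget is used up, so the first letter must match.
            result.add(pattern[0] + suffix_neighbor)
    return result


def build_kmer_index(text: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the text to its start positions (built once, reused for many queries)."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(text) - k + 1):
        index[text[i:i + k]].append(i)
    return index


def approx_positions_indexed(index: dict[str, list[int]], pattern: str, d: int) -> list[int]:
    """Look up every member of the pattern's d-neighborhood in a prebuilt k-mer index."""
    positions = []
    for candidate in neighbors(pattern, d):
        positions.extend(index.get(candidate, []))
    return sorted(positions)


if __name__ == "__main__":
    text = "ACGTACGAACGT"
    print(approx_positions_naive(text, "ACGT", 1))                       # prints [0, 4, 8]
    print(approx_positions_indexed(build_kmer_index(text, 4), "ACGT", 1))  # prints [0, 4, 8]
```

On a toy input both functions return the same positions; any speed difference only shows up on long texts where the same index is reused for many queries.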
So again, knowing the basics of Python is not the same as knowing Python, data structures, and algorithms. It all takes time to master. You need solid fundamentals in the field whose problems you are trying to solve (or help solve). Only then, armed with this knowledge, can you focus on and dive deep into a very specific subject; otherwise, you will not be able to tackle any serious problems. For example, you have to know basic math, like vectors, matrices, and angle calculations, to be able to work on computer graphics. Being a good programmer alone, without a good understanding of basic math, won’t let you solve computer graphics problems. You need math.
“I am working on a basic DNA nucleotide count code, DNA into RNA transcription, but I want to work on a full scale project, maybe multi-omics data analysis, or something?”.
You need to be able to program and understand many fundamental concepts before you can even tackle problems like that.
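For scale, the “basic” exercises from that question fit in a few lines of Python. The sketch below (function names are hypothetical; it assumes a string made only of A, C, G and T, in either case) counts nucleotides and transcribes DNA into RNA in the simple string sense used by exercise sites like Rosalind, replacing thymine (T) with uracil (U). Compare that with the amount of background a multi-omics analysis would need.

```python
from collections import Counter


def count_nucleotides(dna: str) -> tuple[int, ...]:
    """Return the counts of A, C, G and T, in that order."""
    counts = Counter(dna.upper())
    return tuple(counts[base] for base in "ACGT")


def transcribe(dna: str) -> str:
    """Transcribe a DNA string into RNA by replacing every T with U."""
    return dna.upper().replace("T", "U")


if __name__ == "__main__":
    print(*count_nucleotides("GATTACA"))  # prints 3 1 1 2
    print(transcribe("GATTACA"))          # prints GAUUACA
```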
As the legendary Michael Abrash said:
“…there are a lot of problems you can’t tackle any other way”.
This means that knowing the fundamentals of the field you are working in is imperative to avoid “reinventing a (slow/bad) wheel”.
He has a very good response to the classic “where to start” question here (the whole interview is amazing, but the answer at the 3:52 mark is the most relevant to this article):
Ken Silverman, an amazing programmer and the creator of the Build engine that powered iconic games like Duke Nukem 3D, Shadow Warrior, and Blood, once replied to my “where to start” question with this:
“These three things are the basis for a good programmer”
1) Full understanding of a programming language’s syntax and libraries.
2) Good understanding of all of the basic algorithms and data structures and the different permutations of them.
3) Knowledge and understanding of which algorithm and which data structure to use in certain situations (this can only be obtained by writing a lot of code, reading books and looking at others’ code).
The easiest case seems to be when a programmer is interested in switching to bioinformatics. My first point, knowing why you want to do this, still applies here. You have to have a reason why; otherwise, you will keep asking “what can I do, how can I help”, it will look as if you have no idea why you are doing this, and people will probably not collaborate with you. This applies to anything you want to collaborate with other people on.
If you, as a programmer, are getting into biology, you probably already know why and what code needs to be written to solve certain problems. In this case, you know what to work on and what to study. It is also easier for programmers because you don’t really need a super-deep understanding of biology, as long as you have a good description of the problem to be solved, along with sample inputs and outputs to test your code against.
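A minimal sketch of that workflow, assuming a collaborator has described a simple problem (here a hypothetical request to compute the reverse complement of a DNA string) and supplied sample input/output pairs. The samples become assertions the code has to pass before it is ever run on real data; the names `reverse_complement`, `SAMPLES` and `check_samples` are illustrative only.

```python
# Hypothetical problem description from a biologist:
# "Given a DNA string, return its reverse complement."
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complement each base (A<->T, C<->G) and reverse the result."""
    return dna.upper().translate(COMPLEMENT)[::-1]


# Sample inputs and expected outputs supplied alongside the problem description.
SAMPLES = [
    ("AAAACCCGGT", "ACCGGGTTTT"),
    ("GATTACA", "TGTAATC"),
    ("", ""),
]


def check_samples() -> None:
    """Fail loudly if any sample does not produce the expected output."""
    for given, expected in SAMPLES:
        result = reverse_complement(given)
        assert result == expected, f"{given!r}: expected {expected!r}, got {result!r}"


if __name__ == "__main__":
    check_samples()
    print("all samples pass")
```

Keeping the samples as data means new cases from the collaborator can be added without touching the solution itself.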
What do I suggest?
If you are proficient in one of the fields and you want to pick up a new skill, you probably have already learned how to learn. Picking up a new skill should not be a problem in your case.
If you are a total beginner, I always suggest learning a new skill from at least two sources. It can be any combination of the following: books, online courses, YouTube videos, articles. It is okay even if they cover mostly the same material. Learning from one source and reinforcing your understanding with another helps make sure you have grasped the material. Alongside study materials, you should be solving problems on websites like HackerRank, Project Euler or Rosalind. Your solutions will also become part of your new portfolio.
Progressing through these sources will give you a good fundamental knowledge of the subject. A few days in, you will either be hooked and reach the so-called “learning escape velocity”, or give up and move on.
For bioinformatics specifically, I have the following for you to try, no matter if you are a biologist or a programmer:
- Sign up for this free, four-week (four-module) Coursera course: Link, and start learning. It covers both biology for programmers and programming for biologists.
- Get any “Bioinformatics in Python” book.
- Work through both while solving HackerRank/Rosalind problems.
You will be surprised by how many people give up after the first week of this course. I completed this particular course a while ago, and I saw the number of people in the comments section drop dramatically as the course progressed from week 1 to week 4; very few people completed it compared to the number who joined. My estimate is that about 5 out of 200 people finish it. And remember, it is just an introductory course. Trying this course alone will give you a good idea of whether your “basics of language X” and/or “basics of biology” are enough to work on the projects you are asking others to suggest. You will also see how complex things can get, even at a very basic level.
Following the three steps listed above will also give you a very clear answer to “I have the basics, what can I do?”, because the course and the book will point you toward other resources and projects you can pick up once you complete them.
The above statements are just my opinions, based on my experience. This is intended to be a dynamic article, as other people’s opinions and experiences, as well as mine, can and probably will affect it. Edits to this article will be clearly marked.
To sum it up
You have to know why you want to learn programming as a biologist, and understand that having the basics of some programming language is definitely not enough. As a programmer, you have to have a reason why you want to do computational biology. In both cases, ask yourself: why do I want to do this, and what made me want to switch to a new field or add a new skill to my skill set? If you can answer these questions, you have your answer to “what project can I work on”.
If you have any suggestions and/or comments, feel free to join our Bioinformatics Telegram channel: https://t.me/biocodex