DNA Toolkit: Introduction

Published by rebelCoder on

The video below, from the ‘MITx Bio’ team, introduces the structure of the DNA molecule. It is a great way to learn about DNA, but you can do much more than understand how DNA is structured. With some very basic programming skills you can play around with genes (short DNA sequences) and with the genome as a whole, whether you are working with a randomly generated DNA string, a real gene, or a protein sequence from one of the existing databases. A programming language such as Python lets you find interesting patterns in a genome and check whether a sequence codes for a particular protein.

In this series of articles and videos, titled “DNA Toolkit”, we are going to use Python to write a set of tools. These tools will let us reproduce fundamental biological processes in code: generating the reverse complement of a DNA strand, transcribing DNA into RNA, and translating RNA into amino acids and proteins. We will search for patterns and genes in real genomes, our algorithms will be able to find patterns even when mutations have accumulated in the DNA, and much more.
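To give a taste of what such tools can look like, here is a minimal sketch of two of the operations mentioned above. The function names are hypothetical, chosen for illustration and not taken from the series' code. It assumes an uppercase DNA string containing only A, T, C and G, and it uses the common simplification of treating transcription as replacing T with U on the coding strand.

```python
# Illustrative sketch only: names are hypothetical, input is assumed to be
# an uppercase string made of the characters A, T, C and G.

# Translation table that maps each nucleotide to its complement.
COMPLEMENT = str.maketrans("ATCG", "TAGC")


def reverse_complement(dna: str) -> str:
    """Complement every nucleotide, then reverse the resulting string."""
    return dna.translate(COMPLEMENT)[::-1]


def transcribe(dna: str) -> str:
    """Simplified transcription: replace thymine (T) with uracil (U)."""
    return dna.replace("T", "U")


if __name__ == "__main__":
    seq = "ATGCGT"
    print(reverse_complement(seq))  # ACGCAT
    print(transcribe(seq))          # AUGCGU
```

Applying `reverse_complement` twice returns the original string, which makes for a handy sanity check.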


Target Audience

  • Beginner and mid-level programmers who want to try, or switch to, something more engaging than generic programming, and to learn by solving interesting and important biological problems.
  • Biologists and biochemists without a computer science degree but with some programming experience, who want to learn how to process genomic and other types of data using Python.
  • We will not cover programming basics, as there are many excellent free courses online (listed at the end of this article). We will look at some code optimizations where applicable and focus on solving problems in bioinformatics.

Prerequisites

  • Basic knowledge of Python or another programming language of your choice. That said, we will use some Python-specific (Pythonic) code.
  • Basic biology concepts such as DNA replication, transcription, RNA translation, codon function, and protein formation, and of course a willingness to learn more biology as we progress through this series.
  • Python 3 and a code editor (VSCode, Atom, etc.).

My interest and experience in Bioinformatics

I have been a professional QA Automation Engineer and Programmer for 8+ years. I have worked on projects involving VR, embedded systems, graphics, rendering, and SaaS. A little over two years ago, while researching consciousness and aging, I came across Bioinformatics and developed a strong interest in the field.

I am interested in, and researching, the following: longevity, cell signaling for tissue and organ regeneration, DNA damage repair, and Next-Generation Sequencing (NGS).

In my work I use only ethical, 100% FOSS (Free and Open-Source Software), for two reasons: I love Linux, and I want this material to be available to everyone, with no paywall stopping someone from creating amazing things.

I use my blog and YouTube video series to document my journey into this amazing and very important field. Join me and let’s spread the science!

Blog/Video series structure

  • We will start by learning about DNA and how we can represent it in code. Then we will write a set of functions and classes to perform fundamental operations on DNA strings (gene/genome sequences).
  • Once we have a basic set of tools for working with DNA strings, we will look at more complicated algorithms that help us find patterns in gene/genome sequences, even when mutations are introduced.
  • We will write code that accesses and pulls information from gene and protein databases in all popular biological data formats.
  • We will be solving Rosalind challenges and earning achievement badges as we progress.
  • I will review some related resources, books, and courses as well as some interesting research in Longevity and Regenerative Biology.
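As a rough preview of finding patterns "even when mutations are introduced", one simple approach is to slide a window along the genome and accept any window that differs from the pattern in at most a given number of positions (the Hamming distance). This is a sketch under those assumptions, not necessarily the algorithm the series will use; the function names and the `max_mismatches` parameter are hypothetical.

```python
# Illustrative sketch only: a brute-force approximate pattern search.
# Names and parameters are hypothetical and chosen for this example.

def hamming_distance(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def find_approximate(pattern: str, genome: str, max_mismatches: int) -> list[int]:
    """Return the 0-based start index of every window of the genome that
    matches the pattern with at most max_mismatches substitutions."""
    k = len(pattern)
    return [
        i
        for i in range(len(genome) - k + 1)
        if hamming_distance(pattern, genome[i:i + k]) <= max_mismatches
    ]


if __name__ == "__main__":
    genome = "ATGATCATG"
    print(find_approximate("ATG", genome, 0))  # [0, 6]
    print(find_approximate("ATG", genome, 1))  # [0, 3, 6]
```

This version only handles substitutions (point mutations) and compares every window in full, so it is easy to follow but not tuned for long genomes.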

Communication

Bioinformatics community for knowledge sharing and Q&A:

Suggested Viewings/Readings

1) Bioinformatics Algorithms: Design and Implementation in Python
2) Python Algorithms: Mastering Basic Algorithms in the Python Language
3) Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology

First video in “DNA Toolkit” series is available here:

See you in the next article/video, where we start working on our DNA Toolkit!

