A k-mer is a contiguous sequence of k nucleotides (the building blocks of DNA) in a genome. Biologists often use k-mers to identify patterns or motifs in genomic sequences, such as repeated sequences or conserved regions. Let’s build an algorithm to do this.
In this article, we are going to take a look at one of the algorithms we wrote in Genome Toolkit series, Part 2, and attempt to optimize it.
In the previous article (Part 2 here), we wrote our first Genome Toolkit algorithm. Even though, it was a very simple algorithm to help us search for repeating patterns (k-mers) in a DNA/Genome sequences, and it seemed to worked correctly, we actually had a bug in it. Let’s take a look at what it is, and how we can fix it.
Bioinformatics with Python Cookbook: Use modern Python libraries and applications to solve real-world computational biology problems, 3rd Edition.
First function – counting patterns in a sequence.
Welcome to the new series, called “Genome Toolkit”. In this series, we will write a set of tools, that will help us find and build statistical data around any DNA, RNA and Protein sequences.
A guide and advice on how to get started, or how to transition into Bioinformatics for people with biology or programming backgrounds.
Let’s look at how we can program Hamming Distance algorithm in three different ways.
DNA Engine project structure and class setup.
Python dictionary, Rust HashMap and a DNA Reverse Complement function.