A k-mer is a contiguous sequence of k nucleotides (the building blocks of DNA) in a genome. Biologists often use k-mers to identify patterns or motifs in genomic sequences, such as repeated sequences or conserved regions. Let’s build an algorithm to do this.
Year: 2022
In this article, we are going to take a look at one of the algorithms we wrote in Genome Toolkit series, Part 2, and attempt to optimize it.
In the previous article (Part 2 here), we wrote our first Genome Toolkit algorithm. Even though, it was a very simple algorithm to help us search for repeating patterns (k-mers) in a DNA/Genome sequences, and it seemed to worked correctly, we actually had a bug in it. Let’s take a look at what it is, and how we can fix it.
Bioinformatics with Python Cookbook: Use modern Python libraries and applications to solve real-world computational biology problems, 3rd Edition.
First function – counting patterns in a sequence.
Welcome to the new series, called “Genome Toolkit”. In this series, we will write a set of tools, that will help us find and build statistical data around any DNA, RNA and Protein sequences.