From Python to Rust: Part 1.

Published by rebelCoder on

In this series of articles, we will explore a young and very exciting programming language: Rust. We will see how Rust can help us speed up some of our existing and future Python code. Many agree that the key features of Python are its simplicity and readability. Let’s see if we can get the same or similar simplicity and readability from Rust, together with the execution speed of a compiled language.

While Python is an amazing general-purpose language that is very easy to write and understand, it can be too slow when we need an algorithm to ‘crunch’ through a lot of data. Of course, well-written Python code that uses something like NumPy or Cython is still very fast and probably enough in most cases. But the runtime checks performed by an interpreted, dynamically typed language like Python can slow pure-Python code down considerably.

We will look at Python and Rust code side-by-side and attempt to port (rewrite) some of our Bioinformatics-related code from Python to Rust, comparing the execution speed of both. We will see that we don’t necessarily have to sacrifice the simplicity of Python for Rust’s speed. Some functionality written in Python will stay in Python and just call our faster code, rewritten in Rust. This way we will try to hit a ‘sweet spot’ (the best of both languages) for solving our Bioinformatics problems.


Prerequisites:

  • Good Python knowledge.
  • Basic Rust knowledge (I assume you are here because you know what you are doing) 😛
  • You understand the difference between compiled and interpreted languages.
  • You understand the difference between statically and dynamically typed languages.
  • A code editor with both Python and Rust integrations.

Good to have:

  • Previous experience with a C-style language.
  • Desire to learn a new, exciting, compiled, memory safe and fast systems programming language.

Make sure you have the latest Python version installed: Link
Make sure you have the latest Rust/Cargo version installed: Link

We start by creating a folder, which I called py_rust. Inside it, we create one folder ‘by hand’, python, and add an empty file called main.py to it. This is where our Python code will live. For the Rust folder, we will use Rust’s build tool and package manager, Cargo. It will create a small project for us with all the files needed to configure and run a Rust project.

Open a prompt (bash/shell/cmd) in our newly created py_rust folder, which already contains the python folder, and execute the following:

cargo new rust --bin

The parameters are:

new – makes the cargo tool create a new project.
rust – the name of the folder for the project.
--bin – make the cargo tool create a project for a binary, not a library.

Now we should have the following structure (Cargo.lock and target appear only after the first build):

[py_rust]
├── python
│   └── main.py
└── rust
    ├── Cargo.lock
    ├── Cargo.toml
    ├── src
    │   └── main.rs
    └── target		

As Rust is a comparatively young language, you might find that tools and IDE integrations don’t work as well as they do for, say, Python, C++, or Java. I am using VSCodium (VSCode), and it has a good enough integration. Our goal here is to see the two languages, Python and Rust, side-by-side, and to be able to run both with just a keyboard shortcut.

Please make sure that you add the Python and Rust project folders to the code editor one by one, and not the whole py_rust folder. This is because Cargo looks for the project in the root directory, so if our root directory is py_rust, it will fail. Adding each folder separately makes sure that Python and Rust are each executed from their own workspace rather than from py_rust. More details on that in my video version of this article.

In a separate article, we set up the VSCodium editor to run Python as easily as possible, using just two plugins. If you have this configured already, you should, in theory, be able to run Rust in exactly the same way. Make sure you have these two extensions installed:

[Screenshot of the two extensions]

The only thing we need to add is a code runner configuration for Rust.

1. With VSCodium open, press F1, type settings, and choose the first option.

2. Add this line to your settings.json file:

"code-runner.executorMap": {
	"rust": "if [ $(basename $dir) = 'examples' ]; then cargo run -q --example $fileNameWithoutExt; else cargo run -q; fi"
}

Now we should be able to run both Python and Rust with just a key combination (Ctrl + Alt + N).


Let’s add our first line of code and execute it:

Python:

print("Hello, I am Python!")

Rust:

fn main() {
    println!("Hello, I am Rust and I am fast");
}

And now we can try executing the code and if everything is configured correctly, we should see this:

Rust output.
Python output.

And here is what our setup should look like:

Project view. We can click into an open Python or Rust file and use (Ctrl + Alt + N) to execute it.

Now let’s start looking at the actual code.

Python:

x = 100.0
y = 1.0

print(f"x: {x}, and y: {y}")
y = y * -0.3145
print(f"x: {x}, and y: {y}")

Rust:

let x = 100.0;
let mut y = 1.0;

println!("x: {}, and y: {}", x, y);
y = y * -0.3145;
println!("x: {}, and y: {}", x, y);

Note: for Rust, we need to remember that all of this code is wrapped in the fn main() {} function.

We can see that the code is very similar. The print statements differ a little but are easy to get used to: in Python we use f-strings, in Rust the println! macro with {} placeholders. The real difference is in how we declare variables. Rust is statically typed, so every variable has a type that is fixed at compile time; here we don’t write it out because the compiler infers it from the value (a float for both x and y). Python has mutable and immutable objects too, but in Rust a variable binding is immutable by default, and we need to specify mut if we plan to change the value of the variable somewhere in the code. On line 5 of the snippet we modify the variable y, and that is why it needs to be mutable. If we declare it with let instead of let mut and try executing the code, we will see the beauty of Rust’s compiler:

error[E0384]: cannot assign twice to immutable variable `y`
 --> src/main.rs:8:5
  |
5 |     let y = 1.0;
  |         -
  |         |
  |         first assignment to `y`
  |         help: make this binding mutable: `mut y`
...
8 |     y = y * -0.3145;
  |     ^^^^^^^^^^^^^^^ cannot assign twice to immutable variable

error: aborting due to previous error

For more information about this error, try `rustc --explain E0384`.
error: could not compile `rust`.

To learn more, run the command again with --verbose.

We can see that the Rust compiler tells us exactly what is wrong. I wish all languages had such helpful output.
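The mutability rule can be tried out in isolation. The following is a minimal sketch rather than code from the project: the scale function is a hypothetical helper introduced only for illustration, and the explicit f64 annotation on x is optional, since the compiler would infer the same type on its own.

```rust
// Hypothetical helper (not part of the article's project):
// applies the same scaling step as the snippet above.
fn scale(y: f64) -> f64 {
    y * -0.3145
}

fn main() {
    // Immutable binding. The `: f64` annotation is optional here;
    // the compiler would infer the same type from the literal.
    let x: f64 = 100.0;

    // `mut` is required because y is reassigned below.
    let mut y = 1.0;

    println!("x: {}, and y: {}", x, y);
    y = scale(y);
    println!("x: {}, and y: {}", x, y);
}
```

Removing mut from the declaration of y should reproduce the E0384 error shown above.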


Let’s see what if and else look like in both:

Python:

if y < x:
    print(f"The difference is: {x-y}")
elif y == x:
    print(f"The difference is: {x-y}")
else:
    print(f"The difference is: {y-x}")

Rust:

if y < x {
    println!("The difference is: {}", (x - y));
} else if y == x {
    println!("The difference is: {}", (y - x));
} else {
    println!("The difference is: {}", (y - x));
}

We can see that the code is almost identical. Programmers coming from C-style languages will probably be tempted to wrap the condition in parentheses: (y < x). Let’s try doing that and see what Rust tells us.

warning: unnecessary parentheses around `if` condition
  --> src/main.rs:10:8
   |
10 |     if (y < x) {
   |        ^^^^^^^ help: remove these parentheses
   |
   = note: `#[warn(unused_parens)]` on by default

Well, what can I say? Beautiful! Learning Rust with such amazing help from the compiler is pure joy.


Loops will look very similar:

Python:

x = 5
while x > 0:
    print(f"x is {x}")
    x -= 1

for i in range(3, 7):
    print(f"i is {i}")

Rust:

let mut x = 5;
while x > 0 {
    println!("x is {}", x);
    x -= 1;
}

// Version 1 (Very Pythony):
for i in 3..7 {
    println!("i is {}", i);
}

// Version 2 (If you need to create a range on the fly):
let num_range = 3..7;
for i in num_range {
    println!("i is {}", i);
}

While/for loops and range usage are almost identical. The only difference shows up when we want to re-use the x variable. In Python we just overwrite it: Python is dynamically typed, so a name can be rebound to a value of any type. Rust is statically typed and does not allow that. Originally, we declared x = 100.0, and Rust inferred its type as a float. If we now just try overwriting it with x = 5; Rust will tell us that the variable is a float and we are trying to assign an integer to it. Instead, the first line of the Rust snippet, let mut x = 5;, declares a new variable with the same name, which is called variable shadowing. This protects us from accidentally changing the type of a value and writing broken code. Try the plain assignment yourself and see what the Rust compiler tells you.
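Shadowing is easier to see in a small standalone program. This is only a sketch under stated assumptions: countdown is a hypothetical helper that mirrors the while loop above and collects the values instead of printing them, and the commented-out line marks the assignment that the compiler would reject.

```rust
// Hypothetical helper (illustration only): counts down from `start`
// and returns every value the loop visits.
fn countdown(start: i32) -> Vec<i32> {
    let mut x = start;
    let mut seen = Vec::new();
    while x > 0 {
        seen.push(x);
        x -= 1;
    }
    seen
}

fn main() {
    let x = 100.0; // inferred as a float (f64)
    println!("x is {}", x);

    // x = 5; // would not compile: x is a float and 5 is an integer

    // Shadowing: a second `let` creates a new variable that reuses the
    // name, so it is allowed to have a different type (an integer here).
    let x = 5;
    println!("countdown: {:?}", countdown(x));
}
```

The first x is not modified by the second let; it simply becomes inaccessible under that name from that point on.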


Let’s take a look at a List/Vector and how we can iterate over and enumerate both:

Python:

tmp_lst = ['DNA', 'RNA', 'mRNA']

for value in tmp_lst:
    print(f"value is {value}")

Rust:

let tmp_vec = vec!["DNA", "RNA", "mRNA"];

for value in tmp_vec {
    println!("value is {}", value);
}

This is easy. What is a List in Python is a Vector in Rust. We should see this output from both:

value is DNA
value is RNA
value is mRNA

Python allows accessing/modifying our List after we iterate over it in a for loop, but Rust will not allow us to do the same with its Vector. After each loop, try accessing/printing just the first element of the List/Vector, like this:

# Python:
print(tmp_lst[0])

// Rust
println!("{}", tmp_vec[0]);

And now Rust will introduce us to yet another concept: Ownership and Borrowing.

error[E0382]: borrow of moved value: `tmp_vec`
  --> src/main.rs:34:20
   |
26 |     let tmp_vec = vec!["DNA", "RNA", "mRNA"];
   |         ------- move occurs because `tmp_vec` has type `std::vec::Vec<&str>`,
   | 	                                 which does not implement the `Copy` trait
27 |     for value in tmp_vec {
   |                  -------
   |                  |
   |                  value moved here
   |                  help: consider borrowing to avoid moving into the for loop: `&tmp_vec`
...
34 |     println!("{}", tmp_vec[0]);
   |                    ^^^^^^^ value borrowed here after move

If you have never heard about this Rust concept, you should invest some time in reading about it. Basically, when we handed our tmp_vec Vector to the for loop, we gave ownership of it to that loop: for value in tmp_vec moves the Vector into the loop, which consumes it. The loop did what we asked it to do, which is to print the contents of the Vector, and once the loop finished, the Vector it owned was dropped. From the point of the move onwards, tmp_vec can no longer be used, which is exactly what the compiler reports.

This is easily ‘fixed’ by ‘asking’ tmp_vec for an iterator that only borrows its elements, tmp_vec.iter(), instead of handing over the Vector itself (the &tmp_vec suggested by the compiler does the same thing). Now line 7 will not produce an error:

let tmp_vec = vec!["DNA", "RNA", "mRNA"];

for value in tmp_vec.iter() {
    println!("value is {}", value);
}

println!("{}", tmp_vec[0]);
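The same borrowing idea applies when a Vector is passed to a function. The sketch below uses the &tmp_vec form from the compiler’s help message; total_len is a hypothetical helper introduced only for this illustration, not something from the project.

```rust
// Hypothetical helper (illustration only): takes a borrowed slice,
// so the caller keeps ownership of the Vector.
fn total_len(values: &[&str]) -> usize {
    let mut total = 0;
    for value in values {
        total += value.len();
    }
    total
}

fn main() {
    let tmp_vec = vec!["DNA", "RNA", "mRNA"];

    // `&tmp_vec` borrows the Vector instead of moving it into the loop.
    for value in &tmp_vec {
        println!("value is {}", value);
    }

    // Passing `&tmp_vec` to a function borrows it as well.
    println!("total length: {}", total_len(&tmp_vec));

    // Still valid: tmp_vec was only borrowed, never moved.
    println!("{}", tmp_vec[0]);
}
```

If total_len took a Vec<&str> by value instead of a slice reference, calling it would move tmp_vec just like the original for loop did.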

And now let’s add an enumerator to our For loop:

Python:

tmp_lst = ['DNA', 'RNA', 'mRNA']

for pos, value in enumerate(tmp_lst):
    print(f"value at pos {pos} is {value}")

Rust:

let tmp_vec = vec!["DNA", "RNA", "mRNA"];

for (pos, value) in tmp_vec.iter().enumerate() {
    println!("value at pos {} is {}", pos, value);
}

Almost identical. Alright. This is it for this article. See you in the next one soon.

All of the code is available here: https://gitlab.com/RebelCoder/py_rust

Video version of this article.
