Test-driving Rust: array slices and lifetimes

Last month I took an exciting new class on Coursera called Bioinformatics: finding hidden messages in DNA, sequencing antibiotics, stuff like that. In the class they introduce algorithms to work with DNA sequences, which, from the algorithm’s perspective, are just vectors of characters-nucleotides. The graded exercises provide you with some input files, you run your implementation of the algorithm on the inputs and submit the output. But in addition to that they also have an automated system where you can verify your implementation, or solve extra problems, if you want to, and they support a number of languages there, including Rust. I’ve been looking for an opportunity to test-drive Rust for quite some time and this turned out to be a great domain to apply Rust array slices, and that’s what this blog post is about.

In Rust, slices “are a view into a block of memory represented as a pointer and a length“. To make this discussion practical, let’s apply them in a simple algorithm — let’s say we have the DNA string (which we can represent as an array of single-byte UTF-8 characters) and we want to count the number of occurrences of each n-gram (substring of length n) in that string. And let’s say we want to turn this algorithm into a utility function returning a hash map from each distinct n-gram to the count.

One way to do that would be to return something like a map from string to int. But let’s say we want something more efficient that that, because we need to handle many large DNA strings, and we don’t want to waste memory creating a copy of each n-gram string to use as keys into the map.

In Rust, we can actually avoid any copying by returning a map from a slice to the count, like this:

// "&[u8]" means "a slice of byte array"
fn count_ngrams(text : &str, n : usize) -> HashMap<&[u8], usize> {
   let mut ngram_map = HashMap::new();

   for i in 0..text.len()+1-n {
      // key is a slice of "n" bytes starting from index "i"
      let key = &text.as_bytes()[i..i+n];
      *ngram_map.entry(key).or_insert(0) += 1;


Now wait a minute, a C++ developer might say, are you seriously suggesting to make a utility function that returns a collection of raw pointers/references into a string parameter your code doesn’t own?? That’s insane reckless programming, sir, what about the lifetime of the string???!!!11!!1!

But no worries. That’s where Rust’s brilliant reference lifetime tracking comes into play, and it makes this implementation completely safe in Rust with zero run-time cost: no reference counting, “smart” pointers or garbage collection or anything like that needed at all! If your code compiles, Rust guarantees there’s no “use after free”.

Koala does not believe

Yes, indeed, now let’s try to check that this really works as promised… :-)
Read the rest of this entry »

C++ curiosities: one does not simply move a const object

moveWhen I learned about the C++11 move semantics for the first time, it was mighty confusing to me. My mental model of moving something, as in “move a physical object from point A to point B”, was clearly not an adequate mental model for what C++11 calls “moving an object”. And one of the examples of this inadequacy is the case of const objects. You’d think that it’s perfectly normal to “move” something without changing it, right? I mean, you’re just taking a thing and putting it into another place. The thing itself stays the same!.. But no, to “move” an object in C++ you must be able to change it, because the moving works by breaking into the object, copying some of its internals, and rewiring what’s left in there to change the destruction behavior. Since this is sort of counter-intuitive, bad things can happen if you’re not being careful. I’m showing a couple of real-world examples below.

This post continues the move semantics series: std::move that doesn’t move; move semantics is not about moving; move semantics, sizeof, areaof, and pointy types.

Read the rest of this entry »

Rust syntax highlighter for Far Colorer

Published a minimal schema for Rust syntax highlighting for FarColorer.

Without the new schema, FarColorer plugin in Far Manager picks C++ syntax highlighter for Rust source code files, and while it does look somewhat “rusty”, it is difficult to read:

With the new schema:

  • Rust keywords are recognized;
  • comments are recognized (without nested syntax highlighting yet);
  • single apostrophe in “lifetime parameter” (?) doesn’t derail the parser.

C++ curiosities: move semantics, sizeof, areaof, and pointy types

This post continues the topic raised in “C++ curiosities: move semantics is not about moving”. Last time we’ve seen how a move operation can be split into 2 steps: “shallow copying” and “destructor disarmament”. This time I’ll show a way of using “areaof” to understand the benefits of move operations on a given type, and what I call “pointy types” in my mental model of move semantics.

Let’s start with sizeof and areaof. I’m sure you know operator sizeof, but I’m not sure everyone have seen the term areaof before, since it is not part of the standard C++. It’s kind of an intuitively simple concept, the total amount of memory owned through an instance of an object, but it doesn’t seem to have a well-established name. Have you heard it before?.. Anyway, so far I’ve only seen Alexander Stepanov using it consistently in his papers and talks. To paraphrase an example he gives in the “Notes on Programming” (p. 32), here is how one’d compute areaof for a vector<int>:

size_t areaof(const vector<int>& x)
   return sizeof(vector<int>) + x.size()*sizeof(int);

Read the rest of this entry »

Meta: dissecting yet another coder’s blog

In this post I share my blogging experience for those who are considering to start their own programming-related blog or who are just curious about what it takes to maintain a blog. Note that I’m focusing on software development blogs here, for other kinds of blogs my advice could be completely irrelevant.



My blog is WordPress-based, but before discussing the specifics of this particular platform, let’s look at some alternatives, which might be viable, depending on your goals and software development experience:

  1. You might want to create your blog and your website from scratch, using only the basic building blocks provided with a particular language. The disadvantage of this approach is that you might have to reimplement some basic blog features that you’d otherwise get for free with a WordPress plugin, but on the other hand, you might sometimes find an open-source library that will also give you that feature with little or no effort. The advantage is that you’d have full control over the website, and you might learn a lot just from the exercise, especially if you decide to use a new language or library for that project. My friend Ivan had recently rewritten his website engine from Java to Haskell and published an interesting blog post about his experience.
  2. You might want to look into website and blogging engines “for hackers”, such as Octopress, Jekyll, Hakyll, etc. I have to say that I like these projects in principle, because I like the idea of keeping blog posts in a repository under source control. However, last time I tried Octopress it couldn’t match all the functionality I’m used to in WordPress, so I didn’t switch. But it is definitely something to keep an eye on.
  3. You might want to use LiveJournal, or Blogger, or a similar service, but then you’d get a sub-domain URL, and the least amount of control over your data and blog configuration. Also you won’t be able to host anything else on your domain. This wouldn’t work for me, because I might eventually add other services under my domain, and in fact I’m already using it to host the AnnotateMe web-service, which produces dynamic annotated images, like the one below.

Read the rest of this entry »