Huon on the internet

Little libraries

By Huon Wilson27 Apr 2015

I’ve been having a lot of fun recently solving “little” problems in Rust. I have a long term project to make something for displaying my (GPS-tagged) photos nicely and, along the way, I’ve discovered and filled in a few gaps by creating focused crates for small tasks.

My travels over the last few years, as displayed by the current web interface (served to the browser via Rust, of course).

Once an idea is formed, cargo means it only takes an hour or two to go from zero to a focused crate with a chunk of code & documentation, published on crates.io and a nice feeling of completion.

The best part is that it makes sense to do this: cargo makes it so easy to reuse code in a reliable way that publishing such little crates is worth the effort: it takes a single line in the package manifest to use them, and package versions only change when you ask0, so your code won’t implicitly break if upstream changes. Rust1 is improving on many traditional systems languages in this respect by encouraging a good package ecosystem from the beginning, whereas languages like C or C++ don’t have a canonical process and so it can take non-trivial effort to integrate third party code.

Of course, not everything can be a small library: for the GPS photo project, I needed to be able to read EXIF data (rexiv2), interact with a database (rusqlite) and display a web interface (iron, nickel). Nonetheless, the ease with which cargo manages dependencies is still massively useful for “big” libraries like those, for both the developers and the users.

The process

I have a “process” with three parts:

It’s pretty simple, especially with some automation for the infrastructure step. Other than writing the code itself, the most complicated part of the whole process is uploading the documentation to be readable online: everything else is essentially a single cargo ... invocation. My usual configuration for Travis looks something like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
language: rust
sudo: false
script:
  - cargo build --verbose && cargo test --verbose && cargo doc --verbose
after_success:
  - |
        [ $TRAVIS_BRANCH = master ] &&
        [ $TRAVIS_PULL_REQUEST = false ] &&
        echo '<meta http-equiv=refresh content=0;url=[CRATE NAME]/index.html>' > target/doc/index.html &&
        git clone https://github.com/davisp/ghp-import &&
        ./ghp-import/ghp-import -n target/doc &&
        git push -fq https://${GH_TOKEN}@github.com/${TRAVIS_REPO_SLUG}.git gh-pages

env:
  global:
    secure: "Khw6CgS[...]jlPRac="

Update 2015-04-28: see Helping Travis catch the rustc train for a neater way to do this and more.

The after_success section manages publishing the documentation that cargo doc renders, via ghp-import. It is inspired by Andrew Hobden’s instructions, modified to avoid needing sudo to be able to get the faster/more responsive Travis builds. Hopefully this will get even easier in future, with some sort of rustdoc as a service. (See also Keegan McAllister’s more secure variant.)

(I believe one can have a .travis.yml that compiles on both nightly and beta/stable by adding something like rust: ["nightly", "1.0.0-beta3"], but I’ve not yet investigated.)

cogset

The first problem that I encountered was wanting to group clusters of photos that are close in space-time, so that I could highlight the segments where I’m actually walking around some interesting place and skip the boring bits of travelling in between.

A few days in Scotland: Linlithgow (small circle, in the middle), Falkirk, back to Linlithgow (large) and ending in Edinburgh.

This is exactly what clustering is designed to do… but unfortunately I couldn’t find any pre-implemented algorithms for doing this in Rust.

A bit of research indicated that I probably wanted a density-based clustering algorithm, such as DBSCAN or OPTICS. These are quite simple to use: feed in a collection of points with the ability to find all the points that are less than ε units away from a given point and DBSCAN efficiently iterates over those points finding “dense” clusters, like the circles in the image above.

A few hours of working through the Wikipedia pseudocode, testing and documenting gave me the cogset crate, with its generic Dbscan type. This slotted straight into my project by just adding cogset = "0.1" to the [dependencies] section.

I particularly like how Dbscan is generic over the index/database of points, letting someone calling Dbscan use a naive index if performance of the DBSCAN computation isn’t a problem, or an efficient spatial index if it is.

I discovered that my Nexus 5 records some timestamps incorrectly: panoramas use the local time as the GPS time, resulting in some confusing artifacts on the map. Fortunately both the local time and the GPS location are recorded correctly, so it is possible to compute the true GPS time by deducing the UTC offset of the local time and hence computing the true GPS time from the local time.

This requires knowing the timezone of a point on the Earth’s surface, again something for which I couldn’t find a good Rust library. Looking into this problem turned up latlong, a very neat little Go library for doing exactly this task: mapping a latitude/longitude pair to the name of the timezone it is in.

Porting it directly2 was straight-forward, resulting in tz-search, which offers:

1
2
let timezone = tz_search::lookup(-33.8885, 151.1908).unwrap();
println!("{}", timezone); // Australia/Sydney

Of course, just knowing the name of the timezone isn’t good enough, but it’s the first step of getting the UTC offset (I haven’t investigated the remaining ones yet…).

order-stat

The most recent “little library” I published is order-stat, for computing order statistics (the kth smallest/largest element of a set). The motivation is expanding cogset slightly, with the OPTICS algorithm, which requires the ability to do exactly that.

Poking about, I found a few selection algorithms, notably quickselect and Floyd–Rivest, as well as the obvious one of doing a full sort and indexing to retrieve the kth element. All offer the same interface, so decision between them is only guided by performance.

Algorithm Huge (ms) Large (µs) Small (ns)
Sort 7.2 87.8 264
Quickselect 1.1 9.7 83
Floyd–Rivest 0.48 2.4 72

Hence, Floyd–Rivest has the honour of being the star of this little library.

Somewhat against the idea of a little library, order-stat includes a medians-of-medians implementation too (in “little”’s defense, it’s related to order statistics and medians).

I should plug the quickcheck crate: it discovered a few subtle bugs in my Floyd–Rivest implementation, stemming from me incorrectly translating the copy-happy original version (presumably designed for GC’d languages where any value can be duplicated cheaply) to a Rust version that avoids clones and respects ownership.

(As a bonus, neither algorithm does any allocations!)

Write your own!

I’m sure there’s little tasks that you’ve solved in your own projects that others might want to solve too, so unleashing them on the world as a focused library on crates.io would be great! With 1.0 fast approaching, the Rust ecosystem has a level of stability never seen before, and maintaining a pile of little libraries is getting easier and easier.

(For best results, include documentation and fill out as much crates.io metadata as you can.)

  1. This locking even holds true when using git dependencies: once you compile once with Cargo versions of dependencies are locked until you explicitly opt-in to a version upgrade with cargo update. See “Cargo.toml vs Cargo.lock” for more details. 

  2. This post isn’t meant to imply that Rust is the only language (or even systems language) with this property. I just think it’s cool that Rust has it. 

  3. It wasn’t quite a one-to-one port: I opted to use an enum and pattern matching to model the different internal encodings, instead of a trait object which would be the true translation of the interface used in the original library.