Little libraries
I’ve been having a lot of fun recently solving “little” problems in Rust. I have a long-term project to make something for displaying my (GPS-tagged) photos nicely and, along the way, I’ve discovered and filled in a few gaps by creating focused crates for small tasks.
Once an idea is formed, cargo means it only takes an hour or two to go from zero to a focused crate with a chunk of code & documentation, published on crates.io, and a nice feeling of completion.
The best part is that it makes sense to do this: cargo makes it so easy to reuse code in a reliable way that publishing such little crates is worth the effort: it takes a single line in the package manifest to use them, and package versions only change when you ask[^0], so your code won’t implicitly break if upstream changes.
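Concretely, pulling in one of these crates is a one-line addition to `Cargo.toml` (a minimal sketch, using the cogset crate described below as the example):

```toml
[dependencies]
# one line per "little" crate; the resolved version is pinned in Cargo.lock
# until `cargo update` is run explicitly
cogset = "0.1"
```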
Rust[^1] is improving on many traditional systems languages in this respect by encouraging a good package ecosystem from the beginning, whereas languages like C or C++ don’t have a canonical process and so it can take non-trivial effort to integrate third-party code.
Of course, not everything can be a small library: for the GPS photo project, I needed to be able to read EXIF data (rexiv2), interact with a database (rusqlite) and display a web interface (iron, nickel). Nonetheless, the ease with which cargo manages dependencies is still massively useful for “big” libraries like those, for both the developers and the users.
The process
I have a “process” with three parts:

- write the code,
- infrastructure: register Travis CI to upload the documentation so it is visible at `$user.github.io/$repo` (I also add some project management tools),
- crates.io: add metadata and publish the crate.
It’s pretty simple, especially with some automation for the infrastructure step. Other than writing the code itself, the most complicated part of the whole process is uploading the documentation to be readable online: everything else is essentially a single `cargo ...` invocation. My usual configuration for Travis looks something like:
```yaml
language: rust
sudo: false
script:
  - cargo build --verbose && cargo test --verbose && cargo doc --verbose
after_success:
  - |
    [ $TRAVIS_BRANCH = master ] &&
    [ $TRAVIS_PULL_REQUEST = false ] &&
    echo '<meta http-equiv=refresh content=0;url=[CRATE NAME]/index.html>' > target/doc/index.html &&
    git clone https://github.com/davisp/ghp-import &&
    ./ghp-import/ghp-import -n target/doc &&
    git push -fq https://${GH_TOKEN}@github.com/${TRAVIS_REPO_SLUG}.git gh-pages
env:
  global:
    secure: "Khw6CgS[...]jlPRac="
```
Update 2015-04-28: see Helping Travis catch the rustc train for a neater way to do this and more.
The `after_success` section manages publishing the documentation that `cargo doc` renders, via ghp-import. It is inspired by Andrew Hobden’s instructions, modified to avoid needing `sudo`, to be able to get the faster/more responsive Travis builds. Hopefully this will get even easier in future, with some sort of rustdoc as a service. (See also Keegan McAllister’s more secure variant.)
(I believe one can have a `.travis.yml` that compiles on both nightly and beta/stable by adding something like `rust: ["nightly", "1.0.0-beta3"]`, but I’ve not yet investigated.)
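If that works the way I expect, it would just be an extra key in the config above (an untested sketch based on that inline example, so the exact channel/version strings may need tweaking):

```yaml
# build on the nightly channel as well as a pinned beta release
rust:
  - nightly
  - "1.0.0-beta3"
```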
cogset
The first problem that I encountered was wanting to group clusters of photos that are close in space-time, so that I could highlight the segments where I’m actually walking around some interesting place and skip the boring bits of travelling in between.
This is exactly what clustering is designed to do… but unfortunately I couldn’t find any pre-implemented algorithms for doing this in Rust.
A bit of research indicated that I probably wanted a density-based clustering algorithm, such as DBSCAN or OPTICS. These are quite simple to use: feed in a collection of points with the ability to find all the points that are less than ε units away from a given point and DBSCAN efficiently iterates over those points finding “dense” clusters, like the circles in the image above.
A few hours of working through the Wikipedia pseudocode, testing and documenting gave me the cogset crate, with its generic `Dbscan` type. This slotted straight into my project by just adding `cogset = "0.1"` to the `[dependencies]` section.
I particularly like how `Dbscan` is generic over the index/database of points, letting someone calling `Dbscan` use a naive index if performance of the DBSCAN computation isn’t a problem, or an efficient spatial index if it is.
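In code, the shape of it is something like this (a rough sketch from memory of cogset’s docs, using its brute-force scanner over Euclidean points; treat the exact type and method names as assumptions):

```rust
extern crate cogset;

use cogset::{BruteScan, Dbscan, Euclid};

fn main() {
    // two dense groups of 1-D points, plus an isolated outlier
    let points = [Euclid([0.1]), Euclid([0.12]), Euclid([0.15]),
                  Euclid([5.0]),
                  Euclid([9.0]), Euclid([9.05])];

    // the naive O(n²) index; a spatial index could be swapped in for big inputs
    let scanner = BruteScan::new(&points);

    // eps = 0.2, clusters need at least 2 points
    let mut dbscan = Dbscan::new(scanner, 0.2, 2);

    // each cluster comes out as a Vec of indices into `points`
    let clusters: Vec<Vec<usize>> = dbscan.by_ref().collect();
    println!("clusters: {:?}", clusters);
    println!("noise: {:?}", dbscan.noise_points());
}
```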
tz-search
I discovered that my Nexus 5 records some timestamps incorrectly: panoramas use the local time as the GPS time, resulting in some confusing artifacts on the map. Fortunately both the local time and the GPS location are recorded correctly, so it is possible to deduce the UTC offset of the local time and hence compute the true GPS time from it.
This requires knowing the timezone of a point on the Earth’s surface, again something for which I couldn’t find a good Rust library. Looking into this problem turned up latlong, a very neat little Go library for doing exactly this task: mapping a latitude/longitude pair to the name of the timezone it is in. Porting it directly[^2] was straight-forward, resulting in tz-search, which offers:
```rust
let timezone = tz_search::lookup(-33.8885, 151.1908).unwrap();
println!("{}", timezone); // Australia/Sydney
```
Of course, just knowing the name of the timezone isn’t good enough, but it’s the first step of getting the UTC offset (I haven’t investigated the remaining ones yet…).
order-stat
The most recent “little library” I published is order-stat, for computing order statistics (the *k*th smallest/largest element of a set). The motivation is expanding cogset slightly, with the OPTICS algorithm, which requires the ability to do exactly that.
Poking about, I found a few selection algorithms, notably quickselect and Floyd–Rivest, as well as the obvious one of doing a full sort and indexing to retrieve the *k*th element. All offer the same interface, so the decision between them is guided only by performance.
| Algorithm | Huge (ms) | Large (µs) | Small (ns) |
|---|---|---|---|
| Sort | 7.2 | 87.8 | 264 |
| Quickselect | 1.1 | 9.7 | 83 |
| Floyd–Rivest | 0.48 | 2.4 | 72 |
Hence, Floyd–Rivest has the honour of being the star of this little library.
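Using it looks roughly like this (a small sketch; I believe the crate’s entry point is a `kth` function that reorders the slice in place and hands back the *k*th smallest element, but treat the exact signature as an assumption):

```rust
extern crate order_stat;

fn main() {
    let mut v = [4i32, 1, 3, 2, 0];

    // 0-indexed: k = 1 asks for the second smallest element;
    // the slice is permuted so that element lands at index k
    let kth = *order_stat::kth(&mut v, 1);

    assert_eq!(kth, 1);
    println!("2nd smallest: {}", kth);
}
```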
Somewhat against the idea of a little library, order-stat includes a medians-of-medians implementation too (in “little”’s defense, it’s related to order statistics and medians).
I should plug the quickcheck crate: it discovered a few subtle bugs in my Floyd–Rivest implementation, stemming from me incorrectly translating the copy-happy original version (presumably designed for GC’d languages where any value can be duplicated cheaply) to a Rust version that avoids clones and respects ownership.
(As a bonus, neither algorithm does any allocations!)
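The property tests involved are along these lines (a sketch using quickcheck’s `QuickCheck` runner; the actual properties I checked aren’t reproduced here, and the `order_stat::kth` call reflects the same assumed signature as above):

```rust
extern crate order_stat;
extern crate quickcheck;

use quickcheck::{QuickCheck, TestResult};

// property: selecting the k-th smallest element agrees with a full sort
fn kth_matches_sort(mut v: Vec<i32>, k: usize) -> TestResult {
    if k >= v.len() {
        return TestResult::discard();
    }
    let mut sorted = v.clone();
    sorted.sort();
    let expected = sorted[k];

    let actual = *order_stat::kth(&mut v, k);
    TestResult::from_bool(actual == expected)
}

fn main() {
    QuickCheck::new().quickcheck(kth_matches_sort as fn(Vec<i32>, usize) -> TestResult);
}
```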
Write your own!
I’m sure there are little tasks that you’ve solved in your own projects that others might want to solve too, so unleashing them on the world as focused libraries on crates.io would be great! With 1.0 fast approaching, the Rust ecosystem has a level of stability never seen before, and maintaining a pile of little libraries is getting easier and easier.
(For best results, include documentation and fill out as much crates.io metadata as you can.)
[^0]: This locking even holds true when using git dependencies: once you compile once with Cargo, versions of dependencies are locked until you explicitly opt in to a version upgrade with `cargo update`. See “Cargo.toml vs Cargo.lock” for more details.

[^1]: This post isn’t meant to imply that Rust is the only language (or even systems language) with this property. I just think it’s cool that Rust has it.

[^2]: It wasn’t quite a one-to-one port: I opted to use an `enum` and pattern matching to model the different internal encodings, instead of a trait object, which would be the true translation of the `interface` used in the original library.