Huon on the internet

Rust infrastructure can be your infrastructure

By Huon Wilson, 17 Mar 2015

Rust is a reasonably large project: the compiler and standard libraries are over 350kloc, built across nearly 40000 commits by the hands of around 900 contributors. Not only that: there are more than 30 other repositories in the rust-lang GitHub organisation that shouldn’t fall by the wayside, and, for rust-lang/rust alone, there are often more than 100 pull requests landing and a dozen new contributors in a single week.

Update 2015-06-16: Homu is now available online: homu.io

It is sometimes chaotic… often chaotic… with 1.0 quickly approaching, and there are definitely places where the core team (and the rest of the community) sometimes can’t keep up, but getting code into rust-lang/rust is rarely one [1], due to two critical robots: @bors, powered by Homu, and @rust-highfive, powered by Highfive.

The former ensures the master tree is essentially always passing tests, on all first-class platforms: it won’t let a patch land until it does. The latter makes sure that pull requests don’t slip through the cracks: there’s someone with power watching out for each one.

I’ve been wanting to experiment with using the bots for my own code, and yesterday I finally got around to it, and (with a small bit of assistance from the tool authors) got it all set up in a short time. I can say from experience that it isn’t too hard to configure these tools to run on your own repos on GitHub, and they’re not Rust-specific: they work with any repository. All you need is a public-facing server!

Update 2015-03-19: Homu and Highfive have been deployed to the contain-rs organisation, as @FlashCat.

Also, Manish points out that there is actually an open Operations Engineer position for Mozilla Research, to manage the infrastructure of Rust and Servo.

Homu: “Not rocket science”

Homu is a reimplementation/extension of the original bors bot. Bors was implemented by Graydon Hoare (Rust’s original designer) in 2013 (or so) to apply Ben Elliston’s “not rocket science” rule to Rust:

The Not Rocket Science Rule Of Software Engineering:

automatically maintain a repository of code that always passes all the tests

The core idea is simple: the correct way for anyone (including core developers) to land code into rust-lang/rust is to submit a pull request to that repo. Someone on the reviewers whitelist will review the code and, once it looks good, write a “review approved” comment: @bors r+ [2]. Homu will take the pull request, merge it with master into a new branch, and submit that branch to a testing backend. If the tests pass, Homu fast-forwards master to the merge commit and starts on the next patch in its queue.
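To make the merge, test, fast-forward loop concrete, here is a toy simulation. It is only a sketch of the idea, not Homu’s code: the function name `land_queue`, the list-of-patch-names model of a tree, and the `passes_tests` callback are all made up for illustration.

```python
from collections import deque


def land_queue(master, approved, passes_tests):
    """Simulate a serialized merge-test-land queue (hypothetical model).

    master:       list of patch names already on master
    approved:     patch names that have received an r+, in queue order
    passes_tests: callable taking a candidate tree (list of names) -> bool
    """
    queue = deque(approved)
    rejected = []
    while queue:
        patch = queue.popleft()
        # "Merge it with master into a new branch": master itself is untouched.
        candidate = master + [patch]
        if passes_tests(candidate):
            # Tests passed on the merged result, so fast-forward master to it.
            master = candidate
        else:
            # Tests failed: master stays exactly as it was.
            rejected.append(patch)
    return master, rejected
```

The important detail is that the tests run on the merged candidate rather than on the pull request in isolation, so two patches that each pass alone but break each other cannot both land.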

The process of code review and landing gated on testing works wonders for Rust: only really subtly (or transiently) broken patches get into master; other forms of brokenness are eliminated before touching mainline at all, and backing out/reverting of patches once landed is rarely needed. [3]

Having tests always passing sounds great, right? It’s even better because it’s not hard to use Homu with your own repos: just follow the usage instructions (I could only get the git version to work). It supports two testing backends at the moment, Buildbot and Travis CI.

Barosl has released Homu-as-a-service, so adding it to your project is easy and we live in the future.

Even if you don’t need the test handling, Homu is still useful: the queue pages are nice pull request summary panels, especially since they can be combined to display pull requests across multiple repos, e.g. the rust-lang Homu instance manages two repositories, cargo and rust, and there’s a variety of ways to digest that:

I easily lose track of pull requests against my own repos, so I’ve registered a lot of my repos with my Homu instance, and the /all endpoint shows me everything I want to know.

Which brings me on to highfive:

Highfive: “welcome! You should hear from @huonw soon.”

As I said, I easily lose track of pull requests against my own repos. My GitHub news feed and emails are very busy with all the development and discussion in rust-lang/rust, so small pull requests in other repositories can get drowned out and lost. Fortunately, Rust (and Servo) had this problem too and has already made progress on it: Nick Cameron built on Josh Matthews’ script for greeting new contributors to create @rust-highfive, a bot that still says hi, but also manages assigning potential reviewers to PRs.

The bot randomly chooses a person (out of a small per-repo whitelist) and uses GitHub’s assignment feature to make the pull request their responsibility. The theory is that, for people who don’t know who should review the code, the randomly chosen reviewer will either do the review themselves or will know someone more appropriate, so things rarely get left in limbo for weeks or months, especially not small patches like edits to documentation.

Of course, in single-person projects like my own, finding other reviewers doesn’t make so much sense: the goal is to be friendly, and to have pull requests automatically assigned to me, so that they show up in the list that GitHub can show.

Setting it up

Highfive is currently mainly designed for use as an internal rust-lang tool, and so isn’t as well documented as Homu, which was written to be more generic from the start. I’ll write down a bit of docs here, but they’re just an overview, so don’t be afraid to look at/edit the source if you do wish to deploy it: it is easy to customize. Of course, as an internal tool, it is designed to cater just for the needs of rust-lang, and support/patches not needed for that use case may not be accepted upstream.

Highfive lives as a CGI script, newpr.py, using a basic configuration file to authenticate with GitHub and JSON files to control the possible reviewers on a per-repo basis. The script selects a person out of a set of eligible reviewers, which it determines by looking at the JSON files and at the directory that has the most code changes in the pull request.

Highfive also supports pinging reviewers on IRC to… encourage them to make progress on the PR: the upstream repo is currently hard-coded to #rust-bots on irc.mozilla.org. (It’s an internal tool!) Any deployment should disable/change this.

It interacts with GitHub via webhooks: to add Highfive to a repository, create a webhook under the repo’s settings pointing to wherever newpr.py is exposed to the internet, with the application/x-www-form-urlencoded content type.
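With that content type, GitHub delivers the event as a form body whose `payload` field holds the JSON. A minimal sketch of decoding such a body follows; the function name `parse_webhook_body` and the sample event are mine, and this is not taken from newpr.py.

```python
import json
from urllib.parse import parse_qs, urlencode


def parse_webhook_body(body: bytes) -> dict:
    """Decode a form-encoded GitHub webhook body into the event dictionary."""
    fields = parse_qs(body.decode("utf-8"))
    # parse_qs maps each field name to a list of values; take the first payload.
    return json.loads(fields["payload"][0])


if __name__ == "__main__":
    # Build a fake delivery the same way a form-encoding client would.
    sample = urlencode({"payload": json.dumps({"action": "opened", "number": 7})})
    event = parse_webhook_body(sample.encode("utf-8"))
    print(event["action"], event["number"])
```

A JSON-typed webhook would need `json.loads` on the raw body instead, which is presumably why the content type matters when creating the hook.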

Configuration files

The GitHub file should just be called config, and looks like:

[github]
user = <user name of the account to use for the bot>
token = <api token generated for that account>

The API token must be protected, i.e. be careful to ensure that the file isn’t accessible over the internet, and keep it out of git (just deleting it after committing it isn’t enough: it will still appear in the history). I’m not sure of the exact permissions the token should have: the instance for rust-lang uses notifications, read:org, repo and user, but I suspect just repo is enough.
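The format is plain INI, so Python’s standard configparser can read it. This is a sketch under that assumption, not necessarily how newpr.py loads the file; `load_github_credentials` is a name I made up, and it takes the file contents as a string to keep the example self-contained.

```python
import configparser


def load_github_credentials(text):
    """Parse the [github] section and return (user, token)."""
    # interpolation=None so a '%' inside a token is kept literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["github"]
    return section["user"], section["token"]
```

For a real deployment, `parser.read(path)` on a file outside the web root would replace `read_string`.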

There are two sorts of JSON repo configuration files: <name>.json defines the set of reviewers chosen in the repo specifically called <name>, while _global.json defines groups of reviewers in scope in all other config files. For example, for rust-lang, _global.json looks like:

{
    "groups": {
        "core": ["@brson", "@pcwalton", "@nikomatsakis", "@alexcrichton", "@huonw"],
        "crates": ["@huonw", "@alexcrichton"],
        "doc": ["@steveklabnik"]
    }
}

And rust.json looks like:

{
    "groups": {
        "all": ["core"],
        "compiler": ["@pnkfelix", "@nick29581", "@eddyb", "@Aatch"],
        "syntax": ["@pnkfelix", "@nick29581", "@sfackler", "@kmc"],
        "libs": ["@aturon"]
    },
    "dirs": {
        "doc":              ["doc"],
        "liballoc":         ["libs"],
        "libarena":         ["libs"],
        "libbacktrace":     [],
        "libcollections":   ["libs", "@Gankro"],
        ...
        "librustc":         ["compiler"],
        ...

As one might guess, groups defines groups of reviewers, in terms of GitHub handles and other groups. The dirs dictionary lists directories under src in the repo: Highfive will determine which directory contains the most changed files in the patch, and select those groups/users, so that people with particular areas of expertise get to review those patches (heuristically).

The people in the all group are considered eligible for any review, and so are added to the pool no matter what. As such, the dirs field is completely optional: if it is missing (or any directories are missing), Highfive will select from only the all group. I’ve used this to configure my Highfive to assign me for everything (makes sense…).
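The selection rules described above can be sketched in a few lines. This is my own reading of the behaviour, not a copy of newpr.py: the function names (`expand`, `busiest_dir`, `choose_reviewer`) are hypothetical, and I am assuming that "most changes" means "most changed files" and that repo-level groups shadow global ones with the same name.

```python
import random
from collections import Counter


def expand(names, groups):
    """Resolve group names into a set of GitHub handles (handles start with '@')."""
    handles, seen = set(), set()
    stack = list(names)
    while stack:
        name = stack.pop()
        if name.startswith("@"):
            handles.add(name)
        elif name not in seen:  # guard against groups that refer to each other
            seen.add(name)
            stack.extend(groups.get(name, []))
    return handles


def busiest_dir(changed_files):
    """Return the directory directly under src/ with the most changed files."""
    counts = Counter(
        path.split("/")[1]
        for path in changed_files
        if path.startswith("src/") and path.count("/") >= 2
    )
    return counts.most_common(1)[0][0] if counts else None


def choose_reviewer(global_cfg, repo_cfg, changed_files, rng=random):
    """Pick one reviewer from 'all' plus the busiest directory's entry."""
    groups = dict(global_cfg.get("groups", {}))
    groups.update(repo_cfg.get("groups", {}))  # assumption: repo groups win
    dirs = repo_cfg.get("dirs", {})
    pool = ["all"] + dirs.get(busiest_dir(changed_files), [])
    candidates = sorted(expand(pool, groups))
    return rng.choice(candidates) if candidates else None
```

With a repo file that contains only `{"groups": {"all": ["@huonw"]}}`, every pull request resolves to the same single candidate, which is the "assign me for everything" setup.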

Batch configuration

It’s pretty annoying to use GitHub’s web interface to manually add your robot collaborator to each repo, and then add the webhooks necessary for Homu and Highfive, so I wrote a few Python scripts to help. The code has some examples of using them to set up these two pieces of infrastructure.
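For a flavour of what such a script involves, here is a sketch that builds (but does not send) the GitHub API request for creating a webhook. It is not one of my scripts: `build_hook_request` is a made-up name, and subscribing to only the `pull_request` event is an assumption, so check which events your bot actually needs.

```python
import json
import urllib.request


def build_hook_request(owner, repo, hook_url, token):
    """Build a POST request that would create a form-encoded webhook on a repo."""
    payload = {
        "name": "web",
        "active": True,
        "events": ["pull_request"],  # assumption: adjust to what the bot needs
        "config": {"url": hook_url, "content_type": "form"},
    }
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/hooks",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
        method="POST",
    )
```

Looping this over a list of repository names and passing each request to `urllib.request.urlopen` would do the batch part; keeping the build step separate makes it easy to inspect what is about to be sent.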

  1. I’m being… optimistic. There are so many patches submitted that we only keep up via regular “rollups”: landing several pull requests in a single batch (usually created by the hard-working Manish Goregaokar). This is a curse and a blessing of the design of the testing infrastructure: serialized testing means things take a while to land, but testing everything (including rollups) before landing ensures that the master branch still passes tests.

  2. Why @bors if it’s now Homu? The Homu back-end listens for mentions of the account it is registered to use, and rust-lang still uses @bors.

  3. Unfortunately, there’s no free lunch, and the test guarantees come at a cost of scalability, as landing patches to master is serialised (see above too).