Rust infrastructure can be your infrastructure
Rust is a reasonably large project: the compiler and standard libraries are over 350kloc, built across nearly 40000 commits by the hands of around 900 contributors. Not only that: there are more than 30 other repositories in the rust-lang GitHub organisation that shouldn’t fall by the wayside, and, for rust-lang/rust alone, there are often more than 100 pull requests landing and a dozen new contributors in a single week.
Update 2015-06-16: Homu is now available online: homu.io
It is sometimes chaotic… often chaotic… with 1.0 quickly approaching, and there are definitely places where the core team (and the rest of the community) sometimes can’t keep up, but getting code into rust-lang/rust is rarely one of them[0], due to two critical robots:
- an integration bot/pull request manager, Barosl Lee’s Homu
- a review assigner, Josh Matthews and Nick Cameron’s Highfive
The former ensures the master tree is essentially always passing tests, on all first-class platforms: it won’t let a patch land until it does. The latter makes sure that pull requests don’t slip through the cracks: there’s someone with power watching out for each one.
I’ve been wanting to experiment with using the bots for my own code, and yesterday I finally got around to it, and (with a small bit of assistance from the tool authors) got it all set up in a short time. I can say from experience that it isn’t too hard to configure these tools to run on your own repos on GitHub, and they’re not Rust-specific: they work with any repository. All you need is a public-facing server!
Update 2015-03-19: Homu and Highfive have been deployed to the contain-rs organisation, as @FlashCat.
Also, Manish points out that there is actually an open Operations Engineer position for Mozilla Research, to manage the infrastructure of Rust and Servo.
Homu: “Not rocket science”
Homu is a reimplementation/extension of the original bors bot. Bors was implemented by Graydon Hoare (Rust’s original designer) in 2013 (or so) to apply Ben Elliston’s “not rocket science” rule to Rust:
The Not Rocket Science Rule Of Software Engineering:
automatically maintain a repository of code that always passes all the tests
The core idea is simple: the correct way for anyone (including core developers) to land code into rust-lang/rust is to submit a pull request to that repo. Someone on the reviewers whitelist will review the code and, once it looks good, write a “review approved” comment[1]:
@bors r+
Homu will take the pull request, merge it with master into a new branch, and submit that branch to a testing backend. If the tests pass, Homu fast-forwards to the merge commit and starts on the next patch in its queue.
The process of code review and landing gated on testing works wonders for Rust: only really subtly (or transiently) broken patches get into master; other forms of brokenness are eliminated before touching mainline at all, and backing out or reverting patches once landed is rarely needed[2].
Having tests always passing sounds great, right? It’s even better because it’s not hard to use Homu with your own repos: just follow the usage instructions (I could only get the git version to work). It supports two testing backends at the moment, Buildbot and Travis CI.
Barosl tells me that he has released Homu-as-a-service, so adding it to your project is easy and we live in the future.
Even if you don’t need the test handling, Homu is still useful: the queue pages are nice pull request summary panels, especially since they can be combined to display pull requests across multiple repos, e.g. the rust-lang Homu instance manages two repositories, cargo and rust, and there’s a variety of ways to digest that.
I easily lose track of pull requests against my own repos, so I’ve
registered a lot of my repos with my Homu instance, and the /all
endpoint shows me everything I want to know.
Which brings me on to Highfive:
Highfive: “welcome! You should hear from @huonw soon.”
As I said, I easily lose track of pull requests against my own repos. My GitHub news feed and emails are very busy with all the development and discussion in rust-lang/rust, so small pull requests in other repositories can get drowned out and lost. Fortunately, Rust (and Servo) had this problem too and has made progress on it already: Nick Cameron built on Josh Matthews’s script for greeting new contributors to create @rust-highfive, a bot that still says hi, but also manages assigning potential reviewers to PRs.
The bot randomly chooses a person (out of a small per-repo whitelist) and uses GitHub’s assignment feature to make the pull request their responsibility. The theory is that, for people who don’t know who should review the code, the randomly chosen reviewer will either do the review themselves or will know someone more appropriate, so things rarely get left in limbo for weeks or months, especially not small patches like edits to documentation.
Of course, in single-person projects like my own, finding other reviewers doesn’t make so much sense; the goal is to be friendly, and to have pull requests automatically assigned to me, so that they show up in the list that GitHub can show.
Setting it up
Highfive is currently mainly designed for use as an internal rust-lang tool, and so isn’t as well documented as Homu, which was written to be more generic from the start. I’ll write down a bit of docs here, but they’re just an overview, so don’t be afraid to look at/edit the source if you do wish to deploy it: it is easy to customize. Of course, as an internal tool, it is designed to cater just for the needs of rust-lang, and support/patches not needed for that use case may not be accepted upstream.
Highfive lives as a CGI script, newpr.py, using a basic configuration file to authenticate with GitHub and JSON files to control the possible reviewers on a per-repo basis. The script selects a person out of a set of eligible reviewers it determines by looking at the JSON files, and the directory that has the most code changes in the pull request.
Highfive also supports pinging reviewers on IRC to… encourage them to make progress on the PR: the upstream repo is currently hard-coded to #rust-bots on irc.mozilla.org. (It’s an internal tool!) Any deployment should disable/change this.
It interacts with GitHub via webhooks: to add Highfive to a repository, create a webhook under the repo’s settings pointing to wherever newpr.py is exposed to the internet, with the application/x-www-form-urlencoded content type.
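The content type matters because it changes what the script receives: with the form encoding, GitHub puts the whole JSON event into a single form field called payload. Here is a small sketch of decoding such a body in Python; `decode_webhook_body` and the example event are mine, purely for illustration, and this is not necessarily how newpr.py reads its input.

```python
import json
from urllib.parse import parse_qs, urlencode


def decode_webhook_body(raw_body: str) -> dict:
    """Extract the JSON event from a form-encoded webhook body."""
    fields = parse_qs(raw_body)
    # With the form content type, the JSON document sits in one form
    # field named "payload".
    return json.loads(fields["payload"][0])


# Made-up example event (not a complete GitHub payload).
example_event = {"action": "opened", "number": 1,
                 "pull_request": {"user": {"login": "some-contributor"}}}
raw = urlencode({"payload": json.dumps(example_event)})

event = decode_webhook_body(raw)
print(event["action"], event["pull_request"]["user"]["login"])
```

If the webhook were set to the JSON content type instead, there would be no payload field to look up, which is why the setting has to match what the script expects.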
Configuration files
The GitHub file should just be called config, and looks like:
[github]
user = <user name of the account to use for the bot>
token = <api token generated for that account>
The API token must be protected, i.e. be careful to ensure that the file isn’t accessible over the internet, and keep it out of git (just deleting it after committing it isn’t enough: it will still appear in the history). I’m not sure of the exact permissions the token should have: the instance for rust-lang uses notifications, read:org, repo and user, but I suspect just repo is enough.
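For what it’s worth, a file in that format can be read with Python’s standard configparser module. The sketch below is my own illustration rather than Highfive’s code: `load_github_credentials` and `is_private` are hypothetical helpers, the user name and token are placeholders, and the permission check only means something on POSIX systems.

```python
import configparser
import os
import stat
import tempfile


def load_github_credentials(path: str) -> tuple:
    """Read the user and token from the [github] section of the config file."""
    parser = configparser.ConfigParser()
    with open(path) as f:
        parser.read_file(f)
    return parser["github"]["user"], parser["github"]["token"]


def is_private(path: str) -> bool:
    """True if neither group nor others have any access (POSIX permission bits)."""
    mode = os.stat(path).st_mode
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


# Demonstration with a throwaway file and placeholder credentials.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config")
    with open(path, "w") as f:
        f.write("[github]\nuser = example-bot\ntoken = not-a-real-token\n")
    os.chmod(path, 0o600)  # owner read/write only
    user, token = load_github_credentials(path)
    print(user, is_private(path))
```

A check like `is_private` won’t stop the file being served by a misconfigured web server, so it is a complement to keeping the file outside the served directory, not a replacement.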
There are two sorts of JSON repo configuration files: <name>.json defines the set of reviewers chosen in the repo specifically called <name>, while _global.json defines groups of reviewers in scope in all other config files. E.g. for rust-lang, _global.json looks like:
{
    "groups": {
        "core": ["@brson", "@pcwalton", "@nikomatsakis", "@alexcrichton", "@huonw"],
        "crates": ["@huonw", "@alexcrichton"],
        "doc": ["@steveklabnik"]
    }
}
And rust.json looks like:
{
    "groups": {
        "all": ["core"],
        "compiler": ["@pnkfelix", "@nick29581", "@eddyb", "@Aatch"],
        "syntax": ["@pnkfelix", "@nick29581", "@sfackler", "@kmc"],
        "libs": ["@aturon"]
    },
    "dirs": {
        "doc": ["doc"],
        "liballoc": ["libs"],
        "libarena": ["libs"],
        "libbacktrace": [],
        "libcollections": ["libs", "@Gankro"],
        ...
        "librustc": ["compiler"],
        ...
As one might guess, groups defines groups of reviewers, in terms of GitHub handles and other groups. The dirs dictionary lists directories under src in the repo: Highfive will determine which directory contains the most changed files in the patch, and select those groups/users, so that people with particular areas of expertise get to review those patches (heuristically).
The people in the all group are considered eligible for any review, and so are added to the pool no matter what. As such the dirs field is completely optional: if it is missing (or any directories are missing) Highfive will select from only the all group. I’ve used this to configure my Highfive to assign me for everything (makes sense…).
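The selection rules described above can be sketched in a few lines of Python. This is my reading of the behaviour, not a copy of newpr.py: `expand` and `eligible_reviewers` are made-up names, "most changed" is taken to mean a plain count of changed files per directory under src, and the configs are cut-down versions of the ones shown earlier.

```python
import random
from collections import Counter


def expand(names, groups, seen=None):
    """Resolve group names into a set of GitHub handles (those starting with '@')."""
    seen = set() if seen is None else seen
    handles = set()
    for name in names:
        if name.startswith("@"):
            handles.add(name)
        elif name in groups and name not in seen:
            seen.add(name)  # guard against groups that refer to each other
            handles |= expand(groups[name], groups, seen)
    return handles


def eligible_reviewers(changed_files, global_config, repo_config):
    """Build the reviewer pool: the "all" group plus the most-changed directory's entry."""
    # Groups from _global.json are in scope alongside the repo's own groups.
    groups = {**global_config.get("groups", {}), **repo_config.get("groups", {})}
    pool = expand(groups.get("all", []), groups)
    # Count changed files per directory directly under src/.
    counts = Counter(path.split("/")[1] for path in changed_files
                     if path.startswith("src/") and path.count("/") >= 2)
    if counts:
        top_dir = counts.most_common(1)[0][0]
        pool |= expand(repo_config.get("dirs", {}).get(top_dir, []), groups)
    return pool


global_config = {"groups": {
    "core": ["@brson", "@pcwalton", "@nikomatsakis", "@alexcrichton", "@huonw"],
    "doc": ["@steveklabnik"]}}
repo_config = {
    "groups": {"all": ["core"], "libs": ["@aturon"]},
    "dirs": {"doc": ["doc"], "libcollections": ["libs", "@Gankro"]}}

changed = ["src/libcollections/vec.rs", "src/libcollections/string.rs",
           "src/doc/intro.md"]
pool = eligible_reviewers(changed, global_config, repo_config)
print(sorted(pool))
print(random.choice(sorted(pool)))  # the randomly assigned reviewer
```

With a repo config containing nothing but an all group, the pool is just that group, which is the single-maintainer setup.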
Batch configuration
It’s pretty annoying to use GitHub’s web interface to manually add your robot collaborator to each repo, and then add the webhooks necessary for Homu and Highfive, so I wrote a few Python scripts to help. The code has some examples of using them to set up these two pieces of infrastructure.
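To give a flavour of what such automation involves, here is a sketch (not my actual scripts) that builds the two GitHub REST API calls per repository: one adding the bot as a collaborator and one creating a form-encoded webhook. The owner, repo names, bot name, hook URL and token are placeholders, the `pull_request` event list is an assumption about what a Highfive-style hook needs, and nothing is actually sent.

```python
import json
from urllib.request import Request

API = "https://api.github.com"


def collaborator_request(owner, repo, bot, token):
    """PUT /repos/{owner}/{repo}/collaborators/{username} adds a collaborator."""
    return Request(f"{API}/repos/{owner}/{repo}/collaborators/{bot}",
                   method="PUT",
                   headers={"Authorization": f"token {token}"})


def webhook_request(owner, repo, hook_url, events, token):
    """POST /repos/{owner}/{repo}/hooks creates a webhook."""
    body = {"name": "web", "active": True, "events": events,
            # "form" selects the application/x-www-form-urlencoded content type.
            "config": {"url": hook_url, "content_type": "form"}}
    return Request(f"{API}/repos/{owner}/{repo}/hooks",
                   data=json.dumps(body).encode(), method="POST",
                   headers={"Authorization": f"token {token}",
                            "Content-Type": "application/json"})


def plan(owner, repos, bot, hook_url, token):
    """Yield the requests needed to wire up every repo in the list."""
    for repo in repos:
        yield collaborator_request(owner, repo, bot, token)
        # Assumption: the hook only needs pull request events.
        yield webhook_request(owner, repo, hook_url, ["pull_request"], token)


planned = list(plan("example-owner", ["repo-a", "repo-b"], "example-bot",
                    "https://example.com/newpr.py", "not-a-real-token"))
for req in planned:
    print(req.get_method(), req.full_url)
# Passing each request to urllib.request.urlopen would perform the call.
```

Building the requests separately from sending them makes it easy to print and eyeball the plan before touching any real repositories.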
[0] I’m being… optimistic. There are so many patches submitted that we only keep up via regular “rollups”: landing several pull requests in a single batch (usually created by the hard-working Manish Goregaokar). This is a curse and a blessing of the design of the testing infrastructure: serialized testing means things take a while to land, but testing everything (including rollups) before landing ensures that the master branch still passes tests.
[1] Why @bors if it’s now Homu? The Homu back-end listens for mentions of the account it is registered to use, and rust-lang still uses @bors.
[2] Unfortunately, there’s no free lunch, and the test guarantees come at a cost of scalability, as landing patches to master is serialised (see above too).