crates.io crate graph
Rust is a systems programming language that comes with an awesome package manager Cargo, which hooks into the crates.io registry as one of its possible sources of packages. The packages can have dependency relationships between each other, making the database into a natural directed graph.
Rust as we have it today is still relatively new, Cargo is even newer, and crates.io is newer still, so the package ecosystem is small: at the time of writing, only 681 crates exist on crates.io (compared to 115 thousand for node.js’ npm). I’m sure this will quickly pick up with as Rust moves to 1.0 and beyond, but at the moment the network of crates and their dependencies is still easily small enough to be handled globally with simple means like the graphviz suite of tools and naive Rust programs. Which is exactly what I did.
The graphs
The “full” graph of the ecosystem, as rendered by graphviz’s fdp
, is
busy, very busy:
Click for the rest of the much larger graph as an SVG with clickable package names, sized according to the number of dependent packages. You may wish to zoom out to get your bearings.
That’s not even the complete package graph: development dependencies are completely ignored (they can cause cycles), and any crates with no dependencies and no dependent crates are not shown, since they’re not yet interacting with the ecosystem at all; but even so, the graph is fairly useless.
The suck is mainly due to the most popular crates like
time
and
rustc-serialize
, which
pull most clusters into the very center of the graph. Eliminating them
(specifically, crates with 15 or more dependent crates) gives a more
reasonable graph.
(Click for bigger.)
That graph makes it clearer that there’s a few distinct clusters. The
left has a lot of web-development functionality, clustered around
hyper
,
conduit
and
openssl
. The right has a lot of
game-development and computer graphics libraries, with
many components (that’s not all
of them) from Piston and many from
RustAllegro. Spread around
are smaller clusters, like
epsilonz, and a variety of
numerical projects (of which some
use num
,
and others do not).
It’s not a cluster so much, but there are a lot of examples of some
crate $foo
depending on $foo-sys
: people following
the convention for publishing FFI bindings.
Collecting the data
crates.io uses a git repo for distributing information about the registered crates. Each one gets a file containing a series of JSON objects (one per line) looking a lot like:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
{
"cksum" : "665e3764d2f654d77382ec6ed40a2faf5a114a6e41e2d1c307ff97916924ec64",
"deps" : [
{
"default_features" : true,
"kind" : "normal",
"name" : "num",
"optional" : false,
"target" : null,
"req" : "~0",
"features" : [
""
]
}
],
"vers" : "0.1.5",
"yanked" : false,
"name" : "slow_primes",
"features" : {}
}
That’s the info for version 0.1.5 of
slow_primes
; it contains the
key piece of information0 that we need: the dependencies, in the deps
field. The simplistic analysis I’m doing here means that the only
facts of interest are the name
of the dependency and whether it is a
dev-dependency (kind == "dev"
).
The fixed format of the JSON makes it ameniable to
#[derive(RustcDecodable)]
, an attribute that will automatically
create deserialization code that does the right thing:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
#[derive(RustcDecodable)]
struct CrateInfo {
name: String,
vers: String,
deps: Vec<DepInfo>,
cksum: String,
features: HashMap<String, Vec<String>>,
yanked: bool,
}
#[derive(RustcDecodable)]
struct DepInfo {
name: String,
req: String,
features: Vec<String>,
optional: bool,
default_features: bool,
target: Option<String>,
kind: Option<String>
}
The graph is based on the most recent version of each package, so I
just take the last line in each crate’s file, run it through
json::decode
and get back a CrateInfo
. A few tens of lines later, the code knows
about every crate and about every dependency link and can print it all
out to a graphviz DOT file. (Unfortunately the neat
graphviz
library
doesn’t offer the flexibility I wanted for setting arbitrary
attributes, so I had to resort to manual printing.)
There is one trip-up: a crate can depend on another crate multiple
times, with different configurations (most commonly, differing
target
s), so some deduplication is required to avoid double counting
and cluttering the graph with multiple lines. Other than that, the
details of the implementation aren’t very interesting, but the code is
publicly available at
github.com/huonw/crates.io-graph.
Thanks to cmr, acrichto, FreeFall and tomaka in #cargo for help/suggestions/copy-editing/catching bugs.
- /r/rust
-
Before FreeFall pointed this out, I was considering downloading everything on crates.io to construct the graph, which would’ve been pretty fun too! ↩