crates.io crate graph

By Huon Wilson — Published 04 Jan 2015

    Rust is a systems programming language that comes with an awesome package manager Cargo, which hooks into the crates.io registry as one of its possible sources of packages. The packages can have dependency relationships between each other, making the database into a natural directed graph.

    Rust as we have it today is still relatively new, Cargo is even newer, and crates.io is newer still, so the package ecosystem is small: at the time of writing, only 681 crates exist on crates.io (compared to 115 thousand for node.js’ npm). I’m sure this will quickly pick up as Rust moves to 1.0 and beyond, but at the moment the network of crates and their dependencies is still easily small enough to be handled globally with simple means like the graphviz suite of tools and naive Rust programs. Which is exactly what I did.

    The graphs

    The “full” graph of the ecosystem, as rendered by graphviz’s fdp, is busy, very busy:

    Most packages

    Click for the rest of the much larger graph as an SVG with clickable package names, sized according to the number of dependent packages. You may wish to zoom out to get your bearings.

    That’s not even the complete package graph: development dependencies are completely ignored (they can cause cycles), and any crates with no dependencies and no dependent crates are not shown, since they’re not yet interacting with the ecosystem at all; but even so, the graph is fairly useless.
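
    As a rough illustration of those two filters (this is a sketch, not the code from my repository; the `Dep` struct and both function names are made up for the example), dropping dev-dependencies and hiding isolated crates could look like this:

```rust
use std::collections::BTreeSet;

/// One dependency edge as read from the index (hypothetical type for this
/// sketch); `kind` mirrors the JSON field of the same name.
struct Dep {
    from: String,
    to: String,
    kind: Option<String>,
}

/// Keep only non-dev edges, as (dependent, dependency) pairs.
fn graph_edges(deps: &[Dep]) -> Vec<(String, String)> {
    deps.iter()
        .filter(|d| d.kind.as_deref() != Some("dev"))
        .map(|d| (d.from.clone(), d.to.clone()))
        .collect()
}

/// Crates that sit on at least one end of a kept edge. Anything not in this
/// set is isolated and would be left out of the picture.
fn connected_crates(edges: &[(String, String)]) -> BTreeSet<String> {
    edges
        .iter()
        .flat_map(|(a, b)| vec![a.clone(), b.clone()])
        .collect()
}
```

    A crate that is only ever reached through dev-dependencies ends up isolated under this scheme, so it disappears from the graph too.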

    The suck is mainly due to the most popular crates like time and rustc-serialize, which pull most clusters into the very center of the graph. Eliminating them (specifically, crates with 15 or more dependent crates) gives a more reasonable graph.
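
    The cut-off is simple to express in code. The following is a minimal sketch assuming the graph is held as a list of (dependent, dependency) pairs; the function names are hypothetical and the threshold is a parameter rather than a hard-coded 15:

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Count, for every crate, how many distinct crates depend on it.
/// Edges are (dependent, dependency) pairs.
fn dependent_counts(edges: &[(String, String)]) -> BTreeMap<String, usize> {
    // Deduplicate first so a repeated edge is not counted twice.
    let unique: BTreeSet<&(String, String)> = edges.iter().collect();
    let mut counts = BTreeMap::new();
    for (_, dependency) in unique {
        *counts.entry(dependency.clone()).or_insert(0) += 1;
    }
    counts
}

/// Drop every crate with `threshold` or more dependents, along with all
/// edges that touch it.
fn without_popular(edges: &[(String, String)], threshold: usize) -> Vec<(String, String)> {
    let counts = dependent_counts(edges);
    let popular = |name: &String| counts.get(name).map_or(false, |&n| n >= threshold);
    edges
        .iter()
        .filter(|(from, to)| !popular(from) && !popular(to))
        .cloned()
        .collect()
}
```

    Calling `without_popular(&edges, 15)` would correspond to the cut-off used for the second graph.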

    Fewer packages

    (Click for bigger.)

    That graph makes it clearer that there are a few distinct clusters. The left has a lot of web-development functionality, clustered around hyper, conduit and openssl. The right has a lot of game-development and computer graphics libraries, with many components (that’s not all of them) from Piston and many from RustAllegro. Spread around are smaller clusters, like epsilonz, and a variety of numerical projects (of which some use num, and others do not).

    It’s not a cluster so much, but there are a lot of examples of some crate $foo depending on $foo-sys: people following the convention for publishing FFI bindings.
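
    Spotting that convention mechanically only needs a suffix check. Here is a small sketch, again assuming (dependent, dependency) pairs; `sys_pairs` is a name invented for the example:

```rust
/// Return the edges where some crate `foo` depends directly on `foo-sys`.
fn sys_pairs(edges: &[(String, String)]) -> Vec<(String, String)> {
    edges
        .iter()
        // Keep the edge only if stripping "-sys" from the dependency's
        // name gives back the dependent's name.
        .filter(|(from, to)| to.strip_suffix("-sys") == Some(from.as_str()))
        .cloned()
        .collect()
}
```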

    Collecting the data

    crates.io uses a git repo for distributing information about the registered crates. Each one gets a file containing a series of JSON objects (one per line) looking a lot like:

    {
       "cksum" : "665e3764d2f654d77382ec6ed40a2faf5a114a6e41e2d1c307ff97916924ec64",
       "deps" : [
          {
             "default_features" : true,
             "kind" : "normal",
             "name" : "num",
             "optional" : false,
             "target" : null,
             "req" : "~0",
             "features" : [
                ""
             ]
          }
       ],
       "vers" : "0.1.5",
       "yanked" : false,
       "name" : "slow_primes",
       "features" : {}
    }
    

    That’s the info for version 0.1.5 of slow_primes; it contains the key piece of information [1] that we need: the dependencies, in the deps field. The simplistic analysis I’m doing here means that the only facts of interest are the name of the dependency and whether it is a dev-dependency (kind == "dev").

    The fixed format of the JSON makes it amenable to #[derive(RustcDecodable)], an attribute that will automatically create deserialization code that does the right thing:

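    use std::collections::HashMap;

    // One line of a crate's index file: a single published version.
    #[derive(RustcDecodable)]
    struct CrateInfo {
        name: String,
        vers: String,
        deps: Vec<DepInfo>,
        cksum: String,
        features: HashMap<String, Vec<String>>,
        yanked: bool,
    }

    // One entry of the "deps" array.
    #[derive(RustcDecodable)]
    struct DepInfo {
        name: String,
        req: String,
        features: Vec<String>,
        optional: bool,
        default_features: bool,
        target: Option<String>,
        // "normal" or "dev" in the data discussed here; may be absent.
        kind: Option<String>
    }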
    

    The graph is based on the most recent version of each package, so I just take the last line in each crate’s file, run it through json::decode and get back a CrateInfo. A few tens of lines later, the code knows about every crate and about every dependency link and can print it all out to a graphviz DOT file. (Unfortunately the neat graphviz library doesn’t offer the flexibility I wanted for setting arbitrary attributes, so I had to resort to manual printing.)
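
    To give a flavour of the two ends of that pipeline, here is a standard-library-only sketch of picking the last line of a file and of printing edges by hand. It is not the code from my repository: the function names and the graph name `crates` are invented for the example, and the JSON decoding step in between is left out.

```rust
use std::fmt::Write;

/// The newest version of a crate is described by the last non-empty line
/// of its index file.
fn latest_entry(file_contents: &str) -> Option<&str> {
    file_contents.lines().rev().find(|l| !l.trim().is_empty())
}

/// Print (dependent, dependency) edges as a graphviz `digraph`.
fn to_dot(edges: &[(String, String)]) -> String {
    let mut out = String::from("digraph crates {\n");
    for (from, to) in edges {
        // Quote the names: crate names may contain `-`, which is not
        // allowed in a bare DOT identifier.
        writeln!(out, "    \"{}\" -> \"{}\";", from, to).unwrap();
    }
    out.push_str("}\n");
    out
}
```

    Printing the DOT text manually like this leaves room to append whatever attributes are wanted to each node or edge line.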

    There is one trip-up: a crate can depend on another crate multiple times, with different configurations (most commonly, differing targets), so some deduplication is required to avoid double counting and cluttering the graph with multiple lines. Other than that, the details of the implementation aren’t very interesting, but the code is publicly available at github.com/huonw/crates.io-graph.
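
    One way to do that deduplication, sketched here with a hypothetical `DepEntry` type rather than the real decoding structs, is to collect the dependency names of a crate into a set:

```rust
use std::collections::BTreeSet;

/// A dependency entry as it appears in the index (hypothetical type for
/// this sketch): the same `name` can show up several times with different
/// targets.
#[allow(dead_code)] // `target` is only here to show why duplicates arise.
struct DepEntry {
    name: String,
    target: Option<String>,
}

/// Collapse repeated dependencies on the same crate into a single edge by
/// keeping each dependency name once.
fn unique_dependencies(deps: &[DepEntry]) -> BTreeSet<&str> {
    deps.iter().map(|d| d.name.as_str()).collect()
}
```

    Using a `BTreeSet` rather than a `HashSet` also keeps the output order stable between runs, which is handy when diffing generated DOT files.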

    Thanks to cmr, acrichto, FreeFall and tomaka in #cargo for help/suggestions/copy-editing/catching bugs.

    Comments:
    1. Before FreeFall pointed this out, I was considering downloading everything on crates.io to construct the graph, which would’ve been pretty fun too! 

    I'm Huon Wilson (huon_w), a mathematically and statistically inclined software engineer, currently working on the Swift team at Apple, but interested in hearing from you. Before that I was a long-term volunteer on Rust's core team.
