Error handling in Rust: a k-NN case study

By Huon Wilson — Published 11 Jun 2014

Contents

    After posting a Rust translation of some k-nearest neighbour code, I got a few comments asking “how would you handle errors if you wanted to?”. This is the perfect chance to briefly demonstrate a few idioms.

    See my previous post for context and the original code; like with the code in that post, this code compiles with rustc 0.11.0-pre-nightly (e55f64f 2014-06-09 01:11:58 -0700)

    What type of error handling to use?

    The “canonical” way is to use type system, with types like Result<A, B>, which can either be an Ok containing a value of type A or an Err containing a B (isomorphic to Haskell’s Either), and Option<T>, which is either a Some containing a T, or just None containing no data.

    The standard library uses Result and Option pervasively, meaning you can essentially be guaranteed to handle all errors (and theoretically never crash) as long as you avoid calling unwrap and the small number of similar methods. For example, almost all IO actions return an IoResult<...>, which defines the possible errors via the IoError type and its contained IoErrorKind enum.

    An approximation of monadic short-circuiting is provided for Result by the try! macro, which just returns immediately if an error occurs (that is, if variable with type Result is an Err), propagating it upwards for the caller to handle. However, try! isn’t the only strategy, and it’s easy to define custom handlers, as I do below.

    Before you ask: Rust lacks conventional exceptions (since these are hard to make memory safe without a garbage collector, as I understand it); in safe code/by default, unwinding can only be stopped at task boundaries.

    The bullet-proof code

    The try! macro and IoError form my inspiration for the code to handle errors in slurp_file: define an enum with the various failure conditions, and define a short custom macro that either “unwrap”s or short-circuits to return the appropriate error marker.

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    62
    63
    64
    65
    66
    67
    68
    69
    70
    71
    #![feature(macro_rules)]
    
    // the possible things that can go wrong
    enum SlurpError {
        // The input was malformed (could have line/column number info too)
        InvalidInput,
        // Some piece of IO failed (e.g. couldn't open a file)
        FailedIo(IoError)
    }
    
    // short-circuiting macro: "unwrap" the value, an error will return
    // from the surrounding function propagating that error upwards.
    //
    // macro_rules! macros take a sequence of tokens/AST non-terminals and
    // attempt to pattern match on them, taking the first branch that
    // fits, and then effectively replace the macro invocation with the
    // right-hand side of that branch.
    macro_rules! try_slurp {
        // `$...` is a special variable, a "macro argument", this `value`
        // is an expression, e.g. `1 + 2`, `foo()`, `if ... { ... } else {
        // ... }`. Hence, to match, this arm needs to be passed an
        // expression, a comma, then a literal `FailedIo`.
        // e.g. `try_slurp!(foo(), FailedIo)`
        ($value: expr, FailedIo) => {
            // The macro expands to this code if the pattern matches. The
            // `$value` expression is `match`d as a `Result<..., IoError>`
            // (passing in something else will be a type error, after the
            // macro expands). Success is unwrapped, and a failure is
            // returned from the function/closure in which the macro is
            // called.
            match $value {
                Ok(x) => x,
                Err(e) => return Err(FailedIo(e))
            }
        };
    
        // similarly here, this branch takes an arbitrary expression and
        // then has to match exactly `InvalidInput`
        ($value: expr, InvalidInput) => {
            // as above, except handling `$value` as an `Option<...>`.
            match $value {
                Some(x) => x,
                None => return Err(InvalidInput)
            }
        }
    }
    
    fn slurp_file(file: &Path) -> Result<Vec<LabelPixel>, SlurpError> {
        use std::{result, option};
        let mut file = BufferedReader::new(try_slurp!(File::open(file), FailedIo));
    
        let lines = file.lines()
            .skip(1)
            .map(|line| {
                let line = try_slurp!(line, FailedIo);
    
                let mut splits = line.as_slice().trim().split(',').map(|x| from_str(x));
    
                // .and_then is flattening Option<Option<int>> to Option<int>.
                let label = try_slurp!(splits.next().and_then(|x| x), InvalidInput);
    
                let pixels = try_slurp!(option::collect(splits), InvalidInput);
    
                Ok(LabelPixel {
                    label: label,
                    pixels: pixels
                })
            });
    
        result::collect(lines)
    }
    

    vbhit provides a nice alternative implementation that avoids defining the macro.

    The return value can then be pattern-matched where-ever slurp_file is called, and the error propagated upwards there, or handled appropriately e.g. the slurp_file calls in main could be changed to something like:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    let training_set = match slurp_file(&Path::new("trainingsample.csv")) {
        Ok(data) => data,
        Err(e) => {
            match e {
                FailedIo(io) => println!("Couldn't read file: {}", io),
                InvalidInput => println!("Invalid file format")
            }
            std::os::set_exit_status(1);
            return
        }
    };
    

    “collect”?

    The two collect functions (result and option) are useful helpers, which take an Iterator<Result<T, X>> and return something like Result<Vec<T>, X> or Result<HashSet<T>, X> (respectively Option). It’s not a coincidence that these functions share the same name as the Iterator.collect I used last time: all of them allow collecting to the same set of generic container types.

    A Haskeller might notice that they are actually just special cases of the monadic sequence; there is a possibility that this could be handled generically in future (with higher kinded types), but, unfortunately, there is no guarantee that a Monad trait will work nicely due to Rust’s affine types and various low-level details.

    I'm Huon Wilson huon_w, a mathematically and statistically inclined software engineer, currently working on the Swift team at Apple, but interested from hearing from you. Before that I was a long-term volunteer on Rust's core team.

    Latest posts