Rreverrse Debugging
Imagine being able to step forward and backwards as code runs in your debugger. Imagine being able to do an test run multiple times with exactly the same sequence of instructions and values, right down to memory addresses and IO. Imagine being able to run an executable thousands of times and then do all that in the one execution that triggers the rare bug that’s draining you of life…
The rr tool is amazing.
A Debugger Skeptic
I’ve never been a huge user of debuggers. Being able to diagnose my code line-by-line, statement-by-statement sounded theoretically good to me, but I’ve never really “clicked” with it in practice.
Of course, there’s an element of a vicious cycle: I only use debuggers (generally gdb for native code, and pdb for Python) occasionally and so I’m not an expert, so things end up being annoying, slow and require searching the internet a lot, and this discourages me from using them. And, for anything non-trivial, circumstances often mean that my attempts to use a debugger are an exercise in pulling teeth and are slower than just staring at the code harder and/or adding more logging.
There’s also a large element of just never having a tool I liked using (which is what this whole post is about): I don’t do that much in managed languages like Java or C#, and I’ve not used Visual Studio in any detail, all of which I’ve heard rumours about being top-notch.
The thing I struggle with most is the disconnect between when problems
are detected/manifest, and the fundamental cause. A typical “tricky”
bug might only occur occasionally: an assertion of
internal-consistency inside a library triggers inside a function in a
somewhat non-deterministic way, meaning it may take hundreds or even
thousands of calls before it triggers. And, even when it does, the
assertion is usually just detecting some flow-on consequence, and the
actual problem is usually some unknown distance before the
assertion. All this means I struggle to wrangle the debugger into the
right position to catch the bug, without having to walk through too
many perfectly fine executions. I’m sure a gdb-empress could handle
this with ease, but I don’t have those skills, so just sticking some
println!
s in my code and reading the log backwards from the error is
easier.
Recently I’ve been trying to get better at using debuggers, to break
my vicious cycle. I’ve been poking around in my Python code for my
Masters’ with pdb, and I’ve of course been poking around in Rust code
with gdb. This has been an uphill struggle: I have to consciously
decide to actually open a debugger and then work out exactly where I
need to be. There was a boost when I worked out that
the rust-gdb
wrapper was made to be used (and is
conveniently installed by default, even with
multirust
), but, still: not my favourite activity.
And then, rr 4.0 was released…
On a rrrroll
rr is everything I never knew I wanted from a debugger. I can barely stay out of it: I’m slicing and dicing bugs in Rust code like never before. Instead of having to plan out the best places to put breakpoints for what I suspect the issue might be, I just run the code, and then break whereever/whenever I want to later.
The workflow is simple: make a recording of an execution, and then replay it in a debugger, with the ability to do go anywhere you want any time.
With Rust/Cargo, my command line will often look like:
1
2
3
4
5
6
7
8
9
10
11
$ cargo build --example foo
... things compiling ...
$ rr ./target/debug/examples/foo
... rr & program output ...
$ rr replay -d rust-gdb
... rr & gdb start-up ...
(gdb) break rust_panic
...
(gdb) continue
... program output ...
(gdb)
(The -d
flag was added just after 4.0 was released. I asked
a question about swapping in different gdbs, and was prompted to
file an issue, which was fixed in just a few hours! Anyway, this means
getting the best Rust experience requires building from rr
from source at the moment.)
Recording
The recording step with rr
is simple: tell it your binary and it’ll
run it, saving the state it needs for replaying as the program
executes.
This could be implemented as just saving the entire state of the
machine (memory and registers) between each step, but this would be
super-slow, and it won’t be nearly as nice to use as rr. The worst
overhead I’ve noticed is the rr’d program taking 14× longer, but it’s
a pretty dumb program, and I only came up with it for this blog post0:
1
2
3
4
5
6
7
8
9
10
// hammer.rs
use std::fs::File;
use std::io::prelude::*;
fn main() {
for _ in 0..100000 {
let mut buffer = [0; 1000];
File::open("foo.txt").unwrap().read(&mut buffer).unwrap();
}
}
It takes 4.2s 0.5s (this case was optimised) to be
recorded with rr, but only 0.3s to run normally. (Compiled with
rustc -g hammer.rs
, with Rust 1.3.0.) Of course, most programs will
being doing more than reading from a file a lot, and the overhead is
much smaller for more typical work-loads. For instance, I’ve been
doing some work with Aatch’s big-integer library, ramp, and
the factorial
example (compiled in debug mode) takes
1.8s to run under rr, and 1.5s normally.
Determinism
The replaying is where the magic really kicks in for me. The rr
replay
command brings up an instance of gdb with the trace loaded
and ready to go.
The recording means that the “execution” of a replay is now deterministic (and completely so, as far as I can tell), so you can be sure about a lot of things. Reading random data is fine:
1
2
3
4
5
6
7
8
9
10
// urandom.rs
use std::fs::File;
use std::io::prelude::*;
fn main() {
let mut buffer = [0; 5];
File::open("/dev/urandom").unwrap().read(&mut buffer).unwrap();
println!("{:?}", buffer);
}
Recording and playing it back gives the same result each time:
1
2
3
4
5
6
7
8
9
$ rr ./urandom
rr: Saving the execution of `./urandom' to trace directory `/home/huon/.rr/urandom-2'.
[45, 45, 28, 0, 236]
$ ~/projects/mozilla/rr/obj/bin/rr replay -d rust-gdb ~/.rr/urandom-2
GNU gdb (Ubuntu 7.10-1ubuntu2) 7.10
...
(gdb) c
Continuing.
[45, 45, 28, 0, 236]
Even the memory addresses of allocations, the stack and everything else are deterministic, so watches and watchpoints can be placed on exact bytes in memory and they’ll do the right thing on every “run” of the program. (I’ve not had a reason to use it yet, but seems awesome for debugging memory corruption: find an execution that exhibits corruption of some string or something, and place a watch point on those bytes to find the locations that modify the string.)
Back to the future
The thing I’ve really fallen in love with is reverse debugging:
instead of just being able to let the program’s execution progress
forward in various forms (step
, next
, continue
etc.), you can do
the same in reverse.
My strategy is:
- find out some code has a bug (oh no!),
- record executions with
rr
until the bug exhibits how I want1, - replay the execution until there’s a sure-fire indicator it occurred
(for instance, in Rust code, breaking on
rust_panic
will stop execution at any panic, including assertion failures), - work backwards from that point, diving in and out of functions and stepping forward and backwards over lines/instructions/anything.
The keys here are the r
/reverse-
-prefixed gdb commands: rn
(reverse-next
), rs
(reverse-step
), rc
(reverse-continue
) and
so on. These do what their non-reverse
conterparts do, except
backwards: next
goes from line 9 to line 10, reverse-next
goes
from line 10 to line 9.
It just feels so great to start up the debugger, break on
rust_panic
, examine the state of the world, and then break on some
previous function call and jump back to it with
reverse-continue
. I can track down problems without having to fiddle
with setting up how to actually start the debugger in the right place.
Removing the rose-coloured glasses
I’m probably sounding breathlessly enthusiastic, as if record/replay and reverse execution was some amazing new functionally. It’s not.
Other environments/languages/tools have the functionality, but it’s
the first time I’ve had the luck to use it. There’s even actually
recording and reverse execution in gdb itself,
but it is (apparently) quite slow, and seems to be less reliable: I
tried target record
on an Rust executable, and the recording
immediately failed in glibc’s memset
(it couldn’t handle an AVX2
instruction).
Also, rr of course has its own limitations:
- it doesn’t literally fix your bugs for you,
- it only runs on Linux and only x86 & x86-64 (although there’s a bug about ARM),
- programs run on a single-core, so concurrency is possible but not parallelism (and hence programs may be much slower due to that, and some bugs for which rr might be nice to have are harder/impossible to trigger),
- modifying variables/memory is useless: any changes are ignored and overwritten, since the executions are recorded and fixed,
- it doesn’t support all syscalls, but I’ve not encountered one that isn’t supported personally, and the reason is they’re implemented on an on-demand basis; things that have been founded to be needed for debugging Firefox etc.,
- the time overhead is fairly low, but the memory overhead is quite large: the factorial example from before went from 3MB (as measured by GNU time) to 82MB. I’m sure that a large part of this is a constant overhead, because it doesn’t seem like it’d be a great tool for debugging large applications (for which it is designed, and used) with 30× increase in memory use.
(There’s a few more technical things mentioned in the limitations on rr’s website.)
Summary
Reverse debugging has converted me from “meh debuggers” to “💜 rr”: being able to dance around freely—frolic, even—in code pleases me like no other tool. It is language agnostic, I’m led to believe that anything that works in GDB itself will work with rr: I believe it was designed with C/C++ applications like Firefox in mind, but it works flawlessly with Rust, and I’m sure other languages too.
- users
- /r/rust
- /r/programming
-
I know pretty much nothing about the internals of rr or Linux, so I have no idea if this is actually triggering a particularly bad case; but I do know that rr will shim in/save the result of syscalls into the operating system, so doing a lot of them (as opening/reading/closing a file will do) is probably slow. ↩
-
The rr developers point out this means tracking down rare bugs is made easier: run rr on a test-case in a loop until the bug triggers, and then you can dissect it at leisure, instead of just hoping to catch it by chance in a traditional debugger. (Again not something I’ve particularly needed to use yet, so I don’t speak from experience: the rarest bug I’ve had to tackle recently occurred in about a quarter of executions.) ↩