What does Rust's “unsafe” mean?
Rust is an in-development0 systems
programming language with a strong focus on no-overhead memory
safety. This is achieved through a powerful type system (with
similarities to Haskell), and careful tracking of ownership and
pointers, guaranteeing safety. However, this is too restrictive for a
low-level systems language, an escape hatch is occasionally
required. Enter the unsafe
keyword.
Poking holes in memory safety
Rust aims to be memory safe, so that, by default, code cannot crash (or be exploited) due to dangling pointers or iterator invalidation. However, there are things that cannot fit into the type system, for example, it is not possible to get the raw interactions with the operating system and system libraries (like memory allocators and thread spawning) to be truly safe. Detailed human knowledge about how to use them safely is required to be encoded at some point, and this is not easily checkable by a compiler: mistakes can be made.
Other memory safe languages (e.g. managed ones like Python or Haskell) have all this knowledge encoded in the implementations of their underlying virtual machines/runtime systems, usually written in C. Rust doesn’t have a heavy-weight VM or runtime, but still needs to provide (preferably safe) interfaces in some manner.
Rust fills these holes with the unsafe
keyword, which opts in to
possibly dangerous behaviour; like calling into the operating system
and external libraries via
the foreign function interface (FFI),
or handling possibly-invalid machine pointers directly.
Rust uses unsafe
to build all the abstractions seen in the standard
library: the vast majority of it is written in Rust, including
fundamental types like
the reference counted Rc
,
the dynamic vector Vec
,
and
HashMap
,
with only
a few small C shims
and some external non-Rust libraries like jemalloc and libuv.
unsafe
There are two ways in which one can opt-in to these possibly dangerous
behaviours: with an unsafe
block, or with an unsafe
function.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
// calling some C functions imported via FFI:
unsafe fn foo() {
some_c_function();
}
fn bar() {
unsafe {
another_c_function();
}
}
fn baz() {
// illegal, not inside an `unsafe` context
// yet_another_c_function();
}
Being inside an unsafe
context allows one to (not necessarily
complete):
- call functions marked
unsafe
(this includes FFI functions) - dereference raw pointers (the
*const
and*mut
types), which can possibly beNULL
, or otherwise invalid - access a mutable global variable
- use inline assembly
All of these can easily cause large problems. For example, a shared
reference &T
is a machine pointer, but it must always point to a
valid value of type T
; all four of the above can cause this to be
violated:
-
There is an
unsafe
functionstd::mem::transmute
which takes the bytes of its argument and pretends they are of any type one wants, thus, one can create an invalid&
pointer by reinterpreting an integer:transmute::<uint, &Vec<int>>(0)
. -
A raw pointer
p: *const T
can legally beNULL
. The “rereferencing” operation&*p
creates a reference&T
pointing top
s data, a no-op at runtime, since*const T
and&T
are both just a single pointer under the hood. Ifp
isNULL
this allows one to create aNULL
&T
: invalid! -
If one has
static mut X: Option<i64> = Some(1234);
, one can use pattern matching to get a referencer: &i64
pointing to the1234
integer, but another thread can overwriteX
withNone
, leavingr
dangling. -
Inline assembly can set arbitrary registers to arbitrary values, including setting a register meant to be holding a
&T
to zero.
What does unsafe
really mean?
An unsafe
context is the programmer telling the compiler that the
code is guaranteed to be safe due to invariants impossible to express
in the type system, and that it satisfies
the invariants that Rust itself imposes.
These invariants are assumed to never be broken, even inside unsafe
code blocks, and the compiler compiles and optimises with this
assumption. Thus, breaking any of those invariants is
undefined behaviour1
and can leave a program doing “anything”, even making
demons fly out your nose.
That is, an unsafe
context is not a free pass to mutate anything and
everything, nor is it a free pass to mangle pointers and alias
references: all the normal rules of Rust still apply, the compiler is
just giving the programmer more power, at the expense of leaving it up
to the programmer to ensure everything is safe.
A non-unsafe
function using unsafe
internally should be
implemented to be safe to call; that is, there is no circumstance or
set of arguments that can make the function violate
any invariants. If
there are such circumstances, it should be marked unsafe
.
This rule is most important for public, exported functions; private
functions are guaranteed to only be called in a limited set of
configurations (since all calls are in the crate/module in which it is
defined), so the author has more flexibility about what sort of safety
guarantees they give. However, marking possibly-dangerous things
unsafe
helps the compiler help the programmer do the right thing, so
is encouraged even for private items.
Case study: Vec
The Vec<T>
type is
defined
as:
1
2
3
4
5
pub struct Vec<T> {
len: uint,
cap: uint,
ptr: *mut T
}
There are (at least) two invariants here:
ptr
holds an allocation with enough space forcap
values of typeT
- That allocation holds
len
valid values of typeT
(i.e. the firstlen
out ofcap
of theT
s are valid, implyinglen <= cap
)
It’s not feasible to express these in Rust’s type system, so they are
guaranteed by a careful implementation. The implementation is then
forced to use unsafe
to assuage the compiler’s doubts about certain
operations. The compiler does not and cannot understand the invariants
stated above, and so cannot be sure that
creating a slice view into the vector
is safe. It is implemented like so:
1
2
3
fn as_slice<'a>(&'a self) -> &'a [T] {
unsafe { mem::transmute(Slice { data: self.as_ptr(), len: self.len }) }
}
And you can see that it could easily be unsafe e.g. if one
accidentally wrote self.cap
instead of self.len
, the resulting
slice would be too long and the last elements of it would be
uninitialised data. The compiler can’t verify that this is correct,
and so assumes the worst, disallowing it without the explicit opt-in.
Another thing to note is these Vec
invariants are required to always
hold or else Vec
will be allowing incorrect behaviour to happen via
the safe methods it exposes (e.g. if someone could increase len
without initialising the elements appropriately, the as_slice
method
above would be broken).
Unfortunately, it’s not possible to get the Rust compiler to directly
enforce them, so the Vec
API has to be careful to guarantee that
they can’t be violated; part of this is keeping the fields private, so
they cannot be directly changed, another part is being careful to mark
the unsafe
parts of the API as unsafe
, e.g.
the set_len
method
can directly change the len
field.
Case study: malloc
The C function malloc
is described by my man page as the following:
The
malloc()
function allocates size bytes and returns a pointer to the allocated memory. The memory is not initialized. If size is 0, thenmalloc()
returns eitherNULL
, or a unique pointer value that can later be successfully passed tofree()
.The
malloc()
andcalloc()
functions return a pointer to the allocated memory, which is suitably aligned for any built-in type. On error, these functions returnNULL
. […].
The libc
crate predefines most of the common symbols from the libc
on various platforms, including libc::malloc
. Let’s write a safe
program that creates memory for, stores and prints an 8-byte i64
integer, carefully justifying why we know more than the compiler, and
thus why each unsafe
is safe (in a perfect world all unsafe
blocks
would be justified/proved correct).
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
extern crate libc;
use std::ptr;
fn main() {
let pointer: *mut i64 = unsafe {
// rustc doesn't know what `malloc` does, and so doesn't know
// that calling it with argument 8 is always safe; but we do,
// so we override the compiler's concern with
// `unsafe`. (`malloc` returns a `*mut libc::c_void` so we
// need to cast it to the type we want.)
libc::malloc(8) as *mut i64
};
// we know that the only failure condition is the pointer being
// NULL, in any other circumstance the pointer points to a valid
// memory allocation of at least 8 bytes.
if pointer.is_null() {
println!("could not allocate");
} else {
// here, the only thing missing is initialisation, the memory
// is valid but uninitialised, so lets fix that. Since it is
// not initialised, we have to be careful to avoid running
// destructors on the old memory; via `std::ptr::write`.
unsafe {
// allocation is valid, and the memory is uninitialised,
// so this is safe and correct.
ptr::write(pointer, 1234i64);
}
// now `pointer` is looking at initialised, valid memory, so
// it is valid to read from it, and to obtain a reference to
// it.
let data: &i64 = unsafe { &*pointer };
println!("The data is {}", *data);
// prints: The data is 1234
}
// (leaking memory is not `unsafe`.)
}
(Keen eyes will note that i64
doesn’t have a destructor and so the
ptr::write
call isn’t strictly required, but it’s good practice.)
FAQ: Why isn’t unsafe
viral?
One might expect a function containing an unsafe
block to be
unsafe
to call, that is, unsafe
ty infects everything it touches,
similar to how Haskell forces one to mark all impure calculations with
the IO
type.
However, this is not the case, unsafe
is just an implementation
detail; if a safe function uses unsafe
internally, it just means the
author has been forced to step around the type system, but still
exposes a safe interface.
More pragmatically, if unsafe
were viral, every Rust program ever
would be entirely unsafe
, since the whole standard library is
written in Rust, built on top of unsafe
internals.
Conclusion
The unsafe
marker is a way to step around Rust’s type system; by
telling rustc
that there are external conditions/invariants that
guarantee correctness: the compiler steps back and locally leaves the
programmer to verify that
various properties
hold. This allows Rust to write very low-level code like C, but still
be memory safe by default, by forcing programmers to opt-in to the
risky behaviour.
The
“Writing Safe Unsafe and Low-Level Code”
provides guidance and tips about using unsafe
correctly.
- /r/rust
- /r/programming
-
The code in this post compiles with
rustc 0.12.0-pre-nightly (aa0e35bc6 2014-07-22 00:26:21 +0000)
. ↩ -
“What Every C Programmer Should Know About Undefined Behaviour” is a series of articles highlighting how insidious undefined behaviour can be, leading to subtly (or not so subtly) broken programs. ↩