Ownership in Rust

Isak Østrem Hellemo

Before you read

There already exist a lot of really good resources on the basics of Rust, especially the Rust book.

This is mainly a synthesised version to help me understand the concept better, but I'm sharing it in case others find it useful.

The "why" of ownership.

1) Allocating memory

A hard problem when writing software is managing memory: allocating memory for new data, and freeing it when we're done.

For instance, in C we could do something like this:

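#include <stdlib.h>

/* Assumed signature; defined elsewhere, does some work on the array. */
void solve_problems(int *arr, int n);

int main(int argc, char **argv) {

    int N = 10;
    int *arr = malloc(sizeof(int) * N);

    /* Do stuff with the array */
    solve_problems(arr, N);

    free(arr);
    return 0;
}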

This is incredibly easy to mess up. The main problem is that we as developers have to remember who is responsible for freeing the memory. Is it the main function, which allocated the memory? Or should solve_problems do it, since it knows when the work is complete?

Another approach is using a garbage collector. The garbage collector keeps track of the memory, and frees it when the data is no longer used. This stops most of the easy mistakes and works well in most cases. However, it does add some overhead for tracking the memory allocations.

2) Sharing memory between threads

Another big problem is avoiding data-races in multi-threaded programs. How do we avoid two threads mutating the same data simultaneously?

For instance, in Go we can do:

package main

import (
  "fmt"
  "time"
)

func inc(a *int) {
  for i := 0; i < 1000000; i++ {
    *a++
  }
  fmt.Println("a: ", *a)
}

func main() {
  a := 0
  // Start two counter-threads
  go inc(&a)
  go inc(&a)

  // Wait for a to be the final result
  for a < 2000000 {
    time.Sleep(1 * time.Second)
  }

  fmt.Println("Done")
}

I ran this a couple of times locally, and it printed:

a:  1006621
a:  1038769

It's obvious why this happened: multiple threads mutated the same value simultaneously, causing data races. Just like with memory management, this can easily be solved by just being smart and not making mistakes, in this case by using a mutex. Go does have tools like the Data Race Detector, which can detect problems at runtime. However, it would be nice to prevent them at compile time.

How it works

The core idea is to assign an owner to each value. Each value has exactly one owner at a time, and the owner controls two things. First, the owner controls when the memory is freed (solving problem 1). Second, it controls who can have a reference to the value (solving problem 2).

Let's first look at some examples of ownership for memory management:

struct Point {
    x: i64,
    y: i64,
}

fn do_stuff() {
    // p has ownership of the point
    let p = Point {x: 1, y: 2};


    // p goes out of scope, and frees the memory.
}
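
To actually see the moment where the memory is released, here is a minimal sketch that adds a Drop implementation to Point. The Drop impl and the FREED counter are my own additions for illustration (they are not needed for ownership to work); they only make the freeing observable.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical counter, used only to observe how many points were freed.
static FREED: AtomicUsize = AtomicUsize::new(0);

struct Point {
    x: i64,
    y: i64,
}

// Illustration only: Rust calls drop automatically when the owner goes out of scope.
impl Drop for Point {
    fn drop(&mut self) {
        println!("freeing point ({}, {})", self.x, self.y);
        FREED.fetch_add(1, Ordering::SeqCst);
    }
}

fn do_stuff() {
    // p has ownership of the point
    let p = Point { x: 1, y: 2 };
    println!("using point ({}, {})", p.x, p.y);

    // p goes out of scope here, so drop runs before do_stuff returns.
}

fn main() {
    do_stuff();
    println!("points freed so far: {}", FREED.load(Ordering::SeqCst));
}
```

The "freeing point" line is printed after "using point" and before the line printed in main, since the point is dropped at the closing brace of do_stuff.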

Giving ownership of a value to another function:

fn do_stuff() {
    // P has ownership
    let p = Point {x: 1, y: 2};


    // we give ownership of p to do_calculation
    do_calculation(p);

    // we no longer own p, so this will cause a compilation-error.
    println!("point ({}, {})", p.x, p.y);

    // p is not freed here, because do_calculation took ownership of it
}


fn do_calculation(p: Point) {
    // do_calculation takes ownership of p
    let n = p.x * p.y;
    println!("n = {}", n);

    // p goes out of scope, freeing the memory
}

A function can also give back ownership after completion:

fn do_stuff() {
    // P has ownership
    let p = Point {x: 1, y: 2};

    // we give ownership of p to do_calculation, which gives ownership back afterwards.
    let p = do_calculation(p);

    println!("point ({}, {})", p.x, p.y);
    // p goes out of scope, and is freed
}

fn do_calculation(p: Point) -> Point {
    // do_calculation takes ownership of p
    let n = p.x * p.y;
    println!("n = {}", n);

    // we return p, giving ownership back
    p
}

This approach lets the compiler know when to free the memory without the overhead of a garbage-collector, and without having small mistakes cause memory-leaks.

Now, ownership is cool. However, it would be really cumbersome if we had to pass ownership around all the time, especially since there can only ever be a single owner. To help with this, Rust allows borrowing.

Borrowing

Rust allows functions to borrow (take references to) data. This allows multiple parts of the code to reference the same data without having to pass ownership around. However, Rust enforces some rules to avoid problem 2).

Rust has two types of references: immutable references & and mutable references &mut. The compiler enforces that a &mut can only exist if there are no other references to the same value at the same time, mutable or immutable. Any number of immutable references can coexist, as long as no &mut exists. In other words, the Rust compiler enforces that the Go program shown above can never compile.
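
To make the two reference types concrete, here is a minimal sketch of functions that borrow a Point instead of taking ownership. The function names product and shift are made up for this illustration.

```rust
struct Point {
    x: i64,
    y: i64,
}

// Borrows the point immutably: it can read the point, but not change or free it.
fn product(p: &Point) -> i64 {
    p.x * p.y
}

// Borrows the point mutably: it can change the point, but still does not own it.
fn shift(p: &mut Point, dx: i64, dy: i64) {
    p.x += dx;
    p.y += dy;
}

fn main() {
    // p must be declared mut, otherwise we cannot take a &mut to it.
    let mut p = Point { x: 1, y: 2 };

    let before = product(&p);
    shift(&mut p, 2, 2);
    let after = product(&p);

    // main still owns p, so it can keep using it after all three calls.
    println!("before = {}, after = {}, p = ({}, {})", before, after, p.x, p.y);
}
```

Each borrow here ends as soon as the call returns, so the immutable and mutable borrows never overlap and the compiler accepts it.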

So how does Rust enforce this? One way to look at it is that Rust looks at the "flow" of your program, making sure that a mutable reference is never in use at the same time as any other reference to the same value.

Let's look at some examples:

Using a mutable and immutable reference at the same time:

fn do_stuff() {
  // x is mutable
  let mut x = vec![1, 2, 3];

  // y has an immutable reference to x
  let y = &x;

  // push uses a mutable reference to x.
  // This fails because y already has an immutable reference.
  x.push(1);

  println!("{:?}", y);
}

Using immutable and mutable references to the same value, but not at the same time:

fn do_stuff() {
  // x is mutable
  let mut x = vec![1, 2, 3];

  // push uses a mutable reference to x.
  x.push(1);

  // y has an immutable reference to x
  let y = &x;

  // takes an immutable reference to x
  do_other_stuff(&x);

  // At this point we can NOT use a mutable reference to x,
  // since it will conflict with y.

  println!("{:?}", y);

  // Now we're no longer using y, so we can use a mutable reference to x again.
  // Note: y is still in scope, but it is no longer used in the code, so this is still ok.
  x.push(2);
}

We can create quite complicated flows, but the compiler will always check that the guarantees hold. For instance:

fn if_flow() {
  let a = true;

  // Mutable x
  let mut x = 10;
  // immutable reference to x
  let y = &x;

  if a {
    // In this branch, y is not used, x has exclusive mutable access.
    x += 2;
  } else {
    // In the else-clause, nothing is mutating x, so the guarantees hold.
    println!("y: {}", y);
  }

  // This would not compile since one of the branches would mutate x.
  // println!("y: {}", y);
}

So what happens if we try to copy the Go program in Rust? Well, we can't.


use std::thread;

fn inc(a: &mut i32) {
  for _ in 0..1000000 {
    *a += 1;
  }
  println!("a: {}", a);
}

fn main() {
  let mut a = 0;

  thread::scope(|s| {
    s.spawn(|| inc(&mut a));
    // Error: cannot borrow `a` as mutable more than once at a time
    // s.spawn(|| inc(&mut a));
  });

  println!("{}", a);
}

How do we do it in Rust? By using a mutex. We should have used a mutex in the Go version as well, but the Rust compiler forces us to. The mutex is a magical type in Rust which lets you mutate data behind an immutable reference (this is often called interior mutability). This means that multiple threads can borrow the same data, since the reference is "immutable", while the lock makes sure that only one of them touches the value at a time.

use std::sync::Mutex;
use std::thread;

fn inc_mutex(a: &Mutex<i32>) {
  for _ in 0..1000000 {
    *a.lock().unwrap() += 1;
  }
  println!("a: {}", *a.lock().unwrap());
}


fn main() {
  let a = Mutex::new(0);
  thread::scope(|s| {
    s.spawn(|| inc_mutex(&a));
    s.spawn(|| inc_mutex(&a));
  });

  println!("{}", *a.lock().unwrap());
}
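
To check that the mutex version really ends up with the expected count, here is a small sketch that wraps the same idea in a function returning the final value. The helper count_with_mutex and its parameters are hypothetical names I made up for this example.

```rust
use std::sync::Mutex;
use std::thread;

// Hypothetical helper: starts `threads` counter threads that each add 1 `iterations` times.
fn count_with_mutex(threads: usize, iterations: i32) -> i32 {
    let total = Mutex::new(0);

    thread::scope(|s| {
        for _ in 0..threads {
            // Every thread only gets an immutable borrow of the mutex.
            s.spawn(|| {
                for _ in 0..iterations {
                    *total.lock().unwrap() += 1;
                }
            });
        }
    });

    // The scope has joined all threads, so we can take the value out of the mutex.
    total.into_inner().unwrap()
}

fn main() {
    let result = count_with_mutex(2, 1_000_000);
    println!("result: {}", result);
    assert_eq!(result, 2_000_000);
}
```

Unlike the Go version, no increments are lost here, because every increment happens while holding the lock.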

Just for completeness: we can force Rust to do it like in Go. However, we have to use the unsafe keyword, which lets us work with raw pointers that the compiler does not check against its borrowing rules. The program below compiles, but it has the same data race as the Go version, which is undefined behaviour in Rust. Interestingly, the mutex implementation uses unsafe internally to allow mutation through immutable references. This means that the only place where we could make a mistake is in the mutex implementation, and not in the rest of the code base.

fn main() {
  let mut a = 0;

  thread::scope(|s| {
    let pointer = &mut a;
    let pointer2: &mut i32;
    unsafe {
      pointer2 = &mut *(pointer as *mut i32);
    }
    s.spawn(|| inc(pointer));
    s.spawn(|| inc(pointer2));
  });

  println!("{}", a);
}

Some final notes

I'm used to thinking about parameters as being pass-by-reference vs pass-by-value when passing data between functions. By that I mean whether the value is copied when passing it into a function, or if we're just passing a pointer to the value. From what I gather, this is NOT the way we think about it in Rust. We do not think in terms of passing values or references.

Instead, we are passing responsibilities and guarantees to the functions. By passing ownership, we are passing the responsibility of freeing the data. When passing a mutable reference, we are passing the guarantee that the function has exclusive access to the value. When passing ownership, the data might be copied, but not necessarily.