📩 Borrowing & References

Memory Management Concepts In Rust

What are Borrowing & References in Rust?

Borrowing in Rust works like borrowing a book from a library: we can use it for a while, but we eventually return it to its owner.

Rust uses borrowing to avoid unnecessary copies and to give code access to a value when transferring ownership is not needed or desired.

In summary:

Borrowing:

  • The temporary access to a value without taking ownership of it.

  • In Rust, borrowing is achieved through references.

References:

  • A reference (&T) is a value that points to an existing value of type T.

  • It acts like an address that tells your code where to find the data.

  • You can't modify the borrowed data through a reference by default (immutable borrows).

  • There are also mutable references (&mut T) that allow modification of the borrowed data but with restrictions on how they are used to ensure safety.

For example:

fn print_number(x: &i32) { // x is a reference to an i32
  println!("Number: {}", x);
}

fn main() {
  let num = 5;
  print_number(&num); // Pass a reference to the function
}
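The example above uses an immutable reference. As a minimal sketch of the mutable case, the hypothetical helper add_to below (not part of the original example) takes a &mut i32 and modifies the caller's value in place:

```rust
// Hypothetical helper: adds `amount` to the value behind a mutable reference.
fn add_to(x: &mut i32, amount: i32) {
    *x += amount; // Dereference with `*` to modify the borrowed value
}

fn main() {
    let mut num = 5; // The owner must be declared `mut` to be borrowed mutably
    add_to(&mut num, 10); // Pass a mutable reference to the function
    println!("Number: {}", num); // `num` is usable again once the borrow ends
}
```

Note that both the variable (let mut) and the borrow (&mut) have to opt in to mutation.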

Ownership and Borrowing

Ownership and borrowing are intertwined. You don't take ownership when you borrow a value, so the original owner can still use the data.

However, there are restrictions on borrowing based on ownership rules. For example, you can't have more than one mutable borrow of the same data at the same time.
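The difference between borrowing and taking ownership can be sketched as follows. The function names length_of and consume are hypothetical and chosen only for illustration:

```rust
// Hypothetical helper: borrows the string, so the caller keeps ownership.
fn length_of(s: &str) -> usize {
    s.len()
}

// Hypothetical helper: takes ownership, so the caller can no longer use the value.
fn consume(s: String) -> usize {
    s.len()
}

fn main() {
    let text = String::from("hello");

    let borrowed_len = length_of(&text); // Borrow: `text` is still owned by main
    println!("{} has {} bytes", text, borrowed_len); // OK: `text` is still usable

    let moved_len = consume(text); // Move: ownership passes to `consume`
    // println!("{}", text); // error[E0382]: borrow of moved value: `text`
    println!("moved string had {} bytes", moved_len);
}
```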

Benefits of Borrowing

Efficiency: Avoids unnecessary copying of data, especially for large structures.

Immutability: Encourages a functional programming style by promoting functions that don't modify their arguments.

Multiple Access: Allows multiple parts of your code to access the same data concurrently (with limitations for mutable borrows).

Borrowing Rules

Mutability

There are restrictions on mutable borrows:

You can only have one mutable borrow to a specific piece of data at a time.

No immutable borrows can exist while a mutable borrow is outstanding.

// Can't borrow `x` as mutable more than once at a time
fn main() {
    let mut x = 5;

    let ref_mut_1 = &mut x; // Mutable reference
    let ref_mut_2 = &mut x; // Mutable reference - this will cause an error

    println!("ref_mut_1: {}", ref_mut_1);
    println!("ref_mut_2: {}", ref_mut_2);
}

// Can't borrow `x` as immutable because it is also borrowed as mutable
fn main() {
    let mut x = 5;

    let ref_mut = &mut x; // Mutable reference
    let ref_immut = &x; // Immutable reference - this will cause an error

    println!("ref_mut: {}", ref_mut);
    println!("ref_immut: {}", ref_immut);
}
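Both programs above are rejected because the first borrow is still used after the second one is created. As a sketch of one way to satisfy the rules, the hypothetical function update_then_read below finishes using the mutable reference before taking an immutable one. This relies on the fact that a borrow ends at its last use, not at the end of the enclosing block:

```rust
// Hypothetical helper showing a mutable borrow followed by an immutable one.
fn update_then_read() -> i32 {
    let mut x = 5;

    let ref_mut = &mut x; // Mutable borrow starts here
    *ref_mut += 1;        // Last use of ref_mut: the mutable borrow ends here

    let ref_immut = &x;   // OK: no mutable borrow is live any more
    *ref_immut            // Copy the i32 out of the reference
}

fn main() {
    println!("x: {}", update_then_read());
}
```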

The Lifetimes

Concrete Lifetimes

In the example below, the concrete lifetimes are tied to the scope of the main function.

The reference hello borrows from s, so the lifetime of hello cannot extend past the lifetime of s. Here both are declared in main, so hello stays valid from its creation to its last use, while s is still alive.

fn main() {
    let s = String::from("Hello, world!");
    let hello = &s; // Reference to s
    println!("{hello}");
}

Generic Lifetimes

In the example below, the longest function returns either x or y depending on their lengths. The strings behind x and y could have different lifetimes, so the Rust compiler cannot tell on its own which lifetime the returned reference has.

To resolve this, we add a lifetime annotation, 'a, that describes the relationship between the lifetimes of the references. Annotating x, y, and the return type with the same 'a does not change how long any value lives; it tells the compiler that the returned reference is valid only while both x and y are valid.

In other words, we are telling the Rust compiler that there is a relationship between x, y, and the return value: the lifetime of the return value is equal to the shorter of the lifetimes passed in for x and y.

In the main function, we call longest with references to both strings. At this call, the compiler picks 'a as the region in which both string1 and string2 are valid, so both strings must stay alive for as long as result is used. This ensures the borrowed data remains valid until the returned reference goes out of scope.

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let string1 = String::from("apple");
    let string2 = String::from("banana");
    let result = longest(&string1, &string2);
    println!("Longest string: {}", result);
}
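To see the "shorter of the two lifetimes" rule in action, here is a sketch that gives the two strings different scopes. The string contents and the nested block are assumptions made for illustration:

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let string1 = String::from("watermelon");
    {
        let string2 = String::from("kiwi"); // Lives only inside this block
        let result = longest(&string1, &string2);
        println!("Longest string: {}", result); // OK: both strings are alive here
    }
    // If `result` were declared outside the inner block and printed here,
    // the compiler would report error[E0597]: `string2` does not live long
    // enough, even though the returned reference happens to point at string1.
}
```

The compiler checks the signature, not the runtime outcome, so the result is treated as borrowing from both arguments.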

Lifetime Elision

Do we need to annotate lifetimes on every reference we write? No. After working with Rust for a while, the Rust team found predictable patterns in how lifetimes were annotated, so they programmed those patterns into the compiler. As a result, the compiler can infer the lifetimes in these common cases without explicit annotations. This feature is called lifetime elision.

Lifetimes on function or method parameters are called input lifetimes, and lifetimes on return values are called output lifetimes.

How Lifetime Elision Works:

  • The Rust compiler uses a set of rules to infer lifetimes when they are not explicitly provided.

  • These rules look only at the function or method signature, not at its body.

  • If the compiler cannot infer a lifetime based on the rules, it will raise an error.
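As a sketch of the error case, consider a function with two reference parameters that returns a reference. The signature alone does not say which parameter the result borrows from, so the elided form is rejected and an explicit annotation is needed. The function pick and its first flag are hypothetical:

```rust
// Without annotations this signature does not compile:
//   fn pick(x: &str, y: &str, first: bool) -> &str
//   error[E0106]: missing lifetime specifier
//
// Hypothetical helper with the lifetime written out explicitly.
fn pick<'a>(x: &'a str, y: &'a str, first: bool) -> &'a str {
    if first {
        x
    } else {
        y
    }
}

fn main() {
    println!("{}", pick("left", "right", true));
    println!("{}", pick("left", "right", false));
}
```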

In Rust, lifetime elision simplifies reference lifetime annotations in functions and methods.

1. Each Elided Lifetime in Input Position Becomes a Distinct Lifetime Parameter:

  • This rule applies to function arguments where references are involved.

  • When multiple reference arguments are used in a function without explicit lifetime annotations, the compiler infers a distinct lifetime for each argument.

  • This ensures that the function can track the validity of each reference independently.

fn compare(x: &str, y: &str) -> bool {
  x == y
}

In this example, the lifetimes of x and y are elided (not explicitly specified). The compiler infers a separate lifetime for each argument ('a and 'b), so the two arguments can borrow from values that live for different lengths of time.
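Written out by hand, the inferred signature would look roughly like the second function in this sketch. The name compare_explicit is hypothetical and only there so both forms can sit side by side:

```rust
// Elided form: the compiler assigns each reference its own lifetime.
fn compare(x: &str, y: &str) -> bool {
    x == y
}

// Hypothetical explicit equivalent of the signature above.
fn compare_explicit<'a, 'b>(x: &'a str, y: &'b str) -> bool {
    x == y
}

fn main() {
    println!("{}", compare("rust", "rust"));
    println!("{}", compare_explicit("rust", "go"));
}
```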

2. If There is Exactly One Input Lifetime Position (Elided or Not), That Lifetime is Assigned to All Elided Output Lifetimes:

  • This rule applies to functions that return references and have exactly one lifetime among their parameters, whether it is written explicitly or elided.

  • If there's a single lifetime specified (or inferred) for the arguments, the compiler assigns the same lifetime to any references returned by the function.

  • This ensures that the returned references are valid only for as long as the argument data they refer to.

fn first_word(sentence: &str) -> &str {
  let space_index = sentence.find(' ').unwrap_or(sentence.len());
  &sentence[..space_index]
}

Here, the sentence argument has an elided lifetime, and the function returns a slice (&str) of the input string.

The compiler infers that the returned slice has the same lifetime as the sentence argument, as if the signature were written fn first_word<'a>(sentence: &'a str) -> &'a str. This ensures the reference points to valid data.
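The elided and annotated signatures are interchangeable. The sketch below puts both next to each other and calls them; first_word_explicit is a hypothetical name used only to keep the two versions apart:

```rust
// Elided form: the single input lifetime is assigned to the output.
fn first_word(sentence: &str) -> &str {
    let space_index = sentence.find(' ').unwrap_or(sentence.len());
    &sentence[..space_index]
}

// Hypothetical explicit equivalent with the lifetime spelled out.
fn first_word_explicit<'a>(sentence: &'a str) -> &'a str {
    let space_index = sentence.find(' ').unwrap_or(sentence.len());
    &sentence[..space_index]
}

fn main() {
    let sentence = String::from("hello world");
    println!("{}", first_word(&sentence));
    println!("{}", first_word_explicit(&sentence));
}
```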

3. If There Are Multiple Input Lifetime Positions, But One of Them is &self or &mut self (for Methods), the Lifetime of self is Assigned to All Elided Output Lifetimes:

  • This rule is specific to methods in structs where the first argument is a reference to the struct itself (&self or &mut self).

  • Other arguments with elided lifetimes still get their own distinct lifetimes (rule 1); the lifetime of self is assigned only to elided lifetimes in the return type.

  • This matches the common case where a method returns a reference to data owned by self.

struct Document {
  content: String,
}

impl Document {
  fn get_word_count(&self) -> usize {
    self.content.split_whitespace().count()
  }
}

In this example, the get_word_count method takes &self as an argument (referencing the Document struct itself).

The return value (usize) is an owned value, not a reference, so it has no lifetime and there is nothing for the compiler to infer. The rule comes into play when a method returns a reference, for example a &str slice of content: that reference is given the lifetime of self, which depends on how long the Document instance is valid.
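A sketch of this rule with a method that does return a reference. The method find_line and its needle parameter are hypothetical additions to the Document struct, used only for illustration:

```rust
struct Document {
    content: String,
}

impl Document {
    // Hypothetical method. With lifetimes written out, this would read:
    //   fn find_line<'a, 'b>(&'a self, needle: &'b str) -> Option<&'a str>
    // The returned slice borrows from `self`, not from `needle`.
    fn find_line(&self, needle: &str) -> Option<&str> {
        self.content.lines().find(|line| line.contains(needle))
    }
}

fn main() {
    let doc = Document {
        content: String::from("alpha\nbeta gamma\ndelta"),
    };

    let found;
    {
        let needle = String::from("gamma"); // Dropped at the end of this block
        found = doc.find_line(&needle);
    }
    // OK: `found` is tied to `doc`, which is still alive.
    println!("{:?}", found);
}
```

Because the output lifetime comes from self, the result can outlive the needle argument.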

Exclusive Access

fn main() {
    // `x` is declared mutable, but below it is only borrowed immutably
    let mut x = 5;

    // Any number of immutable borrows may coexist
    let ref_1 = &x; // Immutable reference
    let ref_2 = &x; // Immutable reference

    // Uncommenting the next line causes error E0502 (shown below)
    // let ref_3 = &mut x;

    println!("ref_1: {}", ref_1);
    println!("ref_2: {}", ref_2);
}

// Error
error[E0502]: cannot borrow `x` as mutable because it is also borrowed as immutable
  --> src/main.rs:11:17
   |
7  |     let ref_1 = &x; // Immutable reference
   |                 -- immutable borrow occurs here
...
11 |     let ref_3 = &mut x;
   |                 ^^^^^^ mutable borrow occurs here
12 |
13 |     println!("ref_1: {}", ref_1);
   |                           ----- immutable borrow later used here

How does Rust prevent Data Races and Dangling References?

Data Races

A data race occurs when all of the following hold:

  • Two or more threads of execution run concurrently and share access to the same location in memory.

  • At least one of those threads writes to that location.

  • The accesses happen without any coordination (synchronization).

Problem with Data Races:

The outcome of a data race depends on the unpredictable timing of the threads involved. Commonly, it can lead to:

  • Overwritten Data: One thread might overwrite the changes made by another thread, resulting in corrupted data.

  • Inconsistent State: The program's state might become inconsistent, making it difficult to understand or debug.

  • Crashes: In severe cases, data races can lead to program crashes.

Preventing Data Races in Rust

In safe Rust, the ownership system and borrow checker rule out data races at compile time. Because a piece of data can have either one mutable reference or any number of immutable references, but never both, two threads cannot modify the same unsynchronized data at the same time.

When necessary, you can use synchronization primitives like mutexes or channels to coordinate access between threads and prevent data races.
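As a minimal sketch of the mutex approach, the hypothetical function parallel_count below shares a counter between threads using Arc (shared ownership) and Mutex (exclusive access) from the standard library. The thread and iteration counts are arbitrary:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: each thread increments a shared counter `per_thread` times.
fn parallel_count(threads: usize, per_thread: u32) -> u32 {
    // Arc lets several threads own the counter; Mutex guards each access.
    let counter = Arc::new(Mutex::new(0u32));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // lock() blocks until this thread has exclusive access.
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    // Wait for every thread to finish before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total: {}", parallel_count(4, 1000));
}
```

Sharing a plain &mut u32 across the spawned threads instead would be rejected at compile time, which is the guarantee described above.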

Dangling References

As we know, Rust uses a unique ownership system to manage memory allocation and deallocation. When a variable goes out of scope, the memory it owns is automatically freed. This removes the need for manual deallocation and avoids most memory leaks, a common issue in languages with manual memory management.

A dangling reference occurs when a reference still exists after the data it points to has been deallocated. This can happen in several ways:

  • Moving (passing ownership of) the referenced data while the reference still exists.

  • The referenced data going out of scope before the reference itself.

Problems with Dangling References:

  • Using a dangling reference to access data can lead to undefined behavior.

  • You might try to read or modify memory that no longer exists, causing crashes or unexpected program results.

  • Debugging these errors can be difficult because they might not show up immediately.

Preventing Dangling References in Rust

Rust's ownership system and borrow checker play a crucial role in preventing dangling references.

  • The borrow checker verifies at compile time that no reference outlives the data it points to.

  • Lifetime annotations describe how long a reference must be valid in relation to the data it refers to, which lets the borrow checker verify relationships it cannot infer on its own.
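A sketch of what this looks like in practice. The rejected function is shown in comments, and the hypothetical no_dangle shows one common fix, which is to return the owned value instead of a reference to it:

```rust
// Rejected by the compiler: the function would return a reference to `s`,
// but `s` is dropped when the function ends.
//
//   fn dangle() -> &String {
//       let s = String::from("hello");
//       &s
//   }
//   error[E0106]: missing lifetime specifier

// Hypothetical fix: return the String itself, moving ownership to the caller.
fn no_dangle() -> String {
    let s = String::from("hello");
    s // Ownership moves out, so nothing is freed here
}

fn main() {
    let s = no_dangle();
    println!("{}", s);
}
```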
