📩Borrowing & References

Memory Management Concepts In Rust

What are Borrowing & References in Rust?

💡 Borrowing is the act of creating a reference. It’s a temporary access to a value without taking ownership of it. It’s achieved through references.

Borrowing in Rust works like borrowing a book from a library: we can use it for a while, but we eventually return it to its owner.

💡 References are pointers with rules and restrictions, and they don’t take ownership.

💡 Unlike a raw pointer, which can point to an invalid memory location, a reference is guaranteed to point to a valid value of a particular type for the life of that reference. Rust’s ownership system and borrow checker enforce this guarantee at compile time.

Rust uses borrowing to avoid unnecessary copies and to share access to a value when taking ownership is not needed or desired.

In summary:

Borrowing:

  • The temporary access to a value without taking ownership of it.

  • In Rust, borrowing is achieved through references.

References:

  • A reference (&T) is a value that points to an existing value of type T.

  • It acts like an address that tells your code where to find the data.

  • You can't modify the borrowed data through a reference by default (immutable borrows).

  • There are also mutable references (&mut T) that allow modification of the borrowed data but with restrictions on how they are used to ensure safety.

For example:

fn print_number(x: &i32) { // x is a reference to an i32
  println!("Number: {}", x);
}

fn main() {
  let num = 5;
  print_number(&num); // Pass a reference to the function
}
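
A mutable reference follows the same pattern. The sketch below assumes a hypothetical helper, add_one, that takes a &mut i32 and changes the caller's value in place; the name is only for illustration.

```rust
// Hypothetical helper: takes a mutable reference and modifies the value in place.
fn add_one(x: &mut i32) {
    *x += 1; // Dereference with `*` to reach the value behind the reference
}

fn main() {
    let mut num = 5; // The binding must be `mut` to allow a mutable borrow
    add_one(&mut num); // Pass a mutable reference to the function
    println!("Number: {}", num); // prints "Number: 6"
}
```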

Ownership and Borrowing

Ownership and borrowing are intertwined. You don't take ownership when you borrow a value, so the original owner can still use the data.

However, there are restrictions on borrowing based on ownership rules. For example, you can't simultaneously have multiple mutable borrows to the same data.
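
The difference between borrowing and taking ownership can be sketched as follows. The helpers total_len and consume are hypothetical names chosen for this illustration: the first borrows its argument, the second takes ownership of it.

```rust
// Borrows the slice: the caller keeps ownership.
fn total_len(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

// Takes ownership: the caller can no longer use the vector afterwards.
fn consume(words: Vec<String>) -> usize {
    words.len()
}

fn main() {
    let words = vec![String::from("apple"), String::from("banana")];

    let len = total_len(&words); // Borrow: `words` is still usable below
    println!("{} words, {} bytes in total", words.len(), len);

    let count = consume(words); // Move: ownership passes to `consume`
    println!("consumed {} words", count);
    // println!("{:?}", words); // Would not compile: `words` was moved
}
```

Borrowing also avoids cloning the vector just to read it, which is the efficiency benefit in practice.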

Benefits of Borrowing

Efficiency: Avoids unnecessary copying of data, especially for large structures.

Immutability: Encourages a functional programming style by promoting functions that don't modify their arguments.

Multiple Access: Allows multiple parts of your code to access the same data concurrently (with limitations for mutable borrows).

Borrowing Rules

Borrowing rules in Rust are a set of guidelines that govern how you can access data through references without taking ownership.

Mutability

By default, references are immutable (&T). You can't modify the data they point to. To modify borrowed data, you need a mutable reference (&mut T), which can only be taken from a binding declared with mut.

There are restrictions on mutable borrows:

  • You can only have one mutable borrow to a specific piece of data at a time.

  • No immutable borrows can exist while a mutable borrow is outstanding.

// Can't borrow `x` as mutable more than once at a time
fn main() {
    let mut x = 5;

    let ref_mut_1 = &mut x; // Mutable reference
    let ref_mut_2 = &mut x; // Mutable reference - this will cause an error

    println!("ref_mut_1: {}", ref_mut_1);
    println!("ref_mut_2: {}", ref_mut_2);
}

// Can't borrow `x` as immutable because it is also borrowed as mutable
fn main() {
    let mut x = 5;

    let ref_mut = &mut x; // Mutable reference
    let ref_immut = &x; // Immutable reference - this will cause an error

    println!("ref_mut: {}", ref_mut);
    println!("ref_immut: {}", ref_immut);
}
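
Both programs above are rejected because the first borrow is still in use when the second one is created. As a rough sketch of the compiling counterpart: a borrow ends at its last use, so borrows that don't overlap are accepted. The wrapper function bump_twice is a hypothetical name used only so the result can be checked.

```rust
// Hypothetical helper: performs the borrows one after another instead of overlapping them.
fn bump_twice(mut x: i32) -> i32 {
    let ref_mut_1 = &mut x;
    *ref_mut_1 += 1; // Last use of `ref_mut_1`: its borrow ends here

    let ref_mut_2 = &mut x; // OK: no other borrow of `x` is live
    *ref_mut_2 += 1; // Last use of `ref_mut_2`

    let ref_immut = &x; // OK: the mutable borrows are over
    *ref_immut
}

fn main() {
    println!("x: {}", bump_twice(5)); // prints "x: 7"
}
```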

The Lifetimes

💡 A reference's lifetime (the duration for which it's valid) must be less than or equal to the lifetime of the data it refers to.

💡 In simpler terms, the borrowed data must outlive the reference itself. This prevents dangling references where you try to access data that has already been deallocated.
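
A minimal sketch of this rule, using a hypothetical function use_reference: the reference r may point at outer, which lives long enough, but not at inner, which is dropped while r is still needed.

```rust
// Hypothetical helper so the result can be checked.
fn use_reference() -> usize {
    let outer = String::from("outer");
    let r: &String;

    {
        let inner = String::from("inner");
        r = &outer; // OK: `outer` outlives every use of `r`
        // r = &inner; // Would not compile: `inner` does not live long enough
        println!("inside the block: {} and {}", r, inner);
    } // `inner` is dropped here

    r.len() // `r` is still valid because `outer` is still alive
}

fn main() {
    println!("length: {}", use_reference()); // prints "length: 5"
}
```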

Concrete Lifetimes

💡 A concrete lifetime represents the actual lifetime of a specific value in memory. It's the duration for which a reference points to valid data.

💡 Concrete lifetimes are not written explicitly in your code. The Rust compiler infers them from the scope in which a value and its references are used.

In the example below, the concrete lifetimes are tied to the scope of the main function.

The string stored in s is valid until the end of main. The reference hello is valid from the point where it is created until its last use in println!, so its lifetime lies entirely within the lifetime of s.

fn main() {
    let s = String::from("Hello, world!");
    let hello = &s; // Reference to s
    println!("{hello}");
}

Generic Lifetimes

💡 A generic lifetime is a placeholder for an unknown lifetime. It's used in functions and types that can work with references of different lifetimes.

💡 Generic lifetimes are denoted by apostrophes ('a, 'b, etc.) and are specified in the function or type signature. The compiler then ensures that the actual lifetimes used with the function or type satisfy these generic lifetime bounds.

In the example below, the longest function returns either x or y depending on their lengths. The strings behind x and y could have different lifetimes, so without annotations the Rust compiler can't tell how long the returned reference is valid.

To describe the relationship between these references, we add the lifetime annotation 'a. It states that x and y are both valid for at least some lifetime 'a, and that the returned reference is valid for that same lifetime. The annotation doesn't change how long any value lives; it only tells the compiler which constraints to check.

In other words, we're telling the Rust compiler that the return value is tied to both x and y: in practice, the returned reference is valid only for the shorter of the two lifetimes passed in.

In the main function, we call longest with references to both strings. Since the function promises to return a reference with lifetime 'a, both string1 and string2 must stay alive for as long as result is used. This ensures the borrowed data remains valid until the returned reference goes out of scope.

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let string1 = String::from("apple");
    let string2 = String::from("banana");
    let result = longest(&string1, &string2);
    println!("Longest string: {}", result);
}
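
To see the "shorter of the two lifetimes" constraint in action, here is a small variation of the example, offered as a sketch. The string values are made up for the illustration; longest is repeated so the block is self-contained.

```rust
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let string1 = String::from("watermelon");
    {
        let string2 = String::from("kiwi");
        // `result` may borrow from either argument, so it is only valid
        // while both `string1` and `string2` are alive.
        let result = longest(&string1, &string2);
        println!("Longest string: {}", result); // prints "Longest string: watermelon"
    } // `string2` is dropped here, so `result` could not be used past this point
    println!("string1 is still valid: {}", string1);
}
```

Declaring result outside the inner block and printing it after the block would be rejected, even though the value actually returned comes from string1.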

Lifetime Elision

Do we always need to annotate lifetimes in the code we write? No. After working with Rust for a while, the Rust team noticed predictable patterns in how lifetimes were annotated and built those patterns into the compiler, so it can fill in the lifetimes for us without explicit annotations. This feature is called lifetime elision.

💡 Lifetime elision is a feature in Rust that allows the compiler to automatically infer the lifetimes of references in certain common scenarios. This helps to reduce boilerplate code and improve readability, especially when dealing with functions that involve references.

Lifetimes on function or method parameters are called input lifetimes, and lifetimes on return values are called output lifetimes.

How Lifetime Elision Works:

  • The Rust compiler uses a set of rules to infer lifetimes when they are not explicitly provided.

  • These rules look only at the function or method signature, not at the function body.

  • If the compiler cannot infer a lifetime based on the rules, it will raise an error.
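
As a sketch of the last point: with two reference parameters and a reference return value, the rules do not determine the output lifetime, so an annotation is required. The function first_of is a hypothetical example, not taken from any library.

```rust
// Without annotations this signature is rejected (error E0106, "missing lifetime
// specifier"), because the compiler cannot tell whether the result borrows
// from `x` or from `y`:
//
// fn first_of(x: &str, y: &str) -> &str { x }

// Hypothetical fix: state that the result borrows from `x` only.
fn first_of<'a>(x: &'a str, _y: &str) -> &'a str {
    x
}

fn main() {
    let a = String::from("first");
    let result;
    {
        let b = String::from("second");
        result = first_of(&a, &b); // `b` may be dropped before `result` is used
    }
    println!("{}", result); // prints "first"
}
```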

In Rust, lifetime elision simplifies reference lifetime annotations in functions and methods.

1. Each Elided Lifetime in Input Position Becomes a Distinct Lifetime Parameter:

💡 The first rule is Distinct Parameters: Each reference parameter in a function receives its own distinct lifetime parameter during elision. This ensures independent tracking of their validity within the function's scope.

  • This rule applies to function arguments where references are involved.

  • When multiple reference arguments are used in a function without explicit lifetime annotations, the compiler infers a distinct lifetime for each argument.

  • This ensures that the function can track the validity of each reference independently.

fn compare(x: &str, y: &str) -> bool {
  x == y
}

In this example, the lifetimes of x and y are elided (not explicitly specified). The compiler infers separate lifetimes ('a and 'b) for each argument, allowing them to potentially have different durations within the function's scope.
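
For comparison, the following sketch writes out the lifetimes that the first rule fills in. The name compare_explicit is hypothetical; it is meant to be equivalent to the elided compare.

```rust
// What the compiler effectively sees after applying the first rule.
fn compare_explicit<'a, 'b>(x: &'a str, y: &'b str) -> bool {
    x == y
}

// The elided form from the text.
fn compare(x: &str, y: &str) -> bool {
    x == y
}

fn main() {
    let long_lived = String::from("rust");
    {
        let short_lived = String::from("rust");
        // The two arguments have different lifetimes, which both signatures accept.
        println!("{}", compare(&long_lived, &short_lived)); // prints "true"
        println!("{}", compare_explicit(&long_lived, &short_lived)); // prints "true"
    }
}
```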

2. If There is Exactly One Input Lifetime Position (Elided or Not), That Lifetime is Assigned to All Elided Output Lifetimes:

💡 The second rule is Single Input Lifetime: When there's a single explicit or inferred lifetime for all input arguments (references), the same lifetime is assigned to any references returned by the function. This guarantees the returned references point to valid data for as long as the input arguments are valid.

  • This rule applies to functions that return references and have exactly one lifetime in their arguments, whether elided or written explicitly.

  • If there's a single lifetime specified (or inferred) for the arguments, the compiler assigns the same lifetime to any references returned by the function.

  • This ensures that the returned references are valid for as long as the data they refer to in the function's arguments.

fn first_word(sentence: &str) -> &str {
  let space_index = sentence.find(' ').unwrap_or(sentence.len());
  &sentence[..space_index]
}

Here, the sentence argument has an elided lifetime, and the function returns a slice (&str) of the input string.

The compiler infers that the returned slice's lifetime should match the lifetime of the sentence argument, as if the signature were written fn first_word<'a>(sentence: &'a str) -> &'a str. This ensures the reference points to valid data.
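
A short usage sketch shows the consequence for callers: the returned slice keeps the input borrowed. The function is repeated in its elided form so the block is self-contained, and the input string is made up for the illustration.

```rust
// Elided form: one input lifetime, so the output borrows from `sentence`.
fn first_word(sentence: &str) -> &str {
    let space_index = sentence.find(' ').unwrap_or(sentence.len());
    &sentence[..space_index]
}

fn main() {
    let text = String::from("hello world");
    let word = first_word(&text); // `word` borrows from `text`
    // drop(text); // Would not compile: `text` is still borrowed by `word`
    println!("{}", word); // prints "hello"
}
```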

3. If There Are Multiple Input Lifetime Positions, But One of Them is &self or &mut self (for Methods), the Lifetime of self is Assigned to All Elided Output Lifetimes:

💡 The third rule is Methods and Self Lifetime: For methods that take a reference to the struct itself (&self or &mut self) as the first argument, the lifetime of self is assigned to all elided lifetimes in the return value. This simplifies method signatures, because a method that returns a reference usually borrows from self.

  • This rule is specific to methods in structs where the first argument is a reference to the struct itself (&self or &mut self).

  • When the return value has an elided lifetime, the compiler assigns it the lifetime of self. Other reference arguments still receive their own distinct lifetimes under the first rule.

  • This ensures that a returned reference cannot outlive the struct it was borrowed from.

struct Document {
  content: String,
}

impl Document {
  fn get_word_count(&self) -> usize {
    self.content.split_whitespace().count()
  }
}

In this example, the get_word_count method takes &self as an argument (referencing the Document struct itself).

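The return value (usize) is an owned value, not a reference, so it has no lifetime and there is no output lifetime to infer. The third rule only comes into play when a method returns a reference, in which case the returned reference is tied to the lifetime of self (which depends on how long the Document instance is valid).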
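
To see the third rule with a method that does return a reference, consider the sketch below. The method after_prefix is a hypothetical addition to Document, introduced only for this illustration.

```rust
struct Document {
    content: String,
}

impl Document {
    // Two input lifetimes (`&self` and `prefix`), one elided output lifetime.
    // By the third rule the returned `&str` borrows from `self`, not from `prefix`.
    fn after_prefix(&self, prefix: &str) -> &str {
        self.content
            .strip_prefix(prefix)
            .unwrap_or(self.content.as_str())
    }
}

fn main() {
    let doc = Document {
        content: String::from("Title: Borrowing"),
    };
    let rest;
    {
        let prefix = String::from("Title: ");
        rest = doc.after_prefix(&prefix); // Result is tied to `doc` only
    } // `prefix` is dropped here, which is fine
    println!("{}", rest); // prints "Borrowing"
}
```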

Exclusive Access

💡 Only one mutable borrow or any number of immutable borrows can exist for a piece of data at a time. This ensures exclusive access and prevents data races (where multiple threads try to modify the same data concurrently).

fn main() {
    // A mutable binding
    let mut x = 5;

    // Any number of immutable borrows may coexist
    let ref_1 = &x; // Immutable reference
    let ref_2 = &x; // Immutable reference

    // This will cause an error
    // let ref_3 = &mut x;

    println!("ref_1: {}", ref_1);
    println!("ref_2: {}", ref_2);
}

// Error reported when the `ref_3` line is uncommented
error[E0502]: cannot borrow `x` as mutable because it is also borrowed as immutable
  --> src/main.rs:11:17
   |
7  |     let ref_1 = &x; // Immutable reference
   |                 -- immutable borrow occurs here
...
11 |     let ref_3 = &mut x;
   |                 ^^^^^^ mutable borrow occurs here
12 |
13 |     println!("ref_1: {}", ref_1);
   |                           ----- immutable borrow later used here

How does Rust prevent Data Races and Dangling References?

Data Races

💡 Data races are a type of concurrency error that can occur in programs that use multiple threads to access the same memory location simultaneously, without proper synchronization. This can lead to unexpected behavior, crashes, and incorrect results.

💡 Data races are a specific type of race condition that occurs when there's a data access conflict.

A data race occurs when all of the following hold:

  • Two or more threads running concurrently access the same memory location.

  • At least one of those accesses is a write.

  • The accesses are not coordinated through synchronization.

Problem with Data Races:

The outcome of a data race depends on the unpredictable timing of the threads involved. Commonly, it can lead to:

  • Overwritten Data: One thread might overwrite the changes made by another thread, resulting in corrupted data.

  • Inconsistent State: The program's state might become inconsistent, making it difficult to understand or debug.

  • Crashes: In severe cases, data races can lead to program crashes.

Preventing Data Races in Rust

Rust's ownership system and borrow checker prevent data races in safe code at compile time. By ensuring that a piece of data has either one mutable borrow or any number of immutable borrows at a time, Rust rules out conflicting unsynchronized access from multiple threads.

When necessary, you can use synchronization primitives like mutexes or channels to coordinate access between threads and prevent data races.
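
As a minimal sketch of the mutex approach, the example below shares a counter between threads using Arc and Mutex from the standard library. The function parallel_count and its parameters are hypothetical names chosen for this illustration.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical helper: `threads` workers each add 1 to a shared counter
// `per_thread` times, then the final total is returned.
fn parallel_count(threads: usize, per_thread: usize) -> usize {
    // `Arc` shares ownership across threads; `Mutex` guards the mutation.
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // `lock` gives exclusive access until the guard is dropped
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    // Wait for every worker to finish.
    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total: {}", parallel_count(4, 1000)); // prints "total: 4000"
}
```

Sharing a plain mutable reference to the counter across the spawned threads instead would be rejected at compile time, which is the borrow checker's guarantee at work.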

Dangling References

💡 Dangling Reference refers to a situation where you have a reference (like a pointer) that points to a memory location that no longer holds the data it originally referred to. This can happen because the memory has been deallocated, and the reference is now invalid.

As we know, Rust uses a unique ownership system to manage memory allocation and deallocation. When a variable goes out of scope, the memory it owns is automatically freed. This helps avoid memory leaks, a common issue in languages with manual memory management.

A dangling reference occurs when a reference still exists after the data it points to has been deallocated. This can happen in several ways:

  • Moving (passing ownership of) the data being referenced while the reference still exists.

  • The referenced data going out of scope before the reference itself.

Problems with Dangling References:

  • Using a dangling reference to access data can lead to undefined behavior.

  • You might try to read or modify memory that no longer exists, causing crashes or unexpected program results.

  • Debugging these errors can be difficult because they might not show up immediately.

Preventing Dangling References in Rust

Rust's ownership system and borrow checker play a crucial role in preventing dangling references.

  • The borrow checker ensures that references are valid for as long as they're needed.

  • Lifetime annotations describe how the lifetimes of references relate to each other and to the data they refer to, so the compiler can check those relationships. They don't change how long any value lives.
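
The classic case can be sketched as follows: a function that tries to return a reference to a local value is rejected, and one common fix is to return the owned value instead. The function names dangle and no_dangle are illustrative.

```rust
// Rejected by the compiler (error E0106, "missing lifetime specifier"): the
// function would return a reference to `s`, which is dropped when the
// function returns.
//
// fn dangle() -> &String {
//     let s = String::from("hello");
//     &s
// }

// One common fix: return the owned value so ownership moves to the caller.
fn no_dangle() -> String {
    let s = String::from("hello");
    s
}

fn main() {
    let owned = no_dangle();
    println!("{}", owned); // prints "hello"
}
```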
