Introduction

Lifetimes are one of Rust’s most distinctive features. They are what makes the language so valuable in the systems programming domain, and it’s essential to master them to use Rust effectively. Unfortunately, they are not a very well explained topic: to get a really solid understanding of how lifetimes work, you have to learn about them bit by bit from scattered blogs, rust-lang GitHub issues, compiler source code, and Zulip discussions. This book is my attempt to cover everything in a single place.

We will start with the Basics chapter, where we will explore how lifetime annotations affect borrow checking. After finishing this chapter you should be able to annotate your code with confidence and an understanding of what you’re doing. Then we will touch on various topics that are nice to know in certain situations but are not as important as Basics, so you can always revisit them when you need to.

The book was never finished

Originally, I planned to cover a broad list of lifetime-related topics, but due to some unfortunate circumstances this project had to be frozen. If you find the information here valuable, you can help resurrect the book with your contributions:

  • Fix typos, improve wording, suggest improvements to existing chapters
  • Contribute more examples and exercises
    • right now the lifetime subtyping chapter desperately needs a good intuitive example justifying the use of outlives relationship in function signatures
  • Contribute misc chapters about:
    • Reborrowing and stacked borrows
    • Lifetime elision rules, different rules for functions and closures, tricks that help to fix incorrectly elided lifetimes
    • Late and early bounds and the use of for<'a>
    • When the use of lifetimes in trait definitions is justified
    • Polonius (intuitive model) and the future of borrowck
    • Lifetimes in GATs
    • Binding lifetimes in unsafe code
    • …etc…

Basics

It’s recommended to follow along by retyping the example in your favorite editor.

Imagine we’re writing a web crawler that crawls posts from different blogs. The crawler’s output looks like this:

struct DiscoveredItem {
    blog_url: String,
    post_url: String,
}

Now we want to write a function filtering the web crawler’s results to iterate posts from some specific blog.

struct DiscoveredItem {
    blog_url: String,
    post_url: String,
}

fn posts_from_blog(items: &[DiscoveredItem], blog_url: &str) -> impl Iterator<Item = &str> {
    // Creating an iterator from the &[DiscoveredItem] slice
    items.iter().filter_map(move |item| {
        // Filtering items by blog_url and returning a post_url
        (item.blog_url == blog_url).then_some(item.post_url.as_str())
    })
}

Our function doesn’t compile, the compiler complains it needs some lifetime annotations.

error[E0106]: missing lifetime specifier
 --> src/main.rs:6:86
  |
6 | fn posts_from_blog(items: &[DiscoveredItem], blog_url: &str) -> impl Iterator<Item = &str> {
  |                           -----------------            ----                          ^ expected named lifetime parameter
  |
  = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `items`
 or `blog_url`
help: consider introducing a named lifetime parameter
  |
6 | fn posts_from_blog<'a>(items: &'a [DiscoveredItem], blog_url: &'a str) -> impl Iterator<Item = &'a str> {
  |                   ++++         ++                              ++                               ++

For more information about this error, try `rustc --explain E0106`.

If we read the error carefully we can even see the suggestion on how to fix our signature.

Let’s apply it!

struct DiscoveredItem {
    blog_url: String,
    post_url: String,
}

fn posts_from_blog<'a>(items: &'a [DiscoveredItem], blog_url: &'a str) -> impl Iterator<Item = &'a str> {
    // Creating an iterator from the &[DiscoveredItem] slice
    items.iter().filter_map(move |item| {
        // Filtering items by blog_url and returning a post_url
        (item.blog_url == blog_url).then_some(item.post_url.as_str())
    })
}
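
As a quick sanity check, here is a minimal sketch of the annotated function used in a trivial context. The sample data and the `sample_items` helper are made up for illustration and are not part of the crawler.

```rust
struct DiscoveredItem {
    blog_url: String,
    post_url: String,
}

fn posts_from_blog<'a>(
    items: &'a [DiscoveredItem],
    blog_url: &'a str,
) -> impl Iterator<Item = &'a str> {
    items.iter().filter_map(move |item| {
        (item.blog_url == blog_url).then_some(item.post_url.as_str())
    })
}

// Hypothetical sample data, made up for this sketch.
fn sample_items() -> Vec<DiscoveredItem> {
    vec![
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/cooking/fried_eggs".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://successfulsam.xyz/".to_owned(),
            post_url: "https://successfulsam.xyz/keys_to_success".to_owned(),
        },
    ]
}

fn main() {
    let items = sample_items();
    // Both arguments stay alive for every use of the returned references,
    // so the single 'a region causes no trouble in this simple case.
    let posts: Vec<&str> = posts_from_blog(&items, "https://blogs.com/").collect();
    println!("{posts:?}");
}
```

In a context this simple, the single-lifetime signature behaves exactly as we want.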

Cool, we satisfied the borrow checker and now everything compiles and therefore works… right? Well, not quite. The compiler actually tricked us: it provided a suggestion that is semantically incorrect and makes our function overly restrictive. This means the borrow checker will strike back soon and make some developer’s day insufferable. Let’s step into this developer’s shoes.

Borrow checker strikes back

Now we’re trying to use the function in some non-trivial context:

struct DiscoveredItem {
    blog_url: String,
    post_url: String,
}

fn posts_from_blog<'a>(items: &'a [DiscoveredItem], blog_url: &'a str) -> impl Iterator<Item = &'a str> {
    // Creating an iterator from the &[DiscoveredItem] slice
    items.iter().filter_map(move |item| {
        // Filtering items by blog_url and returning a post_url
        (item.blog_url == blog_url).then_some(item.post_url.as_str())
    })
}

fn main() {
    // Assume the crawler returned the following results
    let crawler_results = &[
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/cooking/fried_eggs".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/travelling/death_mountain".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://successfulsam.xyz/".to_owned(),
            post_url: "https://successfulsam.xyz/keys_to_success/Just_use_AI_every_day".to_owned(),
        },
    ];

    // Reading the blog URL we're interested in from somewhere
    let blog_url = get_blog_url();

    // Collecting post URLs from this blog using our function
    let post_urls: Vec<&str> = posts_from_blog(crawler_results, &blog_url).collect();

    // Spawning a thread to do some further blog processing
    let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));

    // Processing posts in parallel
    for url in post_urls {
        process_post(url);
    }

    handle.join().expect("Everything will be fine");
}

fn get_blog_url() -> String {
    "https://blogs.com/".to_owned()
}

fn process_post(url: &str) {
    println!("{}", url);
}

// Assume some requests being made to the blog_url to evaluate stats and save them to DB
fn calculate_blog_stats(_blog_url: String) {}

This code doesn’t compile and the compiler error is very confusing:

error[E0505]: cannot move out of `blog_url` because it is borrowed
  --> src/main.rs:41:37
   |
35 |     let blog_url = get_blog_url();
   |         -------- binding `blog_url` declared here
...
38 |     let post_urls: Vec<&str> = posts_from_blog(crawler_results, &blog_url).collect();
   |                                                              --------- borrow of `blog_url` occurs here
...
41 |     let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));
   |                                     ^^^^^^^                      -------- move occurs due to use in closure
   |                                     |
   |                                     move out of `blog_url` occurs here
...
44 |     for url in post_urls {
   |                --------- borrow later used here
   |
help: consider cloning the value if the performance cost is acceptable
   |
38 |     let post_urls: Vec<_> = posts_from_blog(crawler_results, &blog_url.clone()).collect();
   |                                                                       ++++++++

For more information about this error, try `rustc --explain E0505`.

How can blog_url stay borrowed in the loop when we collect the iterator immediately, and post_urls: Vec<&str> is a collection of references coming directly from crawler_results, which are unrelated to the get_blog_url() call? When you see such confusing errors, it’s often a sign that the problem is not in your code but in one of the function signatures your code uses. But how do we identify and fix malformed function signatures, and what is the compiler error actually communicating to us? To answer these questions, we need to understand how the borrow checker works.

Borrow checking

The most important thing to understand about the borrow checker is that it analyzes each function completely independently (in isolation) from other functions. This means that when it encounters a call to our posts_from_blog, the borrow checker doesn’t look inside it to validate the usage of references; all it does is read the function signature and evaluate its lifetimes. But what does it mean to evaluate a lifetime? Let’s go back to our example and figure this out.

Inferring regions

The borrow checker analyzes the main function and encounters the line that invokes posts_from_blog.

let post_urls: Vec<_> = posts_from_blog(crawler_results, &blog_url).collect();

First of all, the borrow checker looks at the function signature:

fn posts_from_blog<'a>(
    items: &'a [DiscoveredItem],
    blog_url: &'a str,
) -> impl Iterator<Item = &'a str> {
    // We're looking at the function from the borrow checker's perspective.
    // It doesn't see the impl :(
}

'a is a generic lifetime. It is similar to a generic type T in the sense that it acts as a placeholder the compiler needs to infer and validate statically at every place we call the function. To do that, the compiler must adhere to the following rules:

  1. An inferred lifetime must be as small as possible.
  2. References with this inferred lifetime must stay valid for the whole lifetime (no dangling pointers!)
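
To make the placeholder analogy concrete, here is a small sketch in which the same generic lifetime is inferred separately at two call sites. The `longer` helper is hypothetical and unrelated to the crawler example.

```rust
// `longer` is a hypothetical helper used only for this illustration.
fn longer<'a>(left: &'a str, right: &'a str) -> &'a str {
    if left.len() >= right.len() {
        left
    } else {
        right
    }
}

fn main() {
    let outer = String::from("outer string");

    // Call site 1: 'a only needs to cover this call and the println below.
    let first = longer(&outer, "literal");
    println!("{first}");

    {
        let inner = String::from("in");
        // Call site 2: a separate 'a, inferred independently of call site 1.
        let second = longer(&outer, &inner);
        println!("{second}");
    } // `inner` is dropped here, after the last use of `second`.
}
```

Each call gets its own 'a, kept as small as the uses of the returned reference allow.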

Ok, but that still sounds vague. What exactly is a lifetime, and what is the compiler actually inferring? Well, a lifetime is nothing more than a continuous¹ region of code (e.g. from line 3 to line 8). A region can also be just a single line of code:

    // Processing posts in parallel
    for url in post_urls {
/---process_post region. Holds: `&url`
|        process_post(url);
-
    }

In the example above, the url reference is assigned to the 'process_post region, which is inferred to be only one line long.

Region boundaries define where items assigned to the region can be accessed and, for references, where the borrows of the referents end.

The compiler infers regions for owned values and local references fully automatically. You can reason about these inferences using Rust’s scoping rules, with the only exception that references are dropped immediately after their last use, not at the end of the scope.

fn f() {
/----num region lasts till num is moved out of scope or till the end of scope
|    let mut num = Num(42);
|/---ref_mut region ends right after the `.add` call (last use rule)
||   let ref_mut = &mut num;
||   ref_mut.add(1);
|-
|/---assert_eq region holds temp &num for the duration of assert call
||   assert_eq!(&Num(43), &num)
|-
-
}
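
The annotated sketch above is not compilable as written, so here is a runnable version of the same idea. It assumes a hypothetical `Num` wrapper with an `add` method, since the snippet does not define one.

```rust
// Hypothetical type assumed by the sketch above.
#[derive(Debug, PartialEq)]
struct Num(i32);

impl Num {
    fn add(&mut self, rhs: i32) {
        self.0 += rhs;
    }
}

fn f() -> Num {
    let mut num = Num(42);

    let ref_mut = &mut num;
    ref_mut.add(1);
    // `ref_mut` is never used again, so its region ends on the line above,
    // even though the variable is still in scope.

    // A shared borrow of `num` is therefore allowed here.
    assert_eq!(&Num(43), &num);

    num
}

fn main() {
    println!("{:?}", f());
}
```

Using `ref_mut` again after the `assert_eq!` line would stretch its region over the shared borrow, and the compiler would reject the code.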

However, when it encounters a call to another function within the analyzed function, the compiler doesn’t do any smart automatic inference or safety checks: it doesn’t study the other function’s body, and it doesn’t track function inputs and outputs. It simply relies on the generic parameters you defined in the function signature to create regions for that particular call.
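
As a small aside, the following sketch shows two functions with identical bodies whose callers are treated differently purely because of their signatures. Both function names are hypothetical and exist only for this illustration.

```rust
// The signature couples both inputs and the output into one region.
fn first_coupled<'a>(x: &'a str, _y: &'a str) -> &'a str {
    x
}

// Same body, but the signature ties only `x` to the output.
fn first_decoupled<'a, 'b>(x: &'a str, _y: &'b str) -> &'a str {
    x
}

fn keep_first() -> String {
    let x = String::from("kept");
    let result;
    {
        let y = String::from("temporary");
        // Swapping in `first_coupled` here is rejected with
        // "`y` does not live long enough", although the body never returns `y`.
        result = first_decoupled(&x, &y);
    } // `y` is dropped here.
    result.to_owned()
}

fn main() {
    println!("{}", keep_first());

    // `first_coupled` is fine as long as both inputs outlive the result.
    let x = String::from("kept");
    let y = String::from("also kept");
    println!("{}", first_coupled(&x, &y));
}
```

The caller never sees the bodies; only the lifetime parameters in the signatures decide which call compiles.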

To study this, let’s return to our main example and read the posts_from_blog signature through the lens of regions:

fn posts_from_blog<'a>(
    items: &'a [DiscoveredItem],
    blog_url: &'a str,
) -> impl Iterator<Item = &'a str>;

This signature reads as: “Dear Rust compiler, please infer a single region 'a at the call site and assign to it the following references: items, blog_url, impl Iterator, Iterator::Item.”

Notice that the impl Iterator belongs to the region 'a too. This is because the complete return type actually looks like this: impl Iterator<Item = &'a str> + 'a², but the compiler elided the last 'a according to lifetime elision rules.

Now at the call site we need to infer a region 'a for posts_from_blog such that the region is as small as possible for all 4 references it holds. How wide should the region be? Wide enough that all references it holds stay valid. How long must the references stay valid? As long as they’re used. So the size of a region is basically determined by the last use of any reference belonging to the region.

Let’s apply this rule in practice. Our function returns an iterator impl Iterator<_> + 'a. How is it used?

let post_urls: Vec<_> = posts_from_blog(crawler_results, &blog_url).collect();

Well, we just collect it immediately into a vector; therefore our region with respect to the iterator is a single line of code (the iterator is consumed and can’t be used anywhere else):

fn main() {
    let crawler_results = &[
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/cooking/fried_eggs".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/travelling/death_mountain".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://successfulsam.xyz/".to_owned(),
            post_url: "https://successfulsam.xyz/keys_to_success/Just_do_this_one_thing_every_day".to_owned(),
        },
    ];

    let blog_url = get_blog_url();

/---posts_from_blog 'a region
|   let post_urls: Vec<_> = posts_from_blog(crawler_results, &blog_url).collect();
-
    let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));

    for url in post_urls {
        process_post(url);
    }

    handle.join().expect("Everything will be fine");
}

The consumed iterator yields references Item=&'a str that also belong to our region. We store them in the post_urls vector. Now we need to find the last usage of those references. It’s here:

    for url in post_urls {
        process_post(url);
    }

So post_urls references must be valid at least till the end of this loop. Expanding the region accordingly:

fn main() {
    let crawler_results = &[
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/cooking/fried_eggs".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/travelling/death_mountain".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://successfulsam.xyz/".to_owned(),
            post_url: "https://successfulsam.xyz/keys_to_success/Just_do_this_one_thing_every_day".to_owned(),
        },
    ];

    let blog_url = get_blog_url();

/---posts_from_blog 'a region
|   let post_urls: Vec<_> = posts_from_blog(crawler_results, &blog_url).collect();
|
|   let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));
|
|   for url in post_urls {
|       process_post(url);
|   }
-
    handle.join().expect("Everything will be fine");
}

As for input arguments, usually they don’t affect the region expansion because they must be valid only for the duration of a function call, but later we will study some cases when they do. Let’s evaluate items: &'a [DiscoveredItem] and blog_url: &'a str together. They’re just regular input references without any quirks, so they must be valid only at the line with the function invocation. If we had started our analysis from input arguments, our region would look like this:

fn main() {
    let crawler_results = &[
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/cooking/fried_eggs".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/travelling/death_mountain".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://successfulsam.xyz/".to_owned(),
            post_url: "https://successfulsam.xyz/keys_to_success/Just_do_this_one_thing_every_day".to_owned(),
        },
    ];

    let blog_url = get_blog_url();

/---posts_from_blog 'a region
|   let post_urls: Vec<_> = posts_from_blog(crawler_results, &blog_url).collect();
-
    let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));

    for url in post_urls {
        process_post(url);
    }

    handle.join().expect("Everything will be fine");
}

But we’ve already analyzed the outputs and know that our region must be wider, hence the resulting 'a region of the posts_from_blog function looks like this:

fn main() {
    let crawler_results = &[
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/cooking/fried_eggs".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/travelling/death_mountain".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://successfulsam.xyz/".to_owned(),
            post_url: "https://successfulsam.xyz/keys_to_success/Just_do_this_one_thing_every_day".to_owned(),
        },
    ];

    let blog_url = get_blog_url();

/---posts_from_blog 'a region
|   let post_urls: Vec<_> = posts_from_blog(crawler_results, &blog_url).collect();
|
|   let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));
|
|   for url in post_urls {
|       process_post(url);
|   }
-
    handle.join().expect("Everything will be fine");
}

The region holds: a copy of the crawler_results reference, a reference to blog_url, a vector of post_urls references, and the consumed iterator (yes, it’s consumed on the first line of the region, but accessing the iterator must still be valid for the whole region). Note that we didn’t analyze any relationships between references. At this point we don’t understand how inputs and outputs are connected or where the references point to. All we did was infer a region within which accessing any of those references must be possible.

That’s it for function inference. Inference for structs containing lifetimes works basically the same way, with the only addition that any use of the struct itself expands all regions defined in the struct, e.g.:

struct RefPair<'a> {
    a: &'a str,
    b: &'a str,
}

impl<'a> RefPair<'a> {
    fn concat(&self) -> String {
        format!("{}{}", self.a, self.b)
    }
}

fn main() {
     let x = String::from("X");
     let y = String::from("Y");
/----RefPair 'a region
|    let pair = RefPair {
|        a: x.as_str(),
|        b: y.as_str(),
|    };
|
|    println!("{}", pair.concat());
-
     println!("values: {x}, {y}");
}

And now if we move the last usage of the struct further down the main function the RefPair 'a region gets expanded:

fn main() {
     let x = String::from("X");
     let y = String::from("Y");
/----RefPair 'a region
|    let pair = RefPair {
|        a: x.as_str(),
|        b: y.as_str(),
|    };
|
|
|    println!("values: {x}, {y}");
|
|    println!("{}", pair.concat());
-
}
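
A runnable sketch of the same RefPair example, without the region annotations, may help to experiment with where the 'a region ends. The `demo` function is a hypothetical wrapper added only so the result can be checked.

```rust
struct RefPair<'a> {
    a: &'a str,
    b: &'a str,
}

impl<'a> RefPair<'a> {
    fn concat(&self) -> String {
        format!("{}{}", self.a, self.b)
    }
}

// Hypothetical wrapper around the example so it can return a value.
fn demo() -> String {
    let x = String::from("X");
    let y = String::from("Y");

    let pair = RefPair {
        a: x.as_str(),
        b: y.as_str(),
    };

    // Shared reads of `x` and `y` inside the 'a region are fine.
    println!("values: {x}, {y}");

    // Last use of `pair`: the RefPair 'a region ends here.
    let joined = pair.concat();

    // Outside the region the referents can be moved or dropped freely.
    drop(x);
    drop(y);

    joined
}

fn main() {
    println!("{}", demo());
}
```

Moving either `drop` call above the `pair.concat()` line puts a move inside the 'a region, and the borrow checker rejects it.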

Validating regions

It’s time to ensure safety. After we have inferred all regions in the analyzed function, we need to explore the relationships between the regions (not variables), looking for potential conflicts. Let’s start at the point where the crawler_results reference is copied to be passed as an argument to the posts_from_blog function. Can we create this copy?

fn main() {
/---crawler results region
|   let crawler_results = &[
|       DiscoveredItem {
|           blog_url: "https://blogs.com/".to_owned(),
|           post_url: "https://blogs.com/cooking/fried_eggs".to_owned(),
|       },
|       DiscoveredItem {
|           blog_url: "https://blogs.com/".to_owned(),
|           post_url: "https://blogs.com/travelling/death_mountain".to_owned(),
|       },
|       DiscoveredItem {
|           blog_url: "https://successfulsam.xyz/".to_owned(),
|           post_url: "https://successfulsam.xyz/keys_to_success/Just_do_this_one_thing_every_day".to_owned(),
|       },
|   ];
|
|   let blog_url = get_blog_url();
|
|/--posts_from_blog 'a region
||  let post_urls: Vec<_> = posts_from_blog(crawler_results, &blog_url).collect();
||
||  let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));
||
||  for url in post_urls {
||      process_post(url);
||  }
|-
|   handle.join().expect("Everything will be fine");
-
}

The crawler_results region fills the whole main body, clearly outliving our posts_from_blog 'a region, meaning we can dereference the copy of the crawler_results reference at any line within the posts_from_blog 'a region (note that we use the crawler_results borrow only on the line with the function call, but it lasts until the end of the posts_from_blog 'a region anyway).

Then we’re taking a reference to blog_url. Can we do that?

fn main() {
    let crawler_results = &[
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/cooking/fried_eggs".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/travelling/death_mountain".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://successfulsam.xyz/".to_owned(),
            post_url: "https://successfulsam.xyz/keys_to_success/Just_do_this_one_thing_every_day".to_owned(),
        },
    ];
/---blog_url region
|   let blog_url = get_blog_url();
|
|/--posts_from_blog 'a region
||  let post_urls: Vec<_> = posts_from_blog(crawler_results, &blog_url).collect();
||
||  let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));
-|
 |  for url in post_urls {
 |      process_post(url);
 |  }
 -
    handle.join().expect("Everything will be fine");

}

No, we can’t. We must be able to dereference this blog_url reference at any line within the posts_from_blog 'a region, but there is no way of doing that around the for loop because the blog_url region ends (the variable is moved out) right before the loop.

Now we should be able to decipher the error message from the previous chapter:

error[E0505]: cannot move out of `blog_url` because it is borrowed
  --> src/main.rs:41:37
   |
35 |     let blog_url = get_blog_url();
   |         -------- binding `blog_url` declared here
...
38 |     let post_urls: Vec<&str> = posts_from_blog(crawler_results, &blog_url).collect();
   |                                                              --------- borrow of `blog_url` occurs here
...
41 |     let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));
   |                                     ^^^^^^^                      -------- move occurs due to use in closure
   |                                     |
   |                                     move out of `blog_url` occurs here
...
44 |     for url in post_urls {
   |                --------- borrow later used here
   |
help: consider cloning the value if the performance cost is acceptable
   |
38 |     let post_urls: Vec<_> = posts_from_blog(crawler_results, &blog_url.clone()).collect();
   |                                                                       ++++++++

For more information about this error, try `rustc --explain E0505`.

Effectively, it tells us that at line 38 the compiler tries to create a reference to blog_url in a region that ends at line 44, but the blog_url region ends at line 41, so the compiler can’t do that. How can we fix this error? One way is to put the for loop before std::thread::spawn. This way the regions are aligned and blog_url is safe to use at any line of the posts_from_blog 'a region. But then our code no longer executes in parallel.

fn main() {
    let crawler_results = &[
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/cooking/fried_eggs".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/travelling/death_mountain".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://successfulsam.xyz/".to_owned(),
            post_url: "https://successfulsam.xyz/keys_to_success/Just_do_this_one_thing_every_day".to_owned(),
        },
    ];
/---blog_url region
|   let blog_url = get_blog_url();
|
|/--posts_from_blog 'a region
||  let post_urls: Vec<_> = posts_from_blog(crawler_results, &blog_url).collect();
||
||  for url in post_urls {
||      process_post(url);
||  }
|-
|   let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));
-
    handle.join().expect("Everything will be fine");
}
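
For reference, here is a compilable sketch of this reordering that keeps the single-lifetime signature. The `run_sequentially` wrapper, the shortened sample data, and the `calculate_blog_stats` body (which just returns the URL length) are stand-ins assumed for illustration.

```rust
use std::thread;

struct DiscoveredItem {
    blog_url: String,
    post_url: String,
}

fn posts_from_blog<'a>(
    items: &'a [DiscoveredItem],
    blog_url: &'a str,
) -> impl Iterator<Item = &'a str> {
    items.iter().filter_map(move |item| {
        (item.blog_url == blog_url).then_some(item.post_url.as_str())
    })
}

// Stand-in for the real stats job: it only returns the URL length.
fn calculate_blog_stats(blog_url: String) -> usize {
    blog_url.len()
}

// Hypothetical wrapper so the outcome can be inspected.
fn run_sequentially() -> (Vec<String>, usize) {
    let crawler_results = vec![
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/cooking/fried_eggs".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://successfulsam.xyz/".to_owned(),
            post_url: "https://successfulsam.xyz/keys_to_success".to_owned(),
        },
    ];
    let blog_url = String::from("https://blogs.com/");

    let post_urls: Vec<&str> = posts_from_blog(&crawler_results, &blog_url).collect();

    let mut processed = Vec::new();
    for url in post_urls {
        processed.push(url.to_owned());
    }
    // The 'a region ended with the loop, so `blog_url` can be moved now.
    let handle = thread::spawn(move || calculate_blog_stats(blog_url));
    let stats = handle.join().expect("stats thread panicked");

    (processed, stats)
}

fn main() {
    let (processed, stats) = run_sequentially();
    println!("{processed:?} {stats}");
}
```

The program compiles, but the posts are fully processed before the stats thread even starts.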

Another way is to get away with clones. But let’s look at our regions closely:

/---blog_url region
|   let blog_url = get_blog_url();
|
|/--posts_from_blog 'a region
||  let post_urls: Vec<_> = posts_from_blog(crawler_results, &blog_url).collect();
||
||  let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));
-|
 |  for url in post_urls {
 |      process_post(url);
 |  }
 -

We don’t really need the blog_url reference to be valid inside the for loop; we only care about post_urls there. This posts_from_blog 'a region is essentially a region for our post_urls. The region for the blog_url reference could be much smaller, but the function signature asks the compiler to infer only a single region, so the blog_url reference ends up coupled with the post_urls references. What we actually want is to ask the compiler to infer 2 regions for this function: one for post_urls and one for blog_url, so that the regions in main would look like this:

/---blog_url region
|   let blog_url = get_blog_url();
|
|/--posts_from_blog 'post_urls region
||/-posts_from_blog 'blog_url region
||| let post_urls: Vec<_> = posts_from_blog(crawler_results, &blog_url).collect();
||-
||  let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));
-|
 |  for url in post_urls {
 |      process_post(url);
 |  }
 -

This way the posts_from_blog 'blog_url region is only one line long and borrowing blog_url for this line is fine, while the posts_from_blog 'post_urls region holds only post_urls references and doesn’t care about the blog_url region at all. Let’s try to split this 'a region!

Splitting posts_from_blog 'a region

To ask the compiler to infer 2 regions instead of 1 we just need to introduce a second lifetime parameter in the function signature:

fn posts_from_blog<'post_urls, 'blog_url>(
    items: &'post_urls [DiscoveredItem],
    blog_url: &'blog_url str,
) -> impl Iterator<Item = &'post_urls str> {
    // ...
}

We’re taking post URLs from the input items, so items clearly belongs to the 'post_urls region. We use blog_url only for filtering, so it belongs to its own 'blog_url region. The iterator returns post URLs from the input items, so Item = &str must belong to the 'post_urls region. But what about the Iterator itself? We’re iterating over items, so let’s assign it to the 'post_urls region.

fn posts_from_blog<'post_urls, 'blog_url>(
    items: &'post_urls [DiscoveredItem],
    blog_url: &'blog_url str,
) -> impl Iterator<Item = &'post_urls str> + 'post_urls {
    // ...
}

Now we will infer new regions for the updated signature, but in a slightly different context to emphasize that the borrow checker analyzes each function completely independently:

fn uaf(options: &CrawlerOptions) {
    let items = crawler::run(options);

    let blog_url = get_blog_url();
    let iterator = posts_from_blog(&items, &blog_url);
    drop(blog_url);

    for url in iterator {
        do_stuff(url);
    }
}

Let’s infer regions in the uaf function:

fn uaf(options: &CrawlerOptions) {
/---items region
|   let items = crawler::run(options);
|
|   let blog_url = get_blog_url();
|/--posts_from_blog 'post_urls region
||  let iterator = posts_from_blog(&items, &blog_url);
||  drop(blog_url);
||
||  for url in iterator {
||      do_stuff(url);
||  }
--
}

The posts_from_blog 'post_urls region holds an iterator and a reference to the items variable, and it can dereference them at any line of this region because the items region outlives the 'post_urls region.

fn uaf(options: &CrawlerOptions) {
    let items = crawler::run(options);
/---blog_url region
|   let blog_url = get_blog_url();
|/--posts_from_blog 'blog_url region
||  let iterator = posts_from_blog(&items, &blog_url);
|-  drop(blog_url);
-
    for url in iterator {
        do_stuff(url);
    }

}

The posts_from_blog 'blog_url region holds just a reference to the blog_url variable. It’s an input argument, therefore the reference needs to be valid only for the duration of the function call, so the region is one line long. The blog_url region clearly outlives this one-line region, so it’s safe to create a borrow there. As a result, the uaf function passes borrow checking just fine, but if we think about what’s going on here we quickly realize that the iterator holds a reference to blog_url internally to do the comparisons, so in fact we have a use-after-free memory bug here. The posts_from_blog function signature doesn’t say anything about this internal borrow, so the borrow checker can’t spot any issues while analyzing the uaf function. Luckily for us, it can spot the issue during the analysis of the posts_from_blog function body, which is done only once and completely independently from uaf.

struct DiscoveredItem {
    blog_url: String,
    post_url: String,
}
fn posts_from_blog<'post_urls, 'blog_url>(
    items: &'post_urls [DiscoveredItem],
    blog_url: &'blog_url str,
) -> impl Iterator<Item = &'post_urls str> + 'post_urls {
    items.iter().filter_map(move |item| {
        if item.blog_url == blog_url {
            Some(item.post_url.as_str())
        } else {
            None
        }
    })
}

The borrow checker emits the following error for this implementation:

error[E0623]: lifetime mismatch
  --> src/main.rs:11:6
   |
10 |     blog_url: &'blog_url str,
   |               -------------- this parameter and the return type are declared with different lifetimes...
11 | ) -> impl Iterator<Item = &'post_urls str> + 'post_urls {
   |      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |      |
   |      ...but data from `blog_url` is returned here

For more information about this error, try `rustc --explain E0623`.
error: could not compile `playground` due to previous error

It was able to spot that the iterator borrows blog_url from the 'blog_url region, but the signature suggests that the iterator borrows only from the 'post_urls region, so the borrow checker threw a lifetime mismatch error. In order to fix it we must reflect this blog_url borrow in our signature by assigning the iterator to the 'blog_url region.

struct DiscoveredItem {
    blog_url: String,
    post_url: String,
}
fn posts_from_blog<'post_urls, 'blog_url>(
    items: &'post_urls [DiscoveredItem],
    blog_url: &'blog_url str,
) -> impl Iterator<Item = &'post_urls str> + 'blog_url {
    items.iter().filter_map(move |item| {
        if item.blog_url == blog_url {
            Some(item.post_url.as_str())
        } else {
            None
        }
    })
}

   Compiling playground v0.0.1 (/playground)
error[E0623]: lifetime mismatch
  --> src/main.rs:11:6
   |
10 |     blog_url: &'blog_url str,
   |               -------------- this parameter and the return type are declared with different lifetimes...
11 | ) -> impl Iterator<Item = &'post_urls str> + 'blog_url {
   |      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |      |
   |      ...but data from `items` is returned here

For more information about this error, try `rustc --explain E0623`.
error: could not compile `playground` due to previous error

But compilation fails with the same kind of error, this time complaining about data from items. Hmm… It’s time to resort to magic!

#![allow(unused)]
fn main() {
struct DiscoveredItem {
   blog_url: String,
   post_url: String,
}
fn posts_from_blog<'post_urls, 'blog_url>(
    items: &'post_urls [DiscoveredItem],
    blog_url: &'blog_url str,
) -> impl Iterator<Item = &'post_urls str> + 'blog_url
where
    'post_urls: 'blog_url
{
    items.iter().filter_map(move |item| {
        if item.blog_url == blog_url {
            Some(item.post_url.as_str())
        } else {
            None
        }
    })
}
}

Warning

Since the Rust 2024 edition this resort to magic is no longer the correct approach. With the new edition we can express precisely what we want:

#![allow(unused)]
fn main() {
struct DiscoveredItem {
    blog_url: String,
    post_url: String,
}
fn posts_from_blog<'post_urls, 'blog_url>(
    items: &'post_urls [DiscoveredItem],
    blog_url: &'blog_url str,
) -> impl Iterator<Item = &'post_urls str> + use<'post_urls, 'blog_url>
{
    items.iter().filter_map(move |item| {
        if item.blog_url == blog_url {
            Some(item.post_url.as_str())
        } else {
            None
        }
    })
}
}

However, this served as a nice introduction point to the lifetime subtyping and the next chapter is built on further exploring this code snippet so it will stay here until the next chapter is rewritten with better examples.
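
As a minimal sketch of how the precise-capturing signature behaves at a call site, the listing below collects the post URLs and then gets rid of blog_url while the URLs are still in use. It assumes a toolchain that accepts the use<..> syntax (Rust 1.82 or later, as far as I know); the sample data and this main function are illustrative and not part of the original crawler example.

```rust
struct DiscoveredItem {
    blog_url: String,
    post_url: String,
}

// The opaque iterator type is declared to capture both lifetimes,
// while the yielded items borrow only from 'post_urls.
fn posts_from_blog<'post_urls, 'blog_url>(
    items: &'post_urls [DiscoveredItem],
    blog_url: &'blog_url str,
) -> impl Iterator<Item = &'post_urls str> + use<'post_urls, 'blog_url> {
    items.iter().filter_map(move |item| {
        if item.blog_url == blog_url {
            Some(item.post_url.as_str())
        } else {
            None
        }
    })
}

fn main() {
    // Illustrative data, not real crawler output.
    let items = vec![
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/cooking/fried_eggs".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://successfulsam.xyz/".to_owned(),
            post_url: "https://successfulsam.xyz/keys_to_success".to_owned(),
        },
    ];
    let blog_url = String::from("https://blogs.com/");

    // The iterator borrows blog_url, but it is consumed by collect().
    let post_urls: Vec<&str> = posts_from_blog(&items, &blog_url).collect();

    // The collected items borrow only from `items`, so blog_url can go away.
    drop(blog_url);

    for url in &post_urls {
        println!("{}", url);
    }
}
```

The point of the sketch is that the Vec holds only &'post_urls str values, so nothing in it keeps the borrow of blog_url alive.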

With the where clause added, everything compiles, including the example from the previous chapter. Check it out:

struct DiscoveredItem {
    blog_url: String,
    post_url: String,
}
fn posts_from_blog<'post_urls, 'blog_url>(
    items: &'post_urls [DiscoveredItem],
    blog_url: &'blog_url str,
) -> impl Iterator<Item = &'post_urls str> + 'blog_url
where
    'post_urls: 'blog_url
{
    items.iter().filter_map(move |item| {
        if item.blog_url == blog_url {
            Some(item.post_url.as_str())
        } else {
            None
        }
    })
}
fn main() {
    // Assume the crawler returned the following results
    let crawler_results = &[
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/cooking/fried_eggs".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://blogs.com/".to_owned(),
            post_url: "https://blogs.com/travelling/death_mountain".to_owned(),
        },
        DiscoveredItem {
            blog_url: "https://successfulsam.xyz/".to_owned(),
            post_url: "https://successfulsam.xyz/keys_to_success/Just_do_this_one_thing_every_day".to_owned(),
        },
    ];

    // Reading the blog URL we're interested in from somewhere
    let blog_url = get_blog_url();

    // Collecting post URLs from this blog using our function
    let post_urls: Vec<_> = posts_from_blog(crawler_results, &blog_url).collect();

    // Spawning a thread to do some further blog processing
    let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));

    // Processing posts in parallel
    for url in post_urls {
        process_post(url);
    }

    handle.join().expect("Everything will be fine");
}

// Returns a predefined value
fn get_blog_url() -> String {
    "https://blogs.com/".to_owned()
}

// Just prints URL out
fn process_post(url: &str) {
    println!("{}", url);
}

// Actually does nothing
fn calculate_blog_stats(_blog_url: String) {}

We will demystify the added where clause and make sense of the last compilation error in the next chapter, but before going further make sure you understand the material from this chapter.

Chapter exercises

Exercise 1

The chapter says when we encounter a function call we need to infer minimal regions for it at the invocation point. Why do we want these regions to be minimal?

Exercise 2

Assume this signature compiles:

fn posts_from_blog<'post_urls, 'blog_url>(
    items: &'post_urls [DiscoveredItem],
    blog_url: &'blog_url str,
) -> impl Iterator<Item = &'post_urls str> + 'blog_url {
    // ...
}

Go back to the uaf example. Infer and validate regions for the uaf using this posts_from_blog signature. Does uaf compile? Try to come up with an example that causes use after free again.

Exercise 3

  • Manually infer regions for the following function and explain why this code doesn’t compile

struct RefMutPair<'a, 'b> {
    a: &'a mut String,
    b: &'b mut String,
}

impl<'a, 'b> RefMutPair<'a, 'b> {
    fn concat(&self) -> String {
        format!("{}{}", self.a, self.b)
    }
}

fn main() {
    let mut x = String::from("X");
    let mut y = String::from("Y");

    let pair = RefMutPair {
        a: &mut x,
        b: &mut y,
    };

    println!("{}", pair.concat());

    println!("{}", x);
    pair.b.push('c');
    println!("{}", y);
}
  • The same results can be achieved with the simplified struct definition, can you write it down?

  • Now replace the struct above with the following:

#![allow(unused)]
fn main() {
struct RefMutPair<'a, 'b> {
   a: &'a String,
   b: &'b mut String,
}
}

Try to explain why the example suddenly compiles.

Exercise 4

The new + use<_generics_> syntax effectively lets us declare generics for existential types, so the correct modern solution to this chapter’s problem

fn posts_from_blog<'post_urls, 'blog_url>(
    items: &'post_urls [DiscoveredItem],
    blog_url: &'blog_url str,
) -> impl Iterator<Item = &'post_urls str> + use<'post_urls, 'blog_url> {
    items.iter().filter_map(move |item| {
        if item.blog_url == blog_url {
            Some(item.post_url.as_str())
        } else {
            None
        }
    })
}

turns out to be quite similar to the RefPair example:

struct InferredIteratorType<'post_urls, 'blog_url> {
    post_urls: &'post_urls [DiscoveredItem],
    blog_url: &'blog_url str,
}

impl<'post_urls, 'blog_url> Iterator for InferredIteratorType<'post_urls, 'blog_url> {
    type Item = &'post_urls str;
    //...
}

fn posts_from_blog<'post_urls, 'blog_url>(
    items: &'post_urls [DiscoveredItem],
    blog_url: &'blog_url str,
) -> InferredIteratorType<'post_urls, 'blog_url> {
    items.iter().filter_map(move |item| {
        if item.blog_url == blog_url {
            Some(item.post_url.as_str())
        } else {
            None
        }
    })
}

Can you explain why, unlike in the RefPair example, it is absolutely required to pass 2 distinct lifetimes (infer 2 regions) for the main example to work?


  1. The book was written in pre-Polonius era.

  2. Since edition 2024 the correct Iterator signature is impl Iterator<Item = &'a str> + use<'a>.

Lifetime subtyping

'a: 'b reads as 'a is a subtype of 'b, but mixing types with lifetimes is often confusing, so Rustaceans prefer to say: lifetime 'a outlives lifetime 'b. This outlives relationship implies 2 important things:

  • It allows references with the 'a lifetime to be implicitly cast into references with the 'b lifetime.
  • The compiler must assert that 'a >= 'b (region 'a is the same or wider than region 'b)

I will call these implicit casts lifetime shortenings and denote them as 'a ~> 'b. Let’s go through some examples to get used to these concepts.

Let’s start from this pseudocode:

given 'a: 'b

ref_a: 'a
ref_b: 'b

ref_b = ref_a // fine, 'a ~> 'b
ref_a = ref_b // not fine, requires 'b: 'a

We’re given 'a: 'b and we have 2 references: ref_a belonging to the region 'a and ref_b belonging to the region 'b. 'a: 'b implies 'a ~> 'b, allowing us to assign ref_a to ref_b. By doing that we’re forgetting a reference in the longer region 'a and recreating it in the shorter region 'b. However, assigning ref_b to ref_a results in a compile error. It requires a 'b: 'a relationship to cast a 'b ref into an 'a ref, but we only have the 'a: 'b relationship.
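
The "fine" direction of the pseudocode can be sketched in real Rust as a function that carries the 'a: 'b bound in its signature. This is only an illustration; shorten and the variable names are hypothetical and do not appear elsewhere in the book.

```rust
// `shorten` is a hypothetical name used only for this sketch.
fn shorten<'a, 'b>(ref_a: &'a str) -> &'b str
where
    'a: 'b,
{
    // 'a ~> 'b: allowed because the signature promises 'a: 'b.
    // Removing the where clause makes this body fail to compile.
    ref_a
}

fn main() {
    let owner = String::from("lives in 'a");
    // The returned reference may use any region not longer than the borrow of `owner`.
    let ref_b: &str = shorten(&owner);
    println!("{}", ref_b);
}
```

Swapping the bound to 'b: 'a while keeping the same body corresponds to the "not fine" line of the pseudocode and is rejected.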

In the previous chapter we used a visual approach to show how the borrow checker infers regions. In reality it doesn’t work like that. All it does is assume a new region for every line of code, infer outlives relationships for those regions, and then run validations based on this information. When the borrow checker encounters a function call it doesn’t try to be smart and infer anything; it just reads the regions and the relationships between them directly from the function signature and assigns references to those regions. This means that when we annotate our signatures with lifetimes we do a large part of the borrow checker’s work ourselves. To get a brief feel for how the borrow checker actually operates we’ll go through the next example, written in real Rust. To make it readable I won’t assume a new region for every line of code, but I’ll assume one for every scope:

#![allow(unused)]
fn main() {
{ // 'a
    let a = 42;
    let ref_a = &a; // ref_a belongs to 'a
    { // 'b. 'b is a subscope of 'a, so `'a: 'b`
        let b = 24;
        let mut ref_b = &b; // ref_b belongs to 'b
        ref_b = ref_a; // 'a: 'b => 'a ~> 'b
        println!("{}", ref_b); // prints 42
    }

    println!("{}", ref_a); // prints 42
}
}

The example compiles just fine. It corresponds to the following lines of the pseudocode we met before and works for the same reasons. The only difference is that the 'a: 'b relationship is not given but inferred from the scopes:

inferred 'a: 'b

ref_a: 'a
ref_b: 'b

ref_b = ref_a // fine, 'a ~> 'b

Now let’s try this variation.

#![allow(unused)]
fn main() {
{ // 'a
    let a = 42;
    let mut ref_a = &a; // ref_a belongs to 'a
    { // 'b. 'b is a subscope of 'a, so `'a: 'b`
        let b = 24;
        let ref_b = &b; // ref_b belongs to 'b
        ref_a = ref_b; // compilation error. No `'b: 'a` relationship
        println!("{}", ref_b); // doesn't compile
    }

    println!("{}", ref_a); // doesn't compile
}
}

This code corresponds to these lines of the pseudocode above and doesn’t compile:

inferred 'a: 'b

ref_a: 'a
ref_b: 'b

ref_a = ref_b // not fine, requires 'b: 'a

We can’t assign ref_b to ref_a because we didn’t infer a 'b: 'a relationship (inferring it would be wrong because the 'b region is shorter than the 'a region). We inferred only 'a: 'b, so knowing that, and by further inferring the region boundaries within the function scope, the compiler was able to produce a user-friendly "b does not live long enough" error.

At this point, it should be clear why the 'a: 'b relationship is required to be able to implicitly cast 'a references into 'b references. In short, we just can’t guarantee safety after the cast if the 'a >= 'b condition is not met. If you still feel uncertain you may want to study the last 2 examples more carefully.
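
The earlier remark that the borrow checker reads regions straight from a function signature, without looking at the body, can be illustrated with a small sketch. The helper first and the variables are hypothetical names chosen for this example.

```rust
// Hypothetical helper: the signature ties the output only to `x`.
fn first<'x, 'y>(x: &'x str, _y: &'y str) -> &'x str {
    x
}

fn main() {
    let outer = String::from("outer");
    let result;
    {
        let inner = String::from("inner");
        // The borrow of `inner` belongs to 'y and may end with this scope,
        // because the signature says the output does not depend on it.
        result = first(&outer, &inner);
    }
    // Still valid: `result` borrows only from `outer`.
    println!("{}", result);
}
```

If both parameters shared a single lifetime, the same call site would be rejected with an "inner does not live long enough" error even though the body never returns _y, which shows that the caller is checked against the signature alone.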

Specifying lifetime relationships in signatures

Returning to our post_urls_from_blog example we had this error

#![allow(unused)]
fn main() {
struct DiscoveredItem {
    blog_url: String,
    post_url: String,
}
fn post_urls_from_blog<'post_urls, 'blog_url>(
    items: &'post_urls [DiscoveredItem],
    blog_url: &'blog_url str,
) -> impl Iterator<Item = &'post_urls str> + 'blog_url {
    items.iter().filter_map(move |item| {
        if item.blog_url == blog_url {
            Some(item.post_url.as_str())
        } else {
            None
        }
    })
}
}
   Compiling playground v0.0.1 (/playground)
error[E0623]: lifetime mismatch
  --> src/main.rs:11:6
   |
10 |     blog_url: &'blog_url str,
   |               -------------- this parameter and the return type are declared with different lifetimes...
11 | ) -> impl Iterator<Item = &'post_urls str> + 'blog_url {
   |      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |      |
   |      ...but data from `items` is returned here

For more information about this error, try `rustc --explain E0623`.
error: could not compile `playground` due to previous error

The error is a bit tricky because the root cause lies at this particular dot:

#![allow(unused)]
fn main() {
struct DiscoveredItem {
    blog_url: String,
    post_url: String,
}
fn post_urls_from_blog<'post_urls, 'blog_url>(
    items: &'post_urls [DiscoveredItem],
    blog_url: &'blog_url str,
) -> impl Iterator<Item = &'post_urls str> + 'blog_url {
    items.iter().filter_map(move |item| {
// here---------^
        if item.blog_url == blog_url {
            Some(item.post_url.as_str())
        } else {
            None
        }
    })
}
}

Let’s examine what happens between the iter() and filter_map() calls. iter() returns an Iterator from items and this iterator belongs to the 'post_urls region. filter_map() takes the items iterator, but also captures the blog_url from the 'blog_url region in the closure, and we expect the resulting iterator to belong to the 'blog_url region. We can represent what’s happening with the following function:

#![allow(unused)]
fn main() {
fn dot<'post_urls, 'blog_url>(
    input: impl Iterator<Item = ()> + 'post_urls,
) -> impl Iterator<Item = ()> + 'blog_url
{
    input
}
}

The function doesn’t compile. The cast from Iterator + 'post_urls into Iterator + 'blog_url is prohibited because 'post_urls and 'blog_url lifetimes are unrelated. In order to make the cast possible we need to introduce a relationship between the regions. We want to be able to cast(shorten) 'post_urls references into 'blog_url references therefore we need a 'post_urls: 'blog_url relationship. Let’s type it out.

#![allow(unused)]
fn main() {
fn dot<'post_urls, 'blog_url>(
    input: impl Iterator<Item = ()> + 'post_urls,
) -> impl Iterator<Item = ()> + 'blog_url
where
    'post_urls: 'blog_url
{
    input
}
}

Now, with this additional bit of information, the function does compile.

The relationships between lifetimes aren’t inferred across function calls, so we need to specify them manually in order to apply the casts we want in the function body. Adding where 'post_urls: 'blog_url to dot makes the cast of input into Iterator + 'blog_url valid. Adding this where clause to our post_urls_from_blog function makes it compile for the same reason.

#![allow(unused)]
fn main() {
struct DiscoveredItem {
    blog_url: String,
    post_url: String,
}
fn post_urls_from_blog<'post_urls, 'blog_url>(
    items: &'post_urls [DiscoveredItem],
    blog_url: &'blog_url str,
) -> impl Iterator<Item = &'post_urls str> + 'blog_url
where
    'post_urls: 'blog_url
{

    items.iter().filter_map(move |item| {
        if item.blog_url == blog_url {
            Some(item.post_url.as_str())
        } else {
            None
        }
    })
}
}
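
The same rule applies to plain references, which may be easier to experiment with than iterators. The following is a rough sketch with a hypothetical helper or_fallback: its body returns either argument as a &'fallback str, and the where clause is what permits the 'text ~> 'fallback cast.

```rust
// Hypothetical helper used only for illustration.
fn or_fallback<'text, 'fallback>(text: &'text str, fallback: &'fallback str) -> &'fallback str
where
    'text: 'fallback,
{
    if text.is_empty() {
        fallback
    } else {
        // Needs 'text ~> 'fallback, which the where clause allows.
        text
    }
}

fn main() {
    let text = String::from("body");
    let fallback = String::from("n/a");
    println!("{}", or_fallback(&text, &fallback));
    println!("{}", or_fallback("", &fallback));
}
```

Without the where clause the compiler rejects the else branch, because nothing in the signature relates 'text to 'fallback.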

But what if we had used 'blog_url: 'post_urls relationship instead?

#![allow(unused)]
fn main() {
struct DiscoveredItem {
    blog_url: String,
    post_url: String,
}
fn post_urls_from_blog<'post_urls, 'blog_url>(
    items: &'post_urls [DiscoveredItem],
    blog_url: &'blog_url str,
) -> impl Iterator<Item = &'post_urls str> + 'post_urls
where
    'blog_url: 'post_urls
{

    items.iter().filter_map(move |item| {
        if item.blog_url == blog_url {
            Some(item.post_url.as_str())
        } else {
            None
        }
    })
}
}

Now instead of casting items.iter() which belongs to 'post_urls we’re casting the borrow of the blog_url in the filter_map closure 'blog_url ~> 'post_urls, so the resulting iterator appears to be Iterator + 'post_urls as shown in the updated function signature and this signature compiles too. What’s the difference? To understand why this is not what we want we need to remember the second implication of 'a: 'b relationship:

  • The compiler must assert that 'a >= 'b (region 'a is the same or wider than region 'b)

Let’s return to the caller site and infer the regions for this signature. We will continue to use the visual approach introduced in the previous chapter because, even if it’s not what the compiler actually does, it works quite well for humans. Ok, so we need to infer 2 regions by the last usage rule and we actually already did that in the previous chapter:

/---blog_url region
|   let blog_url = get_blog_url();
|
|/--post_urls_from_blog 'post_urls region
||/-post_urls_from_blog 'blog_url region
||| let post_urls: Vec<_> = post_urls_from_blog(crawler_results, &blog_url).collect();
||-
||  let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));
-|
 |  for url in post_urls {
 |      process_post(url);
 |  }
 -

But now we have an extra precondition in our post_urls_from_blog function signature that 'blog_url must be as wide as 'post_urls region or wider, so we need to extend post_urls_from_blog 'blog_url region to meet this requirement.

/---blog_url region
|   let blog_url = get_blog_url();
|
|/--post_urls_from_blog 'post_urls region
||/-post_urls_from_blog 'blog_url region
||| let post_urls: Vec<_> = post_urls_from_blog(crawler_results, &blog_url).collect();
|||
||| let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));
-||
 || for url in post_urls {
 ||     process_post(url);
 || }
 --

As a result, the post_urls_from_blog 'blog_url and blog_url regions are not aligned, so we have a conflict and the same compiler error we were struggling with from the beginning. We know that the region for the iterator must be shorter because iterators usually live less than the items they yield, but we failed to communicate this to the compiler: our signature requires the region for the Iterator to be as wide as or wider than the region for its items, which is wrong. So we must stick with the 'post_urls: 'blog_url relationship. The regions for it will look as we want:

/---blog_url region
|   let blog_url = get_blog_url();
|
|/--post_urls_from_blog 'post_urls region
||/-post_urls_from_blog 'blog_url region
||| let post_urls: Vec<_> = post_urls_from_blog(crawler_results, &blog_url).collect();
||-
||  let handle = std::thread::spawn(move || calculate_blog_stats(blog_url));
-|
 |  for url in post_urls {
 |      process_post(url);
 |  }
 -

post_urls_from_blog 'post_urls > post_urls_from_blog 'blog_url, so no extra region expansion is required and everything compiles just fine.

I hope this example was sufficient to show that lifetime subtyping is very straightforward to work with. The important thing to remember is that when you define a signature you manually specify how many regions the compiler needs to infer and what the relationships between them are. The relationships come from the “lifetime casts” you want to perform in your function body, and specifying them may result in extra region expansions at the caller site, so you need to think ahead about which regions can be shorter than others. If regions should be the same, replace them with a single region. However, there is one more important thing to consider. To fully grasp lifetime mechanics we need to learn about lifetime variance.

Chapter exercises

Analyze and write down the equivalent to the following signature:

#![allow(unused)]
fn main() {
struct DiscoveredItem {
    blog_url: String,
    post_url: String,
}
fn post_urls_from_blog<'post_urls, 'blog_url>(
    items: &'post_urls [DiscoveredItem],
    blog_url: &'blog_url str,
) -> impl Iterator<Item = &'post_urls str> + 'blog_url
where
    'post_urls: 'blog_url,
    'blog_url: 'post_urls
{

    items.iter().filter_map(move |item| {
        if item.blog_url == blog_url {
            Some(item.post_url.as_str())
        } else {
            None
        }
    })
}
}

Introduction to variance

Suppose we have the following generic structure:

#![allow(unused)]
fn main() {
struct S<T: ?Sized>(T);
}

Let’s define a function with a lifetime cast:

#![allow(unused)]
fn main() {
struct S<T: ?Sized>(T);
fn shortener<'a: 'b, 'b>(s: S<&'a str>) -> S<&'b str> {
    s
}
}

As we have learnt from the previous chapter the function compiles because there is a 'a: 'b bound, so 'a ~> 'b is allowed. We can simplify the example a bit. In Rust there is an implicit rule for the 'static lifetime that forall 'a . 'static: 'a, so the following signature compiles too.

#![allow(unused)]
fn main() {
struct S<T>(T);
// implicit 'static: 'a
fn shortener<'a>(s: S<&'static str>) -> S<&'a str> {
    s
}
}

Now let’s replace our generic struct with a Cell struct from the core library which is defined approximately the same way:

#![allow(unused)]
fn main() {
struct Cell<T: ?Sized> {
    value: UnsafeCell<T>
}

struct UnsafeCell<T: ?Sized> {
    value: T
}
}

And we have a compilation error:

#![allow(unused)]
fn main() {
use std::cell::Cell;
fn shortener<'a>(cell: Cell<&'static str>) -> Cell<&'a str> {
    cell
}
}
error[E0308]: mismatched types
 --> src/main.rs:6:5
  |
6 |     cell
  |     ^^^^ lifetime mismatch
  |
  = note: expected struct `Cell<&'a str>`
             found struct `Cell<&'static str>`
note: the lifetime `'a` as defined here...
 --> src/main.rs:5:14
  |
5 | fn shortener<'a>(cell: Cell<&'static str>) -> Cell<&'a str> {
  |              ^^
  = note: ...does not necessarily outlive the static lifetime

For more information about this error, try `rustc --explain E0308`.
error: could not compile `playground` due to previous error

We get the same compilation error if we omit specifying the 'a: 'b relationship in our original example:

#![allow(unused)]
fn main() {
struct S<T: ?Sized>(T);
fn shortener<'a, 'b>(s: S<&'a str>) -> S<&'b str> {
    s
}
}

So it looks like Cell is somewhat exceptional and 'a: 'b doesn’t work for it for some reason. Let’s figure out why we want to have exceptions from the rules we’ve learnt in the first place.

Assume the following function compiles

#![allow(unused)]
fn main() {
use std::cell::Cell;
fn shortener<'cell, 'a>(cell: &'cell Cell<&'static str>, s: &'a str) -> &'cell Cell<&'a str> {
    cell.set(s);
    cell
}
}

Let’s infer the regions inside this uaf function:

#![allow(unused)]
fn main() {
fn uaf() {
    let cell = Cell::new("Static str");
    let s = String::from("UAF");

    let cell2 = shortener(&cell, &s);
    drop(s);
    println!("{}", cell2.get());

}
}

We need to infer 2 regions: 'cell and 'a following the last usage rule.

fn uaf() {
/--- cell region
|   let cell = Cell::new("Static str");
|   let s = String::from("UAF");
|
|/-- shortener 'cell region
||   let cell2 = shortener(&cell, &s);
||   drop(s);
||   println!("{}", cell2.get());
|-
-
}

The shortener 'cell region holds a reference to cell and the cell2 output reference. &cell is a regular input reference without additional bounds, so the region for it is one line long; however, cell2 is used in the print statement, so the region was expanded to be 3 lines long. The cell region outlives the shortener 'cell region, so we can take a reference and dereference it at any line within the shortener 'cell region. There are no errors at this point.

fn uaf() {
    let cell = Cell::new("Static str");
/--- s region
|   let s = String::from("UAF");
|
|/-- shortener 'a region
||   let cell2 = shortener(&cell, &s);
||   drop(s);
-|   println!("{}", cell2.get());
 -

}

The shortener 'a region holds a reference to s and an internal reference inside cell2 (for simplicity we can think of it as holding cell2 itself). &s is a regular input reference, so shortener 'a with respect to &s is only one line long; however, cell2 is used in the print statement, which makes the shortener 'a region 3 lines long. But we can’t dereference &s at the line with println! because the s region ends right before the println! statement. There is an error and the compiler successfully caught a use after free bug. However, cell is still in scope, so instead of using cell2 we could have used cell in the print statement:

fn uaf() {
/--- cell region
|   let cell = Cell::new("Static str");
|/-- s region
||  let s = String::from("UAF");
||
||/- shortener 'cell and shortener 'a regions
||| let cell2 = shortener(&cell, &s);
||-
||  drop(s);
|-
|   println!("{}", cell.get());
-
}

Now, as we don’t use cell2, the shortener 'cell and shortener 'a regions are both only one line long, and we can safely take references to both cell and s at this line. But shortener updates the cell, pointing its internal reference to the heap-allocated string, which is dropped right before we print it out on the screen. That’s a use after free bug and the compiler is unable to prevent it because it analyzes functions independently and can’t see that the cell was updated inside shortener. That’s why we need to disable the ability to shorten lifetimes for the Cell type. However, hardcoding the types for which shortening rules don’t apply is a bad solution. We have the same issue for all types with interior mutability, programmers may define their own interior-mutable types, and there may be other kinds of types vulnerable to this same issue. To control whether we are allowed to shorten lifetimes or not there is a mechanism called lifetime variance.
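
For contrast, here is a sketch of a pattern the compiler does accept. Instead of pretending that a Cell<&'static str> can be shortened after the fact, the cell’s lifetime parameter is the same 'a as the stored reference, so the borrow of s has to stay valid for as long as the cell is used. The helper store and the variable names are hypothetical.

```rust
use std::cell::Cell;

// Hypothetical helper: the cell and the stored reference share one lifetime 'a.
// Since Cell is invariant, 'a is fixed by the cell's own type.
fn store<'a>(cell: &Cell<&'a str>, s: &'a str) {
    cell.set(s);
}

fn main() {
    let s = String::from("heap string");
    // The literal is &'static str, but it is shortened *before* being wrapped,
    // so `cell` gets a type like Cell<&'x str> where 'x fits the borrow of `s`.
    let cell = Cell::new("Static str");
    store(&cell, &s);
    // `s` is still alive here; dropping it before this line would be rejected.
    println!("{}", cell.get());
}
```

Inserting drop(s) before the final println! makes this version fail to compile, which is exactly the protection the uaf example was missing.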

Variance rules

Variance rules are hardcoded in the compiler for the primitive types and are inferred for the compound types.

There are 3 of them:

  • A type is covariant if 'a: 'b implies T<'a>: T<'b>. This is what we’ve used in the previous chapter to cast Iterator + 'post_urls into Iterator + 'blog_url. Covariance means that the rules we’ve learnt before work and lifetime shortenings are allowed.
  • A type is invariant if 'a: 'b implies nothing. That’s what we’ve seen in the example with the Cell type. Basically it’s a mechanism to disable lifetime casts.
  • A type is contravariant if 'a: 'b implies T<'b>: T<'a>. This is a rule that allows extending the lifetime 'b to the lifetime 'a. It works only for function pointer arguments and it will be your homework to figure it out.

In practice you’ll usually deal with covariance and invariance.

Here is a table from the Nomicon with the variance settings for different types. As a general rule:

  • All const contexts are covariant
  • All mutable/interior-mutable contexts are invariant
  • Function pointer arguments are contravariant
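
| Type          | 'a        | T             | U         |
|---------------|-----------|---------------|-----------|
| &'a T         | covariant | covariant     |           |
| &'a mut T     | covariant | invariant     |           |
| Box<T>        |           | covariant     |           |
| Vec<T>        |           | covariant     |           |
| UnsafeCell<T> |           | invariant     |           |
| Cell<T>       |           | invariant     |           |
| fn(T) -> U    |           | contravariant | covariant |
| *const T      |           | covariant     |           |
| *mut T        |           | invariant     |           |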
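
To make a couple of the covariant rows concrete, here is a small sketch. The functions shorten_vec and shorten_box are hypothetical; the _anchor parameter exists only to give 'a a concrete, non-static region at the call site.

```rust
// Vec<T> is covariant over T, so Vec<&'static str> can be used as Vec<&'a str>.
fn shorten_vec<'a>(v: Vec<&'static str>, _anchor: &'a str) -> Vec<&'a str> {
    v
}

// Box<T> is covariant over T as well.
fn shorten_box<'a>(b: Box<&'static str>, _anchor: &'a str) -> Box<&'a str> {
    b
}

// The analogous function for Cell<&'static str> is rejected,
// as the Cell example earlier in this chapter shows.

fn main() {
    let anchor = String::from("short-lived");
    let v = shorten_vec(vec!["x", "y"], &anchor);
    let b = shorten_box(Box::new("z"), &anchor);
    println!("{:?} {}", v, b);
}
```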

It may be a bit confusing to see that variance is applied to a lifetime and some type T. That’s because T may be a reference itself (like &'s str). Let’s understand how these rules work with one more example:

#![allow(unused)]
fn main() {
struct S<'a, T> {
    val: &'a T
}

fn shortener<'a, 'b, 'c, 'd>(s: S<'a, &'b str>) -> S<'c, &'d str>
where
    'a: 'c,
    'b: 'd,
{
    s
}
}

We have a struct definition corresponding to this row of the table

| Type  | 'a        | T         | U |
|-------|-----------|-----------|---|
| &'a T | covariant | covariant |   |

The table shows that &'a T is covariant over 'a. That means that 'a: 'b implies 'a ~> 'b. We show it in our shortener function by shortening 'a to 'c. Also, &'a T is covariant over T. That means if T is a reference, the lifetime of that reference is covariant. We show that in shortener by shortening &'b str to &'d str.

Let’s modify our example a bit

#![allow(unused)]
fn main() {
struct S<'a, T> {
    val: &'a mut T
}

fn shortener<'a, 'b, 'c, 'd>(s: S<'a, &'b str>) -> S<'c, &'d str>
where
    'a: 'c,
    'b: 'd,
{
    s
}
}

Now type S corresponds to this row of the table

| Type      | 'a        | T         | U |
|-----------|-----------|-----------|---|
| &'a mut T | covariant | invariant |   |

shortener no longer compiles. T is invariant, meaning 'b: 'd doesn’t allow 'b ~> 'd. However, S is still covariant over 'a, so 'a: 'c should work. And indeed, if we remove the 'b ~> 'd cast from the signature it compiles:

#![allow(unused)]
fn main() {
struct S<'a, T> {
    val: &'a mut T
}

fn shortener<'a, 'b, 'c>(s: S<'a, &'b str>) -> S<'c, &'b str>
where
    'a: 'c,
{
    s
}
}

This should be enough material to give you a basic understanding of lifetime variance. In general, you want to prefer covariant contexts because they’re the most flexible ones. Invariant contexts usually lead to some hard-to-grasp lifetime errors because references are tightly bound to their regions and it can be tricky to move them into other regions. We’ll see such errors and learn how to deal with them in other chapters. For now you need to get comfortable with the variance concept.

Chapter exercises

The chapter is called Introduction to variance because it only gives you a reasoning why this mechanism is needed and a brief overview of how it works. There is already an awesome Practical variance tutorial on the Internet. Complete it to master the variance concept.