100 exercises to learn Rust
18
book/src/07_threads/00_intro.md
Normal file
@@ -0,0 +1,18 @@
# Intro

One of Rust's big promises is *fearless concurrency*: making it easier to write safe, concurrent programs.
We haven't seen much of that yet. All the work we've done so far has been single-threaded:
instructions executed one after the other, with strict sequencing. Time to change that!

In this chapter we'll make our ticket store multithreaded.
We will start by allowing multiple users to interface with the same store at the same time. We'll then progress
to having multiple instances of the store running concurrently while sharing the same data.

We'll have the opportunity to touch most of Rust's core concurrency features, including:

- Threads, using the `std::thread` module
- Message passing, using channels
- Shared state, using `Arc`, `Mutex` and `RwLock`
- `Send` and `Sync`, the traits that encode Rust's concurrency guarantees

We'll also discuss various design patterns for multithreaded systems and some of their trade-offs.
115
book/src/07_threads/01_threads.md
Normal file
@@ -0,0 +1,115 @@
# Threads

Before we start writing multithreaded code, let's take a step back and talk about what threads are
and why we might want to use them.

## What is a thread?

A **thread** is an execution context managed by the underlying operating system.
Each thread has its own stack and instruction pointer (also called the program counter).

A single **process** can manage multiple threads.
These threads share the same memory space, which means they can access the same data.

Threads are a **logical** construct. In the end, you can only run one set of instructions
at a time on a CPU core, the **physical** execution unit.
Since there can be many more threads than there are CPU cores, the operating system's
**scheduler** is in charge of deciding which thread to run at any given time,
partitioning CPU time among them to maximize throughput and responsiveness.

## `main`

When a Rust program starts, it runs on a single thread, the **main thread**.
This thread is created by the operating system and is responsible for running the `main`
function.

```rust
use std::thread;
use std::time::Duration;

fn main() {
    loop {
        thread::sleep(Duration::from_secs(2));
        println!("Hello from the main thread!");
    }
}
```

## `std::thread`

Rust's standard library provides a module, `std::thread`, that allows you to create
and manage threads.

### `spawn`

You can use `std::thread::spawn` to create new threads and execute code on them.

For example:

```rust
use std::thread;
use std::time::Duration;

fn main() {
    let handle = thread::spawn(|| {
        loop {
            thread::sleep(Duration::from_secs(1));
            println!("Hello from a thread!");
        }
    });

    loop {
        thread::sleep(Duration::from_secs(2));
        println!("Hello from the main thread!");
    }
}
```

If you execute this program on the [Rust playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=afedf7062298ca8f5a248bc551062eaa)
you'll see that the main thread and the spawned thread run concurrently.
Each thread makes progress independently of the other.

### Process termination

When the main thread finishes, the overall process will exit.
A spawned thread will continue running until it finishes or the main thread finishes.

```rust
use std::thread;
use std::time::Duration;

fn main() {
    let handle = thread::spawn(|| {
        loop {
            thread::sleep(Duration::from_secs(1));
            println!("Hello from a thread!");
        }
    });

    thread::sleep(Duration::from_secs(5));
}
```

In the example above, you can expect to see the message "Hello from a thread!" printed roughly five times.
Then the main thread will finish (when the `sleep` call returns), and the spawned thread will be terminated
since the overall process exits.

### `join`

You can also wait for a spawned thread to finish by calling the `join` method on the `JoinHandle` that `spawn` returns.

```rust
use std::thread;
fn main() {
    let handle = thread::spawn(|| {
        println!("Hello from a thread!");
    });

    handle.join().unwrap();
}
```

In this example, the main thread will wait for the spawned thread to finish before exiting.
This introduces a form of **synchronization** between the two threads: you're guaranteed to see the message
"Hello from a thread!" printed before the program exits, because the main thread won't exit
until the spawned thread has finished.
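`join` does more than wait: it also hands back whatever the spawned closure returned, or an error if the thread panicked. The sketch below illustrates both outcomes. The `run` helper and its `should_panic` flag are hypothetical names, introduced only for this example.

```rust
use std::thread;

// Hypothetical helper: runs a small computation on a spawned thread and
// reports whether the thread finished normally or panicked.
fn run(should_panic: bool) -> Result<u32, &'static str> {
    let handle = thread::spawn(move || {
        if should_panic {
            panic!("something went wrong");
        }
        21 * 2
    });
    // The closure's return value is the `Ok` payload of `join`;
    // a panic in the spawned thread surfaces as `Err`.
    handle.join().map_err(|_| "the spawned thread panicked")
}

fn main() {
    println!("{:?}", run(false));
    println!("{:?}", run(true));
}
```

Calling `unwrap` on the result of `join`, as in the example above, simply propagates a panic from the spawned thread to the thread that joins it.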
112
book/src/07_threads/02_static.md
Normal file
@@ -0,0 +1,112 @@
# `'static`

If you tried to borrow a slice from the vector in the previous exercise,
you probably got a compiler error that looks something like this:

```text
error[E0597]: `v` does not live long enough
   |
11 | pub fn sum(v: Vec<i32>) -> i32 {
   |            - binding `v` declared here
...
15 |     let right = &v[split_point..];
   |                  ^ borrowed value does not live long enough
16 |     let left_handle = thread::spawn(move || left.iter().sum::<i32>());
   |                       ------------------------------------------------
                          argument requires that `v` is borrowed for `'static`
19 | }
   |  - `v` dropped here while still borrowed
```

`argument requires that v is borrowed for 'static`, what does that mean?

The `'static` lifetime is a special lifetime in Rust.
It means that the value will be valid for the entire duration of the program.

## Detached threads

A thread launched via `thread::spawn` can **outlive** the thread that spawned it.
For example:

```rust
use std::thread;

fn f() {
    thread::spawn(|| {
        thread::spawn(|| {
            loop {
                thread::sleep(std::time::Duration::from_secs(1));
                println!("Hello from the detached thread!");
            }
        });
    });
}
```

In this example, the first spawned thread will in turn spawn
a child thread that prints a message every second.
The first thread will then finish and exit. When that happens,
its child thread will **continue running** for as long as the
overall process is running.
In Rust's lingo, we say that the child thread has **outlived**
its parent.

## `'static` lifetime

Since a spawned thread can:

- outlive the thread that spawned it (its parent thread)
- run until the program exits

it must not borrow any values that might be dropped before the program exits;
violating this constraint would expose us to a use-after-free bug.
That's why `std::thread::spawn`'s signature requires that the closure passed to it
has the `'static` lifetime:

```rust
pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static
{
    // [..]
}
```

## `'static` is not (just) about references

All values in Rust have a lifetime, not just references.

In particular, a type that owns its data (like a `Vec` or a `String`)
satisfies the `'static` constraint: if you own it, you can keep working with it
for as long as you want, even after the function that originally created it
has returned.

You can thus interpret `'static` as a way to say:

- Give me an owned value
- Give me a reference that's valid for the entire duration of the program

The first approach is how you solved the issue in the previous exercise:
by allocating new vectors to hold the left and right parts of the original vector,
which were then moved into the spawned threads.
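To make the "owned value" case concrete, here is a minimal sketch in which an owned `Vec<String>`, built at runtime, is moved into a spawned thread. The `print_in_thread` helper is a hypothetical name used only for this illustration.

```rust
use std::thread;

// Hypothetical helper: greets every name from a spawned thread.
// `names` is an owned `Vec<String>`, so it satisfies `'static`
// even though it was created at runtime.
fn print_in_thread(names: Vec<String>) -> usize {
    // `move` transfers ownership of `names` to the new thread.
    let handle = thread::spawn(move || {
        for name in &names {
            println!("Hello, {name}!");
        }
        names.len()
    });
    handle.join().expect("the spawned thread panicked")
}

fn main() {
    let names = vec!["Ada".to_string(), "Grace".to_string()];
    let greeted = print_in_thread(names);
    println!("Greeted {greeted} people");
}
```

After the call, `names` is no longer accessible in `main`: ownership has moved to the spawned thread, which is exactly what rules out a dangling reference.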
## `'static` references

Let's talk about the second case, references that are valid for the entire
duration of the program.

### Static data

The most common case is a reference to **static data**, such as string literals:

```rust
let s: &'static str = "Hello world!";
```

Since string literals are known at compile-time, Rust stores them in a memory
region known as the **read-only data segment**. The read-only data segment is part of the executable itself: there is no risk of it
being freed during program execution.
All references pointing to that region will therefore be valid for as long as
the program runs; they satisfy the `'static` contract.
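As a small illustration, references to static data can be handed to a spawned thread without any copying. The names `GREETING`, `PRIMES` and `sum_primes_in_thread` below are made up for this sketch.

```rust
use std::thread;

// A string literal and a `static` item: both live for the whole program.
const GREETING: &str = "Hello world!";
static PRIMES: [u32; 4] = [2, 3, 5, 7];

// Hypothetical helper: borrows static data from a spawned thread.
fn sum_primes_in_thread() -> u32 {
    let primes: &'static [u32] = &PRIMES;
    let handle = thread::spawn(move || {
        println!("{}", GREETING);
        primes.iter().sum::<u32>()
    });
    handle.join().expect("the spawned thread panicked")
}

fn main() {
    println!("Sum of primes: {}", sum_primes_in_thread());
}
```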
46
book/src/07_threads/03_leak.md
Normal file
@@ -0,0 +1,46 @@
# Leaking data

The main concern around passing references to spawned threads is use-after-free bugs:
accessing data using a pointer to a memory region that's already been freed/de-allocated.
If you're working with heap-allocated data, you can avoid the issue by
telling Rust that you'll never reclaim that memory: you choose to **leak memory**,
intentionally.

This can be done, for example, using the `Box::leak` method from Rust's standard library:

```rust
// Allocate a `u32` on the heap, by wrapping it in a `Box`.
let x = Box::new(41u32);
// Tell Rust that you'll never free that heap allocation
// using `Box::leak`. You can thus get back a 'static reference.
let static_ref: &'static mut u32 = Box::leak(x);
```

## Data leakage is process-scoped

Leaking data is dangerous: if you keep leaking memory, you'll eventually
run out and crash with an out-of-memory error.

```rust
// If you leave this running for a while,
// it'll eventually use all the available memory.
fn oom_trigger() {
    loop {
        let v: Vec<usize> = Vec::with_capacity(1024);
        // `Vec::leak` plays the same role for vectors
        // that `Box::leak` plays for boxes.
        v.leak();
    }
}
```

At the same time, memory leaked via `Box::leak` is not truly forgotten.
The operating system can map each memory region to the process responsible for it.
When the process exits, the operating system will reclaim that memory.

Keeping this in mind, it can be OK to leak memory when:

- The amount of memory you need to leak is bounded/known upfront, or
- Your process is short-lived and you're confident you won't exhaust
  all the available memory before it exits

"Let the OS deal with it" is a perfectly valid memory management strategy
if your use case allows for it.
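A bounded, one-off leak is a common way to share read-only data with threads. The sketch below leaks a single configuration value and shares the resulting `&'static` reference with a few workers. `Config` and `greet_from_workers` are hypothetical, introduced only to illustrate the idea.

```rust
use std::thread;

// Hypothetical configuration type, used only for this illustration.
struct Config {
    greeting: String,
    workers: usize,
}

// Leaks a heap-allocated `Config` once, then shares the resulting
// `&'static Config` with every spawned thread.
fn greet_from_workers(config: Config) -> Vec<String> {
    let config: &'static Config = Box::leak(Box::new(config));
    let handles: Vec<_> = (0..config.workers)
        .map(|id| thread::spawn(move || format!("{} from worker {id}", config.greeting)))
        .collect();
    // Join in spawn order, so the output order is deterministic.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("a worker panicked"))
        .collect()
}

fn main() {
    let config = Config { greeting: "Hello".to_string(), workers: 3 };
    for line in greet_from_workers(config) {
        println!("{line}");
    }
}
```

Each call to `greet_from_workers` leaks one `Config`, so this only stays bounded if the function is called a known, small number of times.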
73
book/src/07_threads/04_scoped_threads.md
Normal file
@@ -0,0 +1,73 @@
# Scoped threads

All the lifetime issues we discussed so far have a common source:
the spawned thread can outlive its parent.
We can sidestep this issue by using **scoped threads**.

```rust
let v = vec![1, 2, 3];
let midpoint = v.len() / 2;

std::thread::scope(|scope| {
    scope.spawn(|| {
        let first = &v[..midpoint];
        println!("Here's the first half of v: {first:?}");
    });
    scope.spawn(|| {
        let second = &v[midpoint..];
        println!("Here's the second half of v: {second:?}");
    });
});

println!("Here's v: {v:?}");
```

Let's unpack what's happening.

## `scope`

The `std::thread::scope` function creates a new **scope**.
`std::thread::scope` takes as input a closure, with a single argument: a `Scope` instance.

## Scoped spawns

`Scope` exposes a `spawn` method.
Unlike `std::thread::spawn`, all threads spawned using a `Scope` will be
**automatically joined** when the scope ends.

If we were to "translate" the previous example to `std::thread::spawn`,
it'd look like this:

```rust
let v = vec![1, 2, 3];
let midpoint = v.len() / 2;

let handle1 = std::thread::spawn(|| {
    let first = &v[..midpoint];
    println!("Here's the first half of v: {first:?}");
});
let handle2 = std::thread::spawn(|| {
    let second = &v[midpoint..];
    println!("Here's the second half of v: {second:?}");
});

handle1.join().unwrap();
handle2.join().unwrap();

println!("Here's v: {v:?}");
```

## Borrowing from the environment

The translated example wouldn't compile, though: the compiler would complain
that `&v` can't be used from our spawned threads since its lifetime isn't
`'static`.

That's not an issue with `std::thread::scope`: you can **safely borrow from the environment**.

In our example, `v` is created before the spawning points.
It will only be dropped _after_ `scope` returns. At the same time,
all threads spawned inside `scope` are guaranteed to finish _before_ `v` is dropped,
therefore there is no risk of having dangling references.

The compiler won't complain!
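Scoped threads can also return values. As a minimal sketch (the `parallel_sum` helper is a hypothetical name), here is a function that sums the two halves of a borrowed slice in parallel, without cloning or leaking anything:

```rust
use std::thread;

// Hypothetical helper: sums the two halves of `v` on scoped threads,
// borrowing `v` instead of cloning or moving it.
fn parallel_sum(v: &[i32]) -> i32 {
    let (left, right) = v.split_at(v.len() / 2);
    thread::scope(|scope| {
        let left_handle = scope.spawn(|| left.iter().sum::<i32>());
        let right_handle = scope.spawn(|| right.iter().sum::<i32>());
        // Scoped handles can be joined explicitly to retrieve results.
        left_handle.join().unwrap() + right_handle.join().unwrap()
    })
}

fn main() {
    let v = vec![1, 2, 3, 4, 5];
    println!("Sum: {}", parallel_sum(&v));
    // `v` is still usable: the threads only borrowed it.
    println!("Here's v: {v:?}");
}
```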
73
book/src/07_threads/05_channels.md
Normal file
@@ -0,0 +1,73 @@
# Channels

All our spawned threads have been fairly short-lived so far.
Get some input, run a computation, return the result, shut down.

For our ticket management system, we want to do something different:
a client-server architecture.

We will have **one long-running server thread**, responsible for managing
our state, the stored tickets.

We will then have **multiple client threads**.
Each client will be able to send **commands** and **queries** to
the stateful thread, in order to change its state (e.g. add a new ticket)
or retrieve information (e.g. get the status of a ticket).
Client threads will run concurrently.

## Communication

So far we've only had very limited parent-child communication:

- The spawned thread borrowed/consumed data from the parent context
- The spawned thread returned data to the parent when joined

This isn't enough for a client-server design.
Clients need to be able to send and receive data from the server thread
_after_ it has been launched.

We can solve the issue using **channels**.

## Channels

Rust's standard library provides **multi-producer, single-consumer** (mpsc) channels
in its `std::sync::mpsc` module.
There are two channel flavours: bounded and unbounded. We'll stick to the unbounded
version for now, but we'll discuss the pros and cons later on.

Channel creation looks like this:

```rust
use std::sync::mpsc::channel;

let (sender, receiver) = channel();
```

You get a sender and a receiver.
You call `send` on the sender to push data into the channel.
You call `recv` on the receiver to pull data from the channel.

### Multiple senders

`Sender` is cloneable: we can create multiple senders (e.g. one for
each client thread) and they will all push data into the same channel.

`Receiver`, instead, is not cloneable: there can only be a single receiver
for a given channel.

That's what **mpsc** (multi-producer single-consumer) stands for!
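The multi-producer setup can be sketched as follows. The `collect_from_clients` helper is a hypothetical name: each "client" thread gets its own clone of the sender, while a single receiver drains the channel.

```rust
use std::sync::mpsc::channel;
use std::thread;

// Hypothetical helper: spawns `n_clients` threads, each with its own
// clone of the sender, and collects everything on a single receiver.
fn collect_from_clients(n_clients: u32) -> Vec<u32> {
    let (sender, receiver) = channel::<u32>();
    for id in 0..n_clients {
        let sender = sender.clone();
        thread::spawn(move || {
            sender.send(id).expect("the receiver was dropped");
        });
    }
    // Drop the original sender so the channel closes once
    // all the clones held by client threads are gone.
    drop(sender);

    // `iter` keeps yielding messages until the channel is closed.
    let mut received: Vec<u32> = receiver.iter().collect();
    // Arrival order depends on scheduling, so sort for a stable result.
    received.sort();
    received
}

fn main() {
    println!("{:?}", collect_from_clients(4));
}
```

Forgetting the `drop(sender)` call would make `receiver.iter()` wait forever, since one sender would remain alive.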
### Message type

Both `Sender` and `Receiver` are generic over a type parameter `T`.
That's the type of the _messages_ that can travel on our channel.

It could be a `u64`, a struct, an enum, etc.

### Errors

Both `send` and `recv` can fail.
`send` returns an error if the receiver has been dropped.
`recv` returns an error if all senders have been dropped and the channel is empty.

In other words, `send` and `recv` error when the channel is effectively closed.
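Both failure modes are easy to reproduce in isolation. In the sketch below, the two helper functions are hypothetical names chosen for this example; the error types come from `std::sync::mpsc`.

```rust
use std::sync::mpsc::{channel, RecvError};

// Returns the message handed back by `send` when the receiver is gone.
fn send_without_receiver(message: String) -> Option<String> {
    let (sender, receiver) = channel::<String>();
    drop(receiver);
    // `SendError` wraps the unsent message in its public `0` field.
    sender.send(message).err().map(|error| error.0)
}

// Returns the results of two `recv` calls after the only sender is dropped.
fn recv_without_senders() -> (Result<u32, RecvError>, Result<u32, RecvError>) {
    let (sender, receiver) = channel::<u32>();
    sender.send(1).unwrap();
    drop(sender);
    // The buffered message is still delivered; only then does `recv` fail.
    (receiver.recv(), receiver.recv())
}

fn main() {
    println!("{:?}", send_without_receiver("hello".to_string()));
    println!("{:?}", recv_without_senders());
}
```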
114
book/src/07_threads/06_interior_mutability.md
Normal file
@@ -0,0 +1,114 @@
# Interior mutability

Let's take a moment to reason about the signature of `Sender`'s `send`:

```rust
impl<T> Sender<T> {
    pub fn send(&self, t: T) -> Result<(), SendError<T>> {
        // [...]
    }
}
```

`send` takes `&self` as its argument.
But it's clearly causing a mutation: it's adding a new message to the channel.
What's even more interesting is that `Sender` is cloneable: we can have multiple instances of `Sender`
trying to modify the channel state **at the same time**, from different threads.

That's the key property we are using to build this client-server architecture. But why does it work?
Doesn't it violate Rust's rules about borrowing? How are we performing mutations via an _immutable_ reference?

## Shared rather than immutable references

When we introduced the borrow-checker, we named the two types of references we can have in Rust:

- immutable references (`&T`)
- mutable references (`&mut T`)

It would have been more accurate to name them:

- shared references (`&T`)
- exclusive references (`&mut T`)

Immutable/mutable is a mental model that works for the vast majority of cases, and it's a great one to get started
with Rust. But it's not the whole story, as you've just seen: `&T` doesn't actually guarantee that the data it
points to is immutable.
Don't worry, though: Rust is still keeping its promises.
It's just that the terms are a bit more nuanced than they might seem at first.

## `UnsafeCell`

Whenever a type allows you to mutate data through a shared reference, you're dealing with **interior mutability**.

By default, the Rust compiler assumes that shared references are immutable. It **optimises your code** based on that assumption.
The compiler can reorder operations, cache values, and do all sorts of magic to make your code faster.

You can tell the compiler "No, this shared reference is actually mutable" by wrapping the data in an `UnsafeCell`.
Every time you see a type that allows interior mutability, you can be certain that `UnsafeCell` is involved,
either directly or indirectly.
Using `UnsafeCell`, raw pointers and `unsafe` code, you can mutate data through shared references.

Let's be clear, though: `UnsafeCell` isn't a magic wand that allows you to ignore the borrow-checker!
`unsafe` code is still subject to Rust's rules about borrowing and aliasing.
It's an (advanced) tool that you can leverage to build **safe abstractions** whose safety can't be directly expressed
in Rust's type system. Whenever you use the `unsafe` keyword you're telling the compiler:
"I know what I'm doing, I won't violate your invariants, trust me."

Every time you call an `unsafe` function, there will be documentation explaining its **safety preconditions**:
under what circumstances it's safe to execute its `unsafe` block. You can find the ones for `UnsafeCell`
[in `std`'s documentation](https://doc.rust-lang.org/std/cell/struct.UnsafeCell.html).

We won't be using `UnsafeCell` directly in this course, nor will we be writing `unsafe` code.
But it's important to know that it's there, why it exists and how it relates to the types you use
every day in Rust.

## Key examples

Let's go through a couple of important `std` types that leverage interior mutability.
These are types that you'll encounter somewhat often in Rust code, especially if you peek under the hood of
some of the libraries you use.

### Reference counting

`Rc` is a reference-counted pointer.
It wraps around a value and keeps track of how many references to the value exist.
When the last reference is dropped, the value is deallocated.
The value wrapped in an `Rc` is immutable: you can only get shared references to it.

```rust
use std::rc::Rc;

let a: Rc<String> = Rc::new("My string".to_string());
// Only one reference to the string data exists.
assert_eq!(Rc::strong_count(&a), 1);

// When we call `clone`, the string data is not copied!
// Instead, the reference count for `Rc` is incremented.
let b = Rc::clone(&a);
assert_eq!(Rc::strong_count(&a), 2);
assert_eq!(Rc::strong_count(&b), 2);
// ^ Both `a` and `b` point to the same string data
//   and share the same reference counter.
```

`Rc` uses `UnsafeCell` internally to allow shared references to increment and decrement the reference count.

### `RefCell`

`RefCell` is one of the most common examples of interior mutability in Rust.
It allows you to mutate the value wrapped in a `RefCell` even if you only have an
immutable reference to the `RefCell` itself.

This is done via **runtime borrow checking**.
The `RefCell` keeps track of the number (and type) of references to the value it contains at runtime.
If you try to borrow the value mutably while it's already borrowed immutably,
the program will panic, ensuring that Rust's borrowing rules are always enforced.

```rust
use std::cell::RefCell;

let x = RefCell::new(42);

let y = x.borrow(); // Immutable borrow
let z = x.borrow_mut(); // Panics! There is an active immutable borrow.
```
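The panic can be avoided by keeping borrows short, or by using the fallible `try_borrow_mut` method. A minimal sketch, where the `increment` helper is a hypothetical name:

```rust
use std::cell::RefCell;

// Hypothetical helper: increments the wrapped value through a shared
// reference, keeping each borrow as short as possible.
fn increment(counter: &RefCell<u32>) -> u32 {
    {
        let mut value = counter.borrow_mut();
        *value += 1;
    } // The mutable borrow ends here.
    *counter.borrow()
}

fn main() {
    let counter = RefCell::new(41);
    assert_eq!(increment(&counter), 42);

    // `try_borrow_mut` reports a conflict as an error instead of panicking.
    let reader = counter.borrow();
    assert!(counter.try_borrow_mut().is_err());
    drop(reader);
    assert!(counter.try_borrow_mut().is_ok());
}
```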
16
book/src/07_threads/07_ack.md
Normal file
@@ -0,0 +1,16 @@
# Two-way communication

In our current client-server implementation, communication flows in one direction: from the client to the server.
The client has no way of knowing if the server received the message, executed it successfully, or failed.
That's not ideal.

To solve this issue, we can introduce a two-way communication system.

## Response channel

We need a way for the server to send a response back to the client.
There are various ways to do this, but the simplest option is to include a `Sender` channel in
the message that the client sends to the server. After processing the message, the server can use
this channel to send a response back to the client.

This is a fairly common pattern in Rust applications built on top of message-passing primitives.
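One possible shape for this pattern is sketched below. It is not the course's reference solution: the `Command` enum, its fields, and the `server` and `insert` functions are all assumptions made for the sake of a self-contained example, with a plain `u64` standing in for a ticket id.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

// Hypothetical command type: the client bundles a `Sender`
// that the server uses to reply.
enum Command {
    Insert {
        title: String,
        response_sender: Sender<u64>,
    },
}

// A toy server: assigns increasing ids and replies with the new id.
fn server(receiver: Receiver<Command>) {
    let mut next_id = 0;
    for command in receiver {
        match command {
            Command::Insert { title, response_sender } => {
                println!("Storing ticket {next_id}: {title}");
                // The client may have gone away; ignore a failed reply.
                let _ = response_sender.send(next_id);
                next_id += 1;
            }
        }
    }
}

// Hypothetical client-side helper: sends a command and waits for the reply.
fn insert(server_sender: &Sender<Command>, title: &str) -> u64 {
    let (response_sender, response_receiver) = channel();
    server_sender
        .send(Command::Insert { title: title.to_string(), response_sender })
        .expect("the server is not running");
    response_receiver.recv().expect("the server dropped the response channel")
}

fn main() {
    let (sender, receiver) = channel();
    thread::spawn(move || server(receiver));
    println!("First id: {}", insert(&sender, "Fix the bug"));
    println!("Second id: {}", insert(&sender, "Write docs"));
}
```

Each request creates a fresh response channel, so replies can never be delivered to the wrong client.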
8
book/src/07_threads/08_client.md
Normal file
@@ -0,0 +1,8 @@
# A dedicated `Client` type

All the interactions from the client side have been fairly low-level: you have to
manually create a response channel, build the command, send it to the server, and
then call `recv` on the response channel to get the response.

This is a lot of boilerplate code that could be abstracted away, and that's
exactly what we're going to do in this exercise.
43
book/src/07_threads/09_bounded.md
Normal file
@@ -0,0 +1,43 @@
# Bounded vs unbounded channels

So far we've been using unbounded channels.
You can send as many messages as you want, and the channel will grow to accommodate them.
In a multi-producer single-consumer scenario, this can be problematic: if the producers
enqueue messages at a faster rate than the consumer can process them, the channel will
keep growing, potentially consuming all available memory.

Our recommendation is to **never** use an unbounded channel in a production system.
You should always enforce an upper limit on the number of messages that can be enqueued using a
**bounded channel**.

## Bounded channels

A bounded channel has a fixed capacity.
You can create one by calling `sync_channel` with a capacity greater than zero:

```rust
use std::sync::mpsc::sync_channel;

let (sender, receiver) = sync_channel(10);
```

`receiver` has the same type as before, `Receiver<T>`.
`sender`, instead, is an instance of `SyncSender<T>`.

### Sending messages

You have two different methods to send messages through a `SyncSender`:

- `send`: if there is space in the channel, it will enqueue the message and return `Ok(())`.
  If the channel is full, it will block and wait until there is space available.
- `try_send`: if there is space in the channel, it will enqueue the message and return `Ok(())`.
  If the channel is full, it will return `Err(TrySendError::Full(value))`, where `value` is the message that couldn't be sent.

Depending on your use case, you might want to use one or the other.

### Backpressure

The main advantage of using bounded channels is that they provide a form of **backpressure**.
They force the producers to slow down if the consumer can't keep up.
The backpressure can then propagate through the system, potentially affecting the whole architecture and
preventing end users from overwhelming the system with requests.
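To see `try_send` reject messages once the buffer is full, consider the following sketch. The `enqueue_without_blocking` helper is a hypothetical name; nothing ever reads from the channel, so it fills up after `capacity` messages.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

// Hypothetical helper: tries to enqueue `messages` without ever blocking
// and returns the ones rejected because the channel was full.
fn enqueue_without_blocking(capacity: usize, messages: Vec<u32>) -> Vec<u32> {
    // `_receiver` is kept alive so the channel stays open,
    // but nothing is ever consumed from it.
    let (sender, _receiver) = sync_channel(capacity);
    let mut rejected = Vec::new();
    for message in messages {
        match sender.try_send(message) {
            Ok(()) => {}
            // The channel is full: the message is handed back to us.
            Err(TrySendError::Full(message)) => rejected.push(message),
            // The receiver is gone: there is no point in retrying.
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    rejected
}

fn main() {
    let rejected = enqueue_without_blocking(2, vec![1, 2, 3, 4]);
    println!("Rejected: {rejected:?}");
}
```

A producer that receives `Full` can drop the message, retry later, or report the overload to its own caller, which is how backpressure travels upstream.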
39
book/src/07_threads/10_patch.md
Normal file
@@ -0,0 +1,39 @@
# Update operations

So far we've implemented only insertion and retrieval operations.
Let's see how we can expand the system to provide an update operation.

## Legacy updates

In the non-threaded version of the system, updates were fairly straightforward: `TicketStore` exposed a
`get_mut` method that allowed the caller to obtain a mutable reference to a ticket, and then modify it.

## Multithreaded updates

The same strategy won't work in the current multi-threaded version,
because the mutable reference would have to be sent over a channel. The borrow checker would
stop us, because `&mut Ticket` isn't `'static`: the channel's endpoints are moved into threads created with
`std::thread::spawn`, which requires everything the closure captures to be `'static`.

There are a few ways to work around this limitation. We'll explore a few of them in the following exercises.

### Patching

We can't send a `&mut Ticket` over a channel, therefore we can't mutate on the client-side.
Can we mutate on the server-side?

We can, if we tell the server what needs to be changed. In other words, if we send a **patch** to the server:

```rust
struct TicketPatch {
    id: TicketId,
    title: Option<TicketTitle>,
    description: Option<TicketDescription>,
    status: Option<TicketStatus>,
}
```

The `id` field is mandatory, since it's required to identify the ticket that needs to be updated.
All other fields are optional:

- If a field is `None`, it means that the field should not be changed.
- If a field is `Some(value)`, it means that the field should be changed to `value`.
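On the server side, applying a patch could look roughly like this. The sketch is deliberately simplified: plain `u64` and `String` fields and a two-variant `Status` enum stand in for the course's `TicketId`, `TicketTitle`, `TicketDescription` and `TicketStatus` types, and `apply_patch` is a hypothetical helper rather than part of the exercise's API.

```rust
// Simplified stand-ins for the course's ticket types; every field type
// here is an assumption made for the sake of a self-contained example.
#[derive(Debug, Clone, PartialEq)]
enum Status {
    ToDo,
    Done,
}

#[derive(Debug, Clone, PartialEq)]
struct Ticket {
    id: u64,
    title: String,
    description: String,
    status: Status,
}

struct TicketPatch {
    id: u64,
    title: Option<String>,
    description: Option<String>,
    status: Option<Status>,
}

// Server-side helper: overwrites only the fields that the patch sets.
// Returns `false`, leaving the ticket untouched, if the ids don't match.
fn apply_patch(ticket: &mut Ticket, patch: TicketPatch) -> bool {
    if ticket.id != patch.id {
        return false;
    }
    if let Some(title) = patch.title {
        ticket.title = title;
    }
    if let Some(description) = patch.description {
        ticket.description = description;
    }
    if let Some(status) = patch.status {
        ticket.status = status;
    }
    true
}

fn main() {
    let mut ticket = Ticket {
        id: 1,
        title: "Old title".to_string(),
        description: "Unchanged".to_string(),
        status: Status::ToDo,
    };
    let patch = TicketPatch {
        id: 1,
        title: Some("New title".to_string()),
        description: None,
        status: Some(Status::Done),
    };
    apply_patch(&mut ticket, patch);
    println!("{ticket:?}");
}
```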
222
book/src/07_threads/11_locks.md
Normal file
@@ -0,0 +1,222 @@
|
||||
# Locks, `Send` and `Arc`
|
||||
|
||||
The patching strategy you just implemented has a major drawback: it's racy.
|
||||
If two clients send patches for the same ticket roughly at same time, the server will apply them in an arbitrary order.
|
||||
Whoever enqueues their patch last will overwrite the changes made by the other client.
|
||||
|
||||
## Version numbers
|
||||
|
||||
We could try to fix this by using a **version number**.
|
||||
Each ticket gets assigned a version number upon creation, set to `0`.
|
||||
Whenever a client sends a patch, they must include the current version number of the ticket alongside the
|
||||
desired changes. The server will only apply the patch if the version number matches the one it has stored.
|
||||
|
||||
In the scenario described above, the server would reject the second patch, because the version number would
|
||||
have been incremented by the first patch and thus wouldn't match the one sent by the second client.
|
||||
|
||||
This approach is fairly common in distributed systems (e.g. when client and servers don't share memory),
|
||||
and it is known as **optimistic concurrency control**.
|
||||
The idea is that most of the time, conflicts won't happen, so we can optimize for the common case.
|
||||
You know enough about Rust by now to implement this strategy on your own as a bonus exercise, if you want to.
|
||||
|
||||
## Locking
|
||||
|
||||
We can also fix the race condition by introducing a **lock**.
|
||||
Whenever a client wants to update a ticket, they must first acquire a lock on it. While the lock is active,
|
||||
no other client can modify the ticket.
|
||||
|
||||
Rust's standard library provides two different locking primitives: `Mutex<T>` and `RwLock<T>`.
|
||||
Let's start with `Mutex<T>`. It stands for **mut**ual **ex**clusion, and it's the simplest kind of lock:
|
||||
it allows only one thread to access the data, no matter if it's for reading or writing.
|
||||
|
||||
`Mutex<T>` wraps the data it protects, and it's therefore generic over the type of the data.
|
||||
You can't access the data directly: the type system forces you to acquire a lock first using either `Mutex::lock` or
|
||||
`Mutex::try_lock`. The former blocks until the lock is acquired, the latter returns immediately with an error if the lock
|
||||
can't be acquired.
|
||||
Both methods return a guard object that dereferences to the data, allowing you to modify it. The lock is released when
|
||||
the guard is dropped.
|
||||
|
||||
```rust
|
||||
use std::sync::Mutex;
|
||||
|
||||
// An integer protected by a mutex lock
|
||||
let lock = Mutex::new(0);
|
||||
|
||||
// Acquire a lock on the mutex
|
||||
let mut guard = lock.lock().unwrap();
|
||||
|
||||
// Modify the data through the guard,
|
||||
// leveraging its `Deref` implementation
|
||||
*guard += 1;
|
||||
|
||||
// The lock is released when `data` goes out of scope
|
||||
// This can be done explicitly by dropping the guard
|
||||
// or happen implicitly when the guard goes out of scope
|
||||
drop(guard)
|
||||
```
|
||||
|
||||
## Locking granularity

What should our `Mutex` wrap?
The simplest option would be to wrap the entire `TicketStore` in a single `Mutex`.
This would work, but it would severely limit the system's performance: you wouldn't be able to read tickets in parallel,
because every read would have to wait for the lock to be released.
This is known as **coarse-grained locking**.

It would be better to use **fine-grained locking**, where each ticket is protected by its own lock.
This way, clients can keep working with tickets in parallel, as long as they aren't trying to access the same ticket.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

// The new structure, with a lock for each ticket
struct TicketStore {
    tickets: BTreeMap<TicketId, Mutex<Ticket>>,
}
```

This approach is more efficient, but it has a downside: `TicketStore` has to become **aware** of the multithreaded
nature of the system; up until now, `TicketStore` has blissfully ignored the existence of threads.
Let's go for it anyway.

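As a rough illustration of what fine-grained locking buys us, the sketch below uses simplified stand-ins for `TicketId` and `Ticket` (a plain `u64` and a struct with just a title). The `rename` method and the `sample_store` function are hypothetical helpers, not part of the course's `TicketStore`.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

// Simplified stand-ins for the course's `TicketId` and `Ticket` types.
type TicketId = u64;

struct Ticket {
    title: String,
}

struct TicketStore {
    tickets: BTreeMap<TicketId, Mutex<Ticket>>,
}

impl TicketStore {
    /// Hypothetical helper: rename one ticket, locking only that ticket.
    /// Note that `&self` is enough, since the mutation goes through the lock.
    fn rename(&self, id: TicketId, title: &str) -> bool {
        match self.tickets.get(&id) {
            Some(ticket) => {
                ticket.lock().unwrap().title = title.to_string();
                true
            }
            None => false,
        }
    }
}

/// Hypothetical helper that builds a store with two tickets.
fn sample_store() -> TicketStore {
    let mut tickets = BTreeMap::new();
    tickets.insert(1, Mutex::new(Ticket { title: "First".into() }));
    tickets.insert(2, Mutex::new(Ticket { title: "Second".into() }));
    TicketStore { tickets }
}

fn main() {
    let store = sample_store();

    // Holding the lock on ticket 1...
    let _guard = store.tickets[&1].lock().unwrap();
    // ...doesn't prevent us from updating ticket 2.
    assert!(store.rename(2, "Renamed"));
    assert_eq!(store.tickets[&2].lock().unwrap().title, "Renamed");
}
```

With a single `Mutex<TicketStore>`, the second operation would have to wait for the first guard to be dropped.
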
## Who holds the lock?

For the whole scheme to work, the lock must be passed to the client that wants to modify the ticket.
The client can then directly modify the ticket (as if they had a `&mut Ticket`) and release the lock when they're done.

This is a bit tricky.
We can't send a `Mutex<Ticket>` over a channel, because `Mutex` is not `Clone` and
we can't move it out of the `TicketStore`. Could we send the `MutexGuard` instead?

Let's test the idea with a small example:

```rust
use std::thread::spawn;
use std::sync::Mutex;
use std::sync::mpsc::sync_channel;

fn main() {
    let lock = Mutex::new(0);
    let (sender, receiver) = sync_channel(1);
    let guard = lock.lock().unwrap();

    spawn(move || {
        receiver.recv().unwrap();
    });

    // Try to send the guard over the channel
    // to another thread
    sender.send(guard);
}
```

The compiler is not happy with this code:

```text
error[E0277]: `MutexGuard<'_, i32>` cannot be sent between threads safely
   --> src/main.rs:10:7
    |
10  |   spawn(move || {
    |  _-----_^
    | | |
    | | required by a bound introduced by this call
11  | |     receiver.recv().unwrap();
12  | | });
    | |_^ `MutexGuard<'_, i32>` cannot be sent between threads safely
    |
    = help: the trait `Send` is not implemented for `MutexGuard<'_, i32>`, which is required by `{closure@src/main.rs:10:7: 10:14}: Send`
    = note: required for `std::sync::mpsc::Receiver<MutexGuard<'_, i32>>` to implement `Send`
note: required because it's used within this closure
```

`MutexGuard<'_, i32>` is not `Send`: what does that mean?

## `Send`

`Send` is a marker trait that indicates that a type can be safely transferred from one thread to another.
`Send` is also an auto trait: like `Sized`, it's automatically implemented (or not implemented) for your type
by the compiler, based on its definition.
You can also implement `Send` manually for your types, but it requires `unsafe` since you have to guarantee that the
type is indeed safe to send between threads for reasons that the compiler can't automatically verify.

### Channel requirements

`Sender<T>`, `SyncSender<T>` and `Receiver<T>` are `Send` if and only if `T` is `Send`.
That's because they are used to send values between threads, and if the value itself is not `Send`, it would be
unsafe to send it between threads.

### `MutexGuard`

`MutexGuard` is not `Send` because the underlying operating system primitives that `Mutex` uses to implement
the lock require (on some platforms) that the lock must be released by the same thread that acquired it.
If we were to send a `MutexGuard` to another thread, the lock would be released by a different thread, which would
lead to undefined behavior.

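For contrast with the failing example, here is a minimal sketch of the same channel setup with a payload that is `Send`. The function name `length_on_other_thread` is invented for this illustration.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Hypothetical helper: send `value` to another thread over a channel
/// and get its length back.
fn length_on_other_thread(value: String) -> usize {
    let (sender, receiver) = sync_channel(1);

    // The closure captures `receiver`. This compiles because
    // `Receiver<String>` is `Send`, which in turn requires `String: Send`.
    let handle = thread::spawn(move || {
        let received: String = receiver.recv().unwrap();
        received.len()
    });

    sender.send(value).unwrap();
    handle.join().unwrap()
}

fn main() {
    assert_eq!(length_on_other_thread("ticket".to_string()), 6);
}
```

Swapping `String` for a type that isn't `Send`, such as `std::rc::Rc<String>`, should make the `thread::spawn` call fail with an `E0277` error much like the one produced for `MutexGuard`.
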
## Our challenges

Summing it up:

- We can't send a `MutexGuard` over a channel. So we can't lock on the server-side and then modify the ticket on the
  client-side.
- We can send a `Mutex` over a channel because it's `Send` as long as the data it protects is `Send`, which is the
  case for `Ticket`.
  At the same time, we can't move the `Mutex` out of the `TicketStore` nor clone it.

How can we solve this conundrum?
We need to look at the problem from a different angle.
To lock a `Mutex`, we don't need an owned value. A shared reference is enough, since `Mutex` uses interior mutability:

```rust
impl<T> Mutex<T> {
    // `&self`, not `self`!
    pub fn lock(&self) -> LockResult<MutexGuard<'_, T>> {
        // Implementation details
    }
}
```

It is therefore enough to send a shared reference to the client.
We can't do that directly, though, because the reference would have to be `'static` and that's not the case.
In a way, we need an "owned shared reference". It turns out that Rust has a type that fits the bill: `Arc`.

## `Arc` to the rescue

`Arc` stands for **atomic reference counting**.
`Arc` wraps around a value and keeps track of how many references to the value exist.
When the last reference is dropped, the value is deallocated.
The value wrapped in an `Arc` is immutable: you can only get shared references to it.

```rust
use std::sync::Arc;

let data: Arc<u32> = Arc::new(0);
let data_clone = Arc::clone(&data);

// `Arc<T>` implements `Deref<Target = T>`, so we can convert
// a `&Arc<T>` to a `&T` using deref coercion
let data_ref: &u32 = &data;
```

If you're having a déjà vu moment, you're right: `Arc` sounds very similar to `Rc`, the reference-counted pointer we
introduced when talking about interior mutability. The difference is thread-safety: `Rc` is not `Send`, while `Arc` is
(provided the value it wraps is thread-safe too).
It boils down to the way the reference count is implemented: `Rc` uses a "normal" integer, while `Arc` uses an
**atomic** integer, which can be safely shared and modified across threads.

## `Arc<Mutex<T>>`

If we pair `Arc` with `Mutex`, we finally get a type that:

- Can be sent between threads, because:
  - `Arc<T>` is `Send` if `T` is `Send` and `Sync` (we'll cover `Sync` at the end of this chapter), and
  - `Mutex<T>` is both `Send` and `Sync` if `T` is `Send`.
  - `T` is `Ticket`, which is `Send`.
- Can be cloned, because `Arc` is `Clone` no matter what `T` is.
  Cloning an `Arc` increments the reference count; the data is not copied.
- Can be used to modify the data it wraps, because `Arc` lets you get a shared
  reference to `Mutex<T>` which can in turn be used to acquire a lock.

We have all the pieces we need to implement the locking strategy for our ticket store.

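The three properties above can be seen working together in a small sketch. It uses a plain counter rather than a `Ticket`, and `parallel_increment` is a hypothetical function written for this example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: spawn `n_threads` threads, each incrementing
/// a shared counter once, and return the final value.
fn parallel_increment(n_threads: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..n_threads)
        .map(|_| {
            // Cloning the `Arc` bumps the reference count;
            // the counter itself is not copied.
            let counter = Arc::clone(&counter);
            // The clone is moved into the new thread.
            thread::spawn(move || {
                // A shared reference to the `Mutex` is enough to lock it.
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    assert_eq!(parallel_increment(8), 8);
}
```

Each thread owns its own `Arc` handle, so no `'static` reference is needed, and the `Mutex` makes sure the increments don't race with each other.
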
## Further reading

- We won't be covering the details of atomic operations in this course, but you can find more information
  [in the `std` documentation](https://doc.rust-lang.org/std/sync/atomic/index.html) as well as in the
  ["Rust atomics and locks" book](https://marabos.nl/atomics/).
45
book/src/07_threads/12_rw_lock.md
Normal file
@@ -0,0 +1,45 @@
# Readers and writers

Our new `TicketStore` works, but its read performance is not great: there can only be one client at a time
reading a specific ticket, because `Mutex<T>` doesn't distinguish between readers and writers.

We can solve the issue by using a different locking primitive: `RwLock<T>`.
`RwLock<T>` stands for **read-write lock**. It allows **multiple readers** to access the data simultaneously,
but only one writer at a time.

`RwLock<T>` has two methods to acquire a lock: `read` and `write`.
`read` returns a guard that allows you to read the data, while `write` returns a guard that allows you to modify it.

```rust
use std::sync::RwLock;

// An integer protected by a read-write lock
let lock = RwLock::new(0);

// Acquire a read lock on the RwLock
let guard1 = lock.read().unwrap();

// Acquire a **second** read lock
// while the first one is still active
let guard2 = lock.read().unwrap();
```

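The snippet above only takes read locks. A slightly larger sketch, using a `Vec<u32>` as stand-in data and two hypothetical helpers (`sum` and `push`), shows readers on several threads followed by a writer:

```rust
use std::sync::RwLock;
use std::thread;

/// Hypothetical helper: read-only access goes through `read`.
fn sum(lock: &RwLock<Vec<u32>>) -> u32 {
    lock.read().unwrap().iter().sum()
}

/// Hypothetical helper: mutation goes through `write`.
fn push(lock: &RwLock<Vec<u32>>, value: u32) {
    lock.write().unwrap().push(value);
}

fn main() {
    let lock = RwLock::new(vec![1, 2, 3]);

    // Several threads can hold read guards at the same time.
    thread::scope(|s| {
        for _ in 0..4 {
            s.spawn(|| assert_eq!(sum(&lock), 6));
        }
    });

    // While a read guard is alive, a non-blocking write attempt fails.
    let reader = lock.read().unwrap();
    assert!(lock.try_write().is_err());
    drop(reader);

    // Once all readers are gone, the writer can proceed.
    push(&lock, 4);
    assert_eq!(sum(&lock), 10);
}
```
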
## Trade-offs

On the surface, `RwLock<T>` seems like a no-brainer: it provides a superset of the functionality of `Mutex<T>`.
Why would you ever use `Mutex<T>` if you can use `RwLock<T>` instead?

There are two key reasons:

- Locking a `RwLock<T>` is more expensive than locking a `Mutex<T>`.
  This is because `RwLock<T>` has to keep track of the number of active readers and writers, while `Mutex<T>`
  only has to keep track of whether the lock is held or not.
  This performance overhead is not an issue if there are more readers than writers, but if the workload
  is write-heavy, `Mutex<T>` might be a better choice.
- `RwLock<T>` can cause **writer starvation**.
  If there are always readers waiting to acquire the lock, writers might never get a chance to run.
  `RwLock<T>` doesn't provide any guarantees about the order in which readers and writers are granted access to the lock.
  It depends on the policy implemented by the underlying OS, which might not be fair to writers.

In our case, we can expect the workload to be read-heavy (since most clients will be reading tickets, not modifying them),
so `RwLock<T>` is a good choice.
54
book/src/07_threads/13_without_channels.md
Normal file
@@ -0,0 +1,54 @@
# Design review

Let's take a moment to review the journey we've been through.

## Lockless with channel serialization

Our first implementation of a multithreaded ticket store used:

- a single long-lived thread (server), to hold the shared state
- multiple clients sending requests to it via channels from their own threads.

No locking of the state was necessary, since the server was the only one modifying the state. That's because
the "inbox" channel naturally **serialized** incoming requests: the server would process them one by one.
We've already discussed the limitations of this approach when it comes to patching behaviour, but we didn't
discuss the performance implications of the original design: the server could only process one request at a time,
including reads.

## Fine-grained locking

We then moved to a more sophisticated design, where each ticket was protected by its own lock and
clients could independently decide if they wanted to read or atomically modify a ticket, acquiring the appropriate lock.

This design allows for better parallelism (i.e. multiple clients can read tickets at the same time), but it is
still fundamentally **serial**: the server processes commands one by one. In particular, it hands out locks to clients
one by one.

Could we remove the channels entirely and allow clients to directly access the `TicketStore`, relying exclusively on
locks to synchronize access?

## Removing channels

We have two problems to solve:

- Sharing `TicketStore` across threads
- Synchronizing access to the store

### Sharing `TicketStore` across threads

We want all threads to refer to the same state, otherwise we don't really have a multithreaded system: we're just
running multiple single-threaded systems in parallel.
We've already encountered this problem when we tried to share a lock across threads: we can use an `Arc`.

### Synchronizing access to the store

There is one interaction that's still lockless thanks to the serialization provided by the channels: inserting
(or removing) a ticket from the store.
If we remove the channels, we need to introduce (another) lock to synchronize access to the `TicketStore` itself.

If we use a `Mutex`, then it makes no sense to use an additional `RwLock` for each ticket: the `Mutex` will
already serialize access to the entire store, so we wouldn't be able to read tickets in parallel anyway.
If we use a `RwLock`, instead, we can read tickets in parallel. We just need to pause all reads while inserting
or removing a ticket.

Let's go down this path and see where it leads us.
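One possible shape for this design is sketched below. It is only an illustration under simplifying assumptions: `Ticket` has a single field, `TicketId` is a plain `u64`, and the `insert`/`get` methods and the `next_id` field are hypothetical rather than the course's actual API.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Simplified stand-ins for the course's types.
type TicketId = u64;

struct Ticket {
    title: String,
}

#[derive(Default)]
struct TicketStore {
    tickets: BTreeMap<TicketId, Arc<RwLock<Ticket>>>,
    next_id: TicketId,
}

impl TicketStore {
    /// Needs `&mut self`, i.e. the store-level write lock.
    fn insert(&mut self, title: &str) -> TicketId {
        let id = self.next_id;
        self.next_id += 1;
        let ticket = Ticket { title: title.to_string() };
        self.tickets.insert(id, Arc::new(RwLock::new(ticket)));
        id
    }

    /// Needs only `&self`, i.e. the store-level read lock.
    fn get(&self, id: TicketId) -> Option<Arc<RwLock<Ticket>>> {
        self.tickets.get(&id).cloned()
    }
}

fn main() {
    // No channels: every client gets its own handle to the same store.
    let store = Arc::new(RwLock::new(TicketStore::default()));

    // Inserting pauses all readers of the store while it runs.
    let id = store.write().unwrap().insert("Fix the build");

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                // The store-level read lock is held just long enough
                // to clone the ticket handle.
                let ticket = store.read().unwrap().get(id).unwrap();
                let title = ticket.read().unwrap().title.clone();
                title
            })
        })
        .collect();

    for handle in handles {
        assert_eq!(handle.join().unwrap(), "Fix the build");
    }
}
```

Returning an `Arc<RwLock<Ticket>>` from `get` lets a client release the store-level lock quickly and then decide on its own whether to take the ticket's read or write lock.
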
28
book/src/07_threads/14_sync.md
Normal file
@@ -0,0 +1,28 @@
# `Sync`

Before we wrap up this chapter, let's talk about another key trait in Rust's standard library: `Sync`.

`Sync` is an auto trait, just like `Send`.
It is automatically implemented by all types that can be safely **shared** between threads.

In other words: `T: Sync` means that `&T` is `Send`.

## `Sync` doesn't imply `Send`

It's important to note that `Sync` doesn't imply `Send`.
For example: `MutexGuard` is not `Send`, but it is `Sync`.

It isn't `Send` because the lock must be released on the same thread that acquired it, therefore we don't
want `MutexGuard` to be dropped on a different thread.
But it is `Sync`, because that has no impact on where the lock is released.

## `Send` doesn't imply `Sync`

The opposite is also true: `Send` doesn't imply `Sync`.
For example: `RefCell<T>` is `Send` (if `T` is `Send`), but it is not `Sync`.

`RefCell<T>` performs runtime borrow checking, but the counters it uses to track borrows are not thread-safe.
Therefore, having multiple threads holding a `&RefCell` would lead to a data race, with potentially
multiple threads obtaining mutable references to the same data. Hence `RefCell` is not `Sync`.
Being `Send` is fine, instead, because when we send a `RefCell` to another thread we're not
leaving behind any references to the data it contains, hence no risk of concurrent mutable access.