100 exercises to learn Rust

LukeMathWalker
2024-05-12 22:21:03 +02:00
commit 5edebf6cf2
309 changed files with 13173 additions and 0 deletions

@@ -0,0 +1,18 @@
# Intro
One of Rust's big promises is *fearless concurrency*: making it easier to write safe, concurrent programs.
We haven't seen much of that yet. All the work we've done so far has been single-threaded:
instructions executed one after the other, with strict sequencing. Time to change that!
In this chapter we'll make our ticket store multithreaded.
We will start by allowing multiple users to interface with the same store at the same time. We'll then progress
to having multiple instances of the store running concurrently while sharing the same data.
We'll have the opportunity to touch most of Rust's core concurrency features, including:
- Threads, using the `std::thread` module
- Message passing, using channels
- Shared state, using `Arc`, `Mutex` and `RwLock`
- `Send` and `Sync`, the traits that encode Rust's concurrency guarantees
We'll also discuss various design patterns for multithreaded systems and some of their trade-offs.

@@ -0,0 +1,115 @@
# Threads
Before we start writing multithreaded code, let's take a step back and talk about what threads are
and why we might want to use them.
## What is a thread?
A **thread** is an execution context managed by the underlying operating system.
Each thread has its own stack and instruction pointer (also called the program counter).
A single **process** can manage multiple threads.
These threads share the same memory space, which means they can access the same data.
Threads are a **logical** construct. In the end, you can only run one set of instructions
at a time on a CPU core, the **physical** execution unit.
Since there can be many more threads than there are CPU cores, the operating system's
**scheduler** is in charge of deciding which thread to run at any given time,
partitioning CPU time among them to maximize throughput and responsiveness.
## `main`
When a Rust program starts, it runs on a single thread, the **main thread**.
This thread is created by the operating system and is responsible for running the `main`
function.
```rust
use std::thread;
use std::time::Duration;

fn main() {
    loop {
        thread::sleep(Duration::from_secs(2));
        println!("Hello from the main thread!");
    }
}
```
## `std::thread`
Rust's standard library provides a module, `std::thread`, that allows you to create
and manage threads.
### `spawn`
You can use `std::thread::spawn` to create new threads and execute code on them.
For example:
```rust
use std::thread;
use std::time::Duration;

fn main() {
    let handle = thread::spawn(|| {
        loop {
            thread::sleep(Duration::from_secs(1));
            println!("Hello from a thread!");
        }
    });

    loop {
        thread::sleep(Duration::from_secs(2));
        println!("Hello from the main thread!");
    }
}
```
If you execute this program on the [Rust playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=afedf7062298ca8f5a248bc551062eaa)
you'll see that the main thread and the spawned thread run concurrently.
Each thread makes progress independently of the other.
### Process termination
When the main thread finishes, the overall process will exit.
A spawned thread will continue running until it finishes or the main thread finishes.
```rust
use std::thread;
use std::time::Duration;

fn main() {
    let handle = thread::spawn(|| {
        loop {
            thread::sleep(Duration::from_secs(1));
            println!("Hello from a thread!");
        }
    });

    thread::sleep(Duration::from_secs(5));
}
```
In the example above, you can expect to see the message "Hello from a thread!" printed roughly five times.
Then the main thread will finish (when the `sleep` call returns), and the spawned thread will be terminated
since the overall process exits.
### `join`
You can also wait for a spawned thread to finish by calling the `join` method on the `JoinHandle` that `spawn` returns.
```rust
use std::thread;

fn main() {
    let handle = thread::spawn(|| {
        println!("Hello from a thread!");
    });

    handle.join().unwrap();
}
```
In this example, the main thread will wait for the spawned thread to finish before exiting.
This introduces a form of **synchronization** between the two threads: you're guaranteed to see the message
"Hello from a thread!" printed before the program exits, because the main thread won't exit
until the spawned thread has finished.
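`join` also hands back whatever the spawned closure returned, wrapped in a `Result` that is `Err` if the thread panicked. Here is a minimal sketch of that behaviour; `square_in_thread` is a name made up for this illustration.

```rust
use std::thread;

/// Computes `n * n` on a freshly spawned thread.
/// (Hypothetical helper, for illustration only.)
fn square_in_thread(n: u64) -> u64 {
    let handle = thread::spawn(move || n * n);
    // `join` blocks until the thread finishes and yields its return value.
    handle.join().expect("the spawned thread panicked")
}

fn main() {
    println!("7 squared is {}", square_in_thread(7));

    // A panic in the spawned thread surfaces as an `Err` from `join`.
    let outcome = thread::spawn(|| -> u64 { panic!("boom") }).join();
    println!("Did the second thread panic? {}", outcome.is_err());
}
```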

@@ -0,0 +1,112 @@
# `'static`
If you tried to borrow a slice from the vector in the previous exercise,
you probably got a compiler error that looks something like this:
```text
error[E0597]: `v` does not live long enough
|
11 | pub fn sum(v: Vec<i32>) -> i32 {
| - binding `v` declared here
...
15 | let right = &v[split_point..];
| ^ borrowed value does not live long enough
16 | let left_handle = thread::spawn(move || left.iter().sum::<i32>());
| ------------------------------------------------
argument requires that `v` is borrowed for `'static`
19 | }
| - `v` dropped here while still borrowed
```
`argument requires that v is borrowed for 'static`, what does that mean?
The `'static` lifetime is a special lifetime in Rust.
It means that the value will be valid for the entire duration of the program.
## Detached threads
A thread launched via `thread::spawn` can **outlive** the thread that spawned it.
For example:
```rust
use std::thread;

fn f() {
    thread::spawn(|| {
        thread::spawn(|| {
            loop {
                thread::sleep(std::time::Duration::from_secs(1));
                println!("Hello from the detached thread!");
            }
        });
    });
}
```
In this example, the first spawned thread will in turn spawn
a child thread that prints a message every second.
The first thread will then finish and exit. When that happens,
its child thread will **continue running** for as long as the
overall process is running.
In Rust's lingo, we say that the child thread has **outlived**
its parent.
## `'static` lifetime
Since a spawned thread can:
- outlive the thread that spawned it (its parent thread)
- run until the program exits
it must not borrow any values that might be dropped before the program exits;
violating this constraint would expose us to a use-after-free bug.
That's why `std::thread::spawn`'s signature requires that the closure passed to it
has the `'static` lifetime:
```rust
pub fn spawn<F, T>(f: F) -> JoinHandle<T>
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static
{
    // [..]
}
```
## `'static` is not (just) about references
All values in Rust have a lifetime, not just references.
In particular, a type that owns its data (like a `Vec` or a `String`)
satisfies the `'static` constraint: if you own it, you can keep working with it
for as long as you want, even after the function that originally created it
has returned.
You can thus interpret `'static` as a way to say:
- Give me an owned value
- Give me a reference that's valid for the entire duration of the program
The first approach is how you solved the issue in the previous exercise:
by allocating new vectors to hold the left and right parts of the original vector,
which were then moved into the spawned threads.
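To make the "owned value" route concrete, here is a sketch of one way it could look. It is an illustration under our own assumptions (function name and splitting strategy included), not necessarily the exercise's reference solution.

```rust
use std::thread;

/// Sums `v` using two threads, each owning a copy of one half.
/// (A sketch of one possible approach.)
fn sum(v: Vec<i32>) -> i32 {
    let (left, right) = v.split_at(v.len() / 2);
    // `to_vec` allocates new, owned vectors: they don't borrow from `v`.
    let left = left.to_vec();
    let right = right.to_vec();

    // `move` transfers ownership of each half into its thread,
    // so the closures capture no references and satisfy `'static`.
    let left_handle = thread::spawn(move || left.iter().sum::<i32>());
    let right_handle = thread::spawn(move || right.iter().sum::<i32>());

    left_handle.join().unwrap() + right_handle.join().unwrap()
}

fn main() {
    println!("{}", sum(vec![1, 2, 3, 4, 5]));
}
```

The price of this approach is the extra allocation and copy performed by `to_vec`.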
## `'static` references
Let's talk about the second case, references that are valid for the entire
duration of the program.
### Static data
The most common case is a reference to **static data**, such as string literals:
```rust
let s: &'static str = "Hello world!";
```
Since string literals are known at compile-time, Rust stores them in a memory
region known as the **read-only data segment**. The read-only data segment is part of the executable itself: there is no risk of it
being freed during program execution.
All references pointing to that region will therefore be valid for as long as
the program runs; they satisfy the `'static` contract.
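Because such references are `'static`, they can be handed to `thread::spawn` without any copying. A small sketch follows; `GREETING` and `count_chars_in_thread` are hypothetical names introduced for the example.

```rust
use std::thread;

// The data behind a `static` string is baked into the executable,
// just like a string literal.
static GREETING: &str = "Hello world!";

/// Counts the characters of a `'static` string on another thread.
/// (Hypothetical helper, for illustration only.)
fn count_chars_in_thread(s: &'static str) -> usize {
    // The reference itself is `'static`, so the closure is too.
    thread::spawn(move || s.chars().count()).join().unwrap()
}

fn main() {
    println!("{}", count_chars_in_thread(GREETING));
    println!("{}", count_chars_in_thread("a literal works too"));
}
```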

@@ -0,0 +1,46 @@
# Leaking data
The main concern around passing references to spawned threads is use-after-free bugs:
accessing data using a pointer to a memory region that's already been freed/de-allocated.
If you're working with heap-allocated data, you can avoid the issue by
telling Rust that you'll never reclaim that memory: you choose to **leak memory**,
intentionally.
This can be done, for example, using the `Box::leak` method from Rust's standard library:
```rust
// Allocate a `u32` on the heap, by wrapping it in a `Box`.
let x = Box::new(41u32);
// Tell Rust that you'll never free that heap allocation
// using `Box::leak`. You can thus get back a 'static reference.
let static_ref: &'static mut u32 = Box::leak(x);
```
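A leaked reference can then be shared with spawned threads. The sketch below uses `Vec::leak`, the vector counterpart of `Box::leak`; the function name `sum_leaked` is our own invention.

```rust
use std::thread;

/// Sums the two halves of `v` on separate threads by leaking the vector.
/// The leaked allocation is never freed. (Illustrative sketch only.)
fn sum_leaked(v: Vec<i32>) -> i32 {
    // `Vec::leak` returns a `&'static mut [i32]`, which we downgrade
    // to a shared `'static` slice.
    let data: &'static [i32] = v.leak();
    let (left, right) = data.split_at(data.len() / 2);

    let left_handle = thread::spawn(move || left.iter().sum::<i32>());
    let right_handle = thread::spawn(move || right.iter().sum::<i32>());

    left_handle.join().unwrap() + right_handle.join().unwrap()
}

fn main() {
    println!("{}", sum_leaked(vec![1, 2, 3, 4]));
}
```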
## Data leakage is process-scoped
Leaking data is dangerous: if you keep leaking memory, you'll eventually
run out and crash with an out-of-memory error.
```rust
// If you leave this running for a while,
// it'll eventually use all the available memory.
fn oom_trigger() {
    loop {
        let v: Vec<usize> = Vec::with_capacity(1024);
        // `Box::leak` only accepts a `Box`; `Vec` has its own `leak` method.
        v.leak();
    }
}
```
At the same time, memory leaked via `Box::leak` is not truly forgotten.
The operating system can map each memory region to the process responsible for it.
When the process exits, the operating system will reclaim that memory.
Keeping this in mind, it can be OK to leak memory when:
- The amount of memory you need to leak is bounded/known upfront, or
- Your process is short-lived and you're confident you won't exhaust
all the available memory before it exits
"Let the OS deal with it" is a perfectly valid memory management strategy
if your use case allows for it.

@@ -0,0 +1,73 @@
# Scoped threads
All the lifetime issues we discussed so far have a common source:
the spawned thread can outlive its parent.
We can sidestep this issue by using **scoped threads**.
```rust
let v = vec![1, 2, 3];
let midpoint = v.len() / 2;

std::thread::scope(|scope| {
    scope.spawn(|| {
        let first = &v[..midpoint];
        println!("Here's the first half of v: {first:?}");
    });
    scope.spawn(|| {
        let second = &v[midpoint..];
        println!("Here's the second half of v: {second:?}");
    });
});

println!("Here's v: {v:?}");
```
Let's unpack what's happening.
## `scope`
The `std::thread::scope` function creates a new **scope**.
`std::thread::scope` takes as input a closure, with a single argument: a `Scope` instance.
## Scoped spawns
`Scope` exposes a `spawn` method.
Unlike `std::thread::spawn`, all threads spawned using a `Scope` will be
**automatically joined** when the scope ends.
If we were to "translate" the previous example to `std::thread::spawn`,
it'd look like this:
```rust
let v = vec![1, 2, 3];
let midpoint = v.len() / 2;

let handle1 = std::thread::spawn(|| {
    let first = &v[..midpoint];
    println!("Here's the first half of v: {first:?}");
});
let handle2 = std::thread::spawn(|| {
    let second = &v[midpoint..];
    println!("Here's the second half of v: {second:?}");
});

handle1.join().unwrap();
handle2.join().unwrap();

println!("Here's v: {v:?}");
```
## Borrowing from the environment
The translated example wouldn't compile, though: the compiler would complain
that `&v` can't be used from our spawned threads since its lifetime isn't
`'static`.
That's not an issue with `std::thread::scope`—you can **safely borrow from the environment**.
In our example, `v` is created before the spawning points.
It will only be dropped _after_ `scope` returns. At the same time,
all threads spawned inside `scope` are guaranteed to finish _before_ `v` is dropped,
therefore there is no risk of having dangling references.
The compiler won't complain!
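Scoped threads can return values too: `Scope::spawn` gives you a handle you can `join`, and `thread::scope` itself returns whatever its closure returns. A sketch, with `sum_scoped` being a name chosen for this example:

```rust
use std::thread;

/// Sums the two halves of `v` in parallel, borrowing `v` instead of copying it.
fn sum_scoped(v: &[i32]) -> i32 {
    let (left, right) = v.split_at(v.len() / 2);
    thread::scope(|scope| {
        // Both closures borrow from the environment: no `'static` needed.
        let left_handle = scope.spawn(|| left.iter().sum::<i32>());
        let right_handle = scope.spawn(|| right.iter().sum::<i32>());
        // Joining explicitly is optional, but it lets us collect the results.
        left_handle.join().unwrap() + right_handle.join().unwrap()
    })
}

fn main() {
    let v = vec![1, 2, 3, 4, 5];
    println!("The sum of {v:?} is {}", sum_scoped(&v));
}
```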

@@ -0,0 +1,73 @@
# Channels
All our spawned threads have been fairly short-lived so far.
Get some input, run a computation, return the result, shut down.
For our ticket management system, we want to do something different:
a client-server architecture.
We will have **one long-running server thread**, responsible for managing
our state, the stored tickets.
We will then have **multiple client threads**.
Each client will be able to send **commands** and **queries** to
the stateful thread, in order to change its state (e.g. add a new ticket)
or retrieve information (e.g. get the status of a ticket).
Client threads will run concurrently.
## Communication
So far we've only had very limited parent-child communication:
- The spawned thread borrowed/consumed data from the parent context
- The spawned thread returned data to the parent when joined
This isn't enough for a client-server design.
Clients need to be able to send and receive data from the server thread
_after_ it has been launched.
We can solve the issue using **channels**.
## Channels
Rust's standard library provides **multi-producer, single-consumer** (mpsc) channels
in its `std::sync::mpsc` module.
There are two channel flavours: bounded and unbounded. We'll stick to the unbounded
version for now, but we'll discuss the pros and cons later on.
Channel creation looks like this:
```rust
use std::sync::mpsc::channel;
let (sender, receiver) = channel();
```
You get a sender and a receiver.
You call `send` on the sender to push data into the channel.
You call `recv` on the receiver to pull data from the channel.
### Multiple senders
`Sender` is cloneable: we can create multiple senders (e.g. one for
each client thread) and they will all push data into the same channel.
`Receiver`, instead, is not cloneable: there can only be a single receiver
for a given channel.
That's what **mpsc** (multi-producer single-consumer) stands for!
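Here is a small sketch of several producers feeding one consumer. The function name `collect_ids` and the choice of `u32` messages are assumptions made for the example.

```rust
use std::sync::mpsc::channel;
use std::thread;

/// Spawns `n_clients` threads that each send their own id,
/// then collects everything on the single receiver.
fn collect_ids(n_clients: u32) -> Vec<u32> {
    let (sender, receiver) = channel();
    for id in 0..n_clients {
        // Each client thread gets its own clone of the sender.
        let sender = sender.clone();
        thread::spawn(move || sender.send(id).unwrap());
    }
    // Drop the original sender, otherwise the iteration below would never
    // end: the channel only closes once *all* senders are gone.
    drop(sender);

    // Iterating over the receiver yields messages until the channel closes.
    let mut ids: Vec<u32> = receiver.iter().collect();
    // Arrival order depends on scheduling, so sort for a stable result.
    ids.sort();
    ids
}

fn main() {
    println!("{:?}", collect_ids(4));
}
```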
### Message type
Both `Sender` and `Receiver` are generic over a type parameter `T`.
That's the type of the _messages_ that can travel on our channel.
It could be a `u64`, a struct, an enum, etc.
### Errors
Both `send` and `recv` can fail.
`send` returns an error if the receiver has been dropped.
`recv` returns an error if all senders have been dropped and the channel is empty.
In other words, `send` and `recv` error when the channel is effectively closed.
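Both failure modes can be observed on a single thread. The helper names below are hypothetical; the sketch only relies on `channel`, `send` and `recv`.

```rust
use std::sync::mpsc::channel;

/// Returns `true` if sending fails once the receiver is gone.
fn send_fails_without_receiver() -> bool {
    let (sender, receiver) = channel::<u32>();
    drop(receiver);
    // The unsent message is handed back inside the error.
    sender.send(42).is_err()
}

/// Drains buffered messages, then reports whether `recv` errors.
fn recv_fails_after_drain() -> (Vec<u32>, bool) {
    let (sender, receiver) = channel();
    sender.send(1).unwrap();
    sender.send(2).unwrap();
    drop(sender);
    // Messages already in the channel are still delivered...
    let drained = vec![receiver.recv().unwrap(), receiver.recv().unwrap()];
    // ...and only then does `recv` report the disconnection.
    (drained, receiver.recv().is_err())
}

fn main() {
    println!("{}", send_fails_without_receiver());
    println!("{:?}", recv_fails_after_drain());
}
```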

@@ -0,0 +1,114 @@
# Interior mutability
Let's take a moment to reason about the signature of `Sender`'s `send`:
```rust
impl<T> Sender<T> {
    pub fn send(&self, t: T) -> Result<(), SendError<T>> {
        // [...]
    }
}
```
`send` takes `&self` as its argument.
But it's clearly causing a mutation: it's adding a new message to the channel.
What's even more interesting is that `Sender` is cloneable: we can have multiple instances of `Sender`
trying to modify the channel state **at the same time**, from different threads.
That's the key property we are using to build this client-server architecture. But why does it work?
Doesn't it violate Rust's rules about borrowing? How are we performing mutations via an _immutable_ reference?
## Shared rather than immutable references
When we introduced the borrow-checker, we named the two types of references we can have in Rust:
- immutable references (`&T`)
- mutable references (`&mut T`)
It would have been more accurate to name them:
- shared references (`&T`)
- exclusive references (`&mut T`)
Immutable/mutable is a mental model that works for the vast majority of cases, and it's a great one to get started
with Rust. But it's not the whole story, as you've just seen: `&T` doesn't actually guarantee that the data it
points to is immutable.
Don't worry, though: Rust is still keeping its promises.
It's just that the terms are a bit more nuanced than they might seem at first.
## `UnsafeCell`
Whenever a type allows you to mutate data through a shared reference, you're dealing with **interior mutability**.
By default, the Rust compiler assumes that shared references are immutable. It **optimises your code** based on that assumption.
The compiler can reorder operations, cache values, and do all sorts of magic to make your code faster.
You can tell the compiler "No, this shared reference is actually mutable" by wrapping the data in an `UnsafeCell`.
Every time you see a type that allows interior mutability, you can be certain that `UnsafeCell` is involved,
either directly or indirectly.
Using `UnsafeCell`, raw pointers and `unsafe` code, you can mutate data through shared references.
Let's be clear, though: `UnsafeCell` isn't a magic wand that allows you to ignore the borrow-checker!
`unsafe` code is still subject to Rust's rules about borrowing and aliasing.
It's an (advanced) tool that you can leverage to build **safe abstractions** whose safety can't be directly expressed
in Rust's type system. Whenever you use the `unsafe` keyword you're telling the compiler:
"I know what I'm doing, I won't violate your invariants, trust me."
Every time you call an `unsafe` function, there will be documentation explaining its **safety preconditions**:
under what circumstances it's safe to execute its `unsafe` block. You can find the ones for `UnsafeCell`
[in `std`'s documentation](https://doc.rust-lang.org/std/cell/struct.UnsafeCell.html).
We won't be using `UnsafeCell` directly in this course, nor will we be writing `unsafe` code.
But it's important to know that it's there, why it exists and how it relates to the types you use
every day in Rust.
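In day-to-day code you reach for safe wrappers built on top of `UnsafeCell`. `std::cell::Cell` is one of the simplest; the sketch below uses it to update a counter through `&self`. `HitCounter` is a made-up type for illustration.

```rust
use std::cell::Cell;

/// A counter that can be bumped through a shared reference.
/// (Hypothetical type, for illustration only.)
struct HitCounter {
    hits: Cell<u32>,
}

impl HitCounter {
    fn new() -> Self {
        HitCounter { hits: Cell::new(0) }
    }

    // Note the `&self`: no exclusive reference is required.
    fn hit(&self) -> u32 {
        self.hits.set(self.hits.get() + 1);
        self.hits.get()
    }
}

fn main() {
    let counter = HitCounter::new();
    let shared = &counter;
    shared.hit();
    println!("Hits so far: {}", shared.hit());
}
```

`Cell` is not safe to share across threads, which is why the compiler won't let you do so.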
## Key examples
Let's go through a couple of important `std` types that leverage interior mutability.
These are types that you'll encounter somewhat often in Rust code, especially if you peek under the hood of
some of the libraries you use.
### Reference counting
`Rc` is a reference-counted pointer.
It wraps around a value and keeps track of how many references to the value exist.
When the last reference is dropped, the value is deallocated.
The value wrapped in an `Rc` is immutable: you can only get shared references to it.
```rust
use std::rc::Rc;
let a: Rc<String> = Rc::new("My string".to_string());
// Only one reference to the string data exists.
assert_eq!(Rc::strong_count(&a), 1);
// When we call `clone`, the string data is not copied!
// Instead, the reference count for `Rc` is incremented.
let b = Rc::clone(&a);
assert_eq!(Rc::strong_count(&a), 2);
assert_eq!(Rc::strong_count(&b), 2);
// ^ Both `a` and `b` point to the same string data
// and share the same reference counter.
```
`Rc` uses `UnsafeCell` internally to allow shared references to increment and decrement the reference count.
### `RefCell`
`RefCell` is one of the most common examples of interior mutability in Rust.
It allows you to mutate the value wrapped in a `RefCell` even if you only have an
immutable reference to the `RefCell` itself.
This is done via **runtime borrow checking**.
The `RefCell` keeps track of the number (and type) of references to the value it contains at runtime.
If you try to borrow the value mutably while it's already borrowed immutably,
the program will panic, ensuring that Rust's borrowing rules are always enforced.
```rust
use std::cell::RefCell;
let x = RefCell::new(42);
let y = x.borrow(); // Immutable borrow
let z = x.borrow_mut(); // Panics! There is an active immutable borrow.
```
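If a panic is not acceptable, `RefCell` also offers `try_borrow` and `try_borrow_mut`, which return a `Result` instead. A sketch, where `try_log` is a hypothetical helper:

```rust
use std::cell::RefCell;

/// Appends to a log through a shared reference, if nobody else is borrowing it.
/// Returns `false` instead of panicking when the borrow is refused.
fn try_log(log: &RefCell<Vec<String>>, message: &str) -> bool {
    match log.try_borrow_mut() {
        Ok(mut entries) => {
            entries.push(message.to_string());
            true
        }
        Err(_) => false,
    }
}

fn main() {
    let log = RefCell::new(Vec::new());
    println!("{}", try_log(&log, "first")); // no outstanding borrows
    let reader = log.borrow();
    println!("{}", try_log(&log, "second")); // refused: `reader` is still alive
    drop(reader);
    println!("{:?}", log.borrow());
}
```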

@@ -0,0 +1,16 @@
# Two-way communication
In our current client-server implementation, communication flows in one direction: from the client to the server.
The client has no way of knowing if the server received the message, executed it successfully, or failed.
That's not ideal.
To solve this issue, we can introduce a two-way communication system.
## Response channel
We need a way for the server to send a response back to the client.
There are various ways to do this, but the simplest option is to include a `Sender` channel in
the message that the client sends to the server. After processing the message, the server can use
this channel to send a response back to the client.
This is a fairly common pattern in Rust applications built on top of message-passing primitives.
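The pattern can be sketched as follows. The `Insert` command, its fields and the toy server are assumptions made for this example; they are not the course's actual types.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread;

/// A hypothetical command: the payload plus a channel for the reply.
struct Insert {
    title: String,
    response_sender: Sender<u64>,
}

/// Spawns a toy server that assigns increasing ids to inserted titles.
fn launch_server() -> Sender<Insert> {
    let (sender, receiver) = channel::<Insert>();
    thread::spawn(move || {
        let mut next_id = 0;
        // The loop ends when every command sender has been dropped.
        for command in receiver {
            println!("Storing {:?} with id {next_id}", command.title);
            // Ignore the error: the client may have stopped listening.
            let _ = command.response_sender.send(next_id);
            next_id += 1;
        }
    });
    sender
}

/// Client side: send the command, then wait for the server's reply.
fn insert(server: &Sender<Insert>, title: &str) -> u64 {
    let (response_sender, response_receiver) = channel();
    let command = Insert { title: title.to_string(), response_sender };
    server.send(command).unwrap();
    response_receiver.recv().unwrap()
}

fn main() {
    let server = launch_server();
    let id = insert(&server, "Fix the login page");
    println!("Ticket stored with id {id}");
}
```

Each request gets its own short-lived response channel, so replies can never be delivered to the wrong client.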

@@ -0,0 +1,8 @@
# A dedicated `Client` type
All the interactions from the client side have been fairly low-level: you have to
manually create a response channel, build the command, send it to the server, and
then call `recv` on the response channel to get the response.
This is a lot of boilerplate code that could be abstracted away, and that's
exactly what we're going to do in this exercise.

@@ -0,0 +1,43 @@
# Bounded vs unbounded channels
So far we've been using unbounded channels.
You can send as many messages as you want, and the channel will grow to accommodate them.
In a multi-producer single-consumer scenario, this can be problematic: if the producers
enqueue messages at a faster rate than the consumer can process them, the channel will
keep growing, potentially consuming all available memory.
Our recommendation is to **never** use an unbounded channel in a production system.
You should always enforce an upper limit on the number of messages that can be enqueued using a
**bounded channel**.
## Bounded channels
A bounded channel has a fixed capacity.
You can create one by calling `sync_channel` with a capacity greater than zero:
```rust
use std::sync::mpsc::sync_channel;
let (sender, receiver) = sync_channel(10);
```
`receiver` has the same type as before, `Receiver<T>`.
`sender`, instead, is an instance of `SyncSender<T>`.
### Sending messages
You have two different methods to send messages through a `SyncSender`:
- `send`: if there is space in the channel, it will enqueue the message and return `Ok(())`.
If the channel is full, it will block and wait until there is space available.
- `try_send`: if there is space in the channel, it will enqueue the message and return `Ok(())`.
If the channel is full, it will return `Err(TrySendError::Full(value))`, where `value` is the message that couldn't be sent.
Depending on your use case, you might want to use one or the other.
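The non-blocking behaviour of `try_send` is easy to observe when nobody is reading from the channel. In this sketch, `fill_channel` is a hypothetical helper name.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Tries to enqueue `items` without blocking, into a channel of size `capacity`.
/// Returns how many were accepted before the channel filled up.
fn fill_channel(capacity: usize, items: &[u32]) -> usize {
    // Keep the receiver alive, but never read from it.
    let (sender, _receiver) = sync_channel(capacity);
    let mut accepted = 0;
    for &item in items {
        match sender.try_send(item) {
            Ok(()) => accepted += 1,
            // The rejected message is handed back to the caller.
            Err(TrySendError::Full(_rejected)) => break,
            Err(TrySendError::Disconnected(_)) => break,
        }
    }
    accepted
}

fn main() {
    let accepted = fill_channel(2, &[10, 20, 30, 40]);
    println!("Accepted {accepted} messages before the channel was full");
}
```

Had we used `send` instead, the third call would have blocked forever, since nothing ever drains the channel.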
### Backpressure
The main advantage of using bounded channels is that they provide a form of **backpressure**.
They force the producers to slow down if the consumer can't keep up.
The backpressure can then propagate through the system, potentially affecting the whole architecture and
preventing end users from overwhelming the system with requests.

@@ -0,0 +1,39 @@
# Update operations
So far we've implemented only insertion and retrieval operations.
Let's see how we can expand the system to provide an update operation.
## Legacy updates
In the non-threaded version of the system, updates were fairly straightforward: `TicketStore` exposed a
`get_mut` method that allowed the caller to obtain a mutable reference to a ticket, and then modify it.
## Multithreaded updates
The same strategy won't work in the current multi-threaded version,
because the mutable reference would have to be sent over a channel. The borrow checker would
stop us, because `&mut Ticket` doesn't satisfy the `'static` lifetime requirement of `SyncSender::send`.
There are a few ways to work around this limitation. We'll explore a few of them in the following exercises.
### Patching
We can't send a `&mut Ticket` over a channel, therefore we can't mutate on the client-side.
Can we mutate on the server-side?
We can, if we tell the server what needs to be changed. In other words, if we send a **patch** to the server:
```rust
struct TicketPatch {
    id: TicketId,
    title: Option<TicketTitle>,
    description: Option<TicketDescription>,
    status: Option<TicketStatus>,
}
```
The `id` field is mandatory, since it's required to identify the ticket that needs to be updated.
All other fields are optional:
- If a field is `None`, it means that the field should not be changed.
- If a field is `Some(value)`, it means that the field should be changed to `value`.
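On the server side, applying a patch boils down to overwriting only the fields that are `Some`. The sketch below uses simplified stand-ins (plain `u64` and `String` fields, no status) rather than the course's validated types.

```rust
// Simplified stand-ins for the course's ticket types (assumptions).
#[derive(Debug, Clone, PartialEq)]
struct Ticket {
    id: u64,
    title: String,
    description: String,
}

struct TicketPatch {
    id: u64,
    title: Option<String>,
    description: Option<String>,
}

/// Applies `patch` to `ticket`, leaving `None` fields untouched.
fn apply_patch(ticket: &mut Ticket, patch: TicketPatch) {
    debug_assert_eq!(ticket.id, patch.id);
    if let Some(title) = patch.title {
        ticket.title = title;
    }
    if let Some(description) = patch.description {
        ticket.description = description;
    }
}

fn main() {
    let mut ticket = Ticket {
        id: 1,
        title: "Old title".to_string(),
        description: "Unchanged".to_string(),
    };
    let patch = TicketPatch { id: 1, title: Some("New title".to_string()), description: None };
    apply_patch(&mut ticket, patch);
    println!("{ticket:?}");
}
```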

@@ -0,0 +1,222 @@
# Locks, `Send` and `Arc`
The patching strategy you just implemented has a major drawback: it's racy.
If two clients send patches for the same ticket at roughly the same time, the server will apply them in an arbitrary order.
Whoever enqueues their patch last will overwrite the changes made by the other client.
## Version numbers
We could try to fix this by using a **version number**.
Each ticket gets assigned a version number upon creation, set to `0`.
Whenever a client sends a patch, they must include the current version number of the ticket alongside the
desired changes. The server will only apply the patch if the version number matches the one it has stored.
In the scenario described above, the server would reject the second patch, because the version number would
have been incremented by the first patch and thus wouldn't match the one sent by the second client.
This approach is fairly common in distributed systems (e.g. when client and servers don't share memory),
and it is known as **optimistic concurrency control**.
The idea is that most of the time, conflicts won't happen, so we can optimize for the common case.
You know enough about Rust by now to implement this strategy on your own as a bonus exercise, if you want to.
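If you'd like a starting point, the core check could look like the sketch below. `VersionedTicket`, `VersionConflict` and `update_title` are hypothetical names, and the ticket is stripped down to a single field.

```rust
/// A hypothetical, stripped-down ticket that tracks its own version.
#[derive(Debug, PartialEq)]
struct VersionedTicket {
    version: u64,
    title: String,
}

/// Returned when the client's version is stale (hypothetical error type).
#[derive(Debug, PartialEq)]
struct VersionConflict {
    current_version: u64,
}

/// Updates the title only if `expected_version` matches the stored one.
fn update_title(
    ticket: &mut VersionedTicket,
    expected_version: u64,
    new_title: &str,
) -> Result<(), VersionConflict> {
    if ticket.version != expected_version {
        return Err(VersionConflict { current_version: ticket.version });
    }
    ticket.title = new_title.to_string();
    // Bump the version so that patches based on the old state are rejected.
    ticket.version += 1;
    Ok(())
}

fn main() {
    let mut ticket = VersionedTicket { version: 0, title: "Draft".to_string() };
    // Two clients both read version 0, then try to update.
    println!("{:?}", update_title(&mut ticket, 0, "From client A"));
    println!("{:?}", update_title(&mut ticket, 0, "From client B"));
}
```

The rejected client is expected to fetch the ticket again and retry with the new version number.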
## Locking
We can also fix the race condition by introducing a **lock**.
Whenever a client wants to update a ticket, they must first acquire a lock on it. While the lock is active,
no other client can modify the ticket.
Rust's standard library provides two different locking primitives: `Mutex<T>` and `RwLock<T>`.
Let's start with `Mutex<T>`. It stands for **mut**ual **ex**clusion, and it's the simplest kind of lock:
it allows only one thread to access the data, no matter if it's for reading or writing.
`Mutex<T>` wraps the data it protects, and it's therefore generic over the type of the data.
You can't access the data directly: the type system forces you to acquire a lock first using either `Mutex::lock` or
`Mutex::try_lock`. The former blocks until the lock is acquired, the latter returns immediately with an error if the lock
can't be acquired.
Both methods return a guard object that dereferences to the data, allowing you to modify it. The lock is released when
the guard is dropped.
```rust
use std::sync::Mutex;
// An integer protected by a mutex lock
let lock = Mutex::new(0);
// Acquire a lock on the mutex
let mut guard = lock.lock().unwrap();
// Modify the data through the guard,
// leveraging its `DerefMut` implementation
*guard += 1;

// The lock is released when `guard` goes out of scope.
// This can be done explicitly by dropping the guard
// or happen implicitly when the guard goes out of scope
drop(guard);
```
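`try_lock` is handy when you'd rather skip the work than wait. A minimal sketch, with `try_increment` being a made-up helper:

```rust
use std::sync::Mutex;
use std::thread;

/// Increments the counter only if the lock is free right now.
/// Returns `false` if the lock is held by someone else (or is poisoned).
fn try_increment(counter: &Mutex<u32>) -> bool {
    match counter.try_lock() {
        Ok(mut guard) => {
            *guard += 1;
            true
        } // the guard is dropped here, releasing the lock
        Err(_) => false,
    }
}

fn main() {
    let counter = Mutex::new(0);
    println!("Lock free: {}", try_increment(&counter));

    // Hold the lock on this thread while another thread tries to grab it.
    let guard = counter.lock().unwrap();
    let acquired = thread::scope(|s| s.spawn(|| try_increment(&counter)).join().unwrap());
    println!("Lock held elsewhere: {acquired}");
    drop(guard);
}
```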
## Locking granularity
What should our `Mutex` wrap?
The simplest option would be to wrap the entire `TicketStore` in a single `Mutex`.
This would work, but it would severely limit the system's performance: you wouldn't be able to read tickets in parallel,
because every read would have to wait for the lock to be released.
This is known as **coarse-grained locking**.
It would be better to use **fine-grained locking**, where each ticket is protected by its own lock.
This way, clients can keep working with tickets in parallel, as long as they aren't trying to access the same ticket.
```rust
// The new structure, with a lock for each ticket
struct TicketStore {
    tickets: BTreeMap<TicketId, Mutex<Ticket>>,
}
```
This approach is more efficient, but it has a downside: `TicketStore` has to become **aware** of the multithreaded
nature of the system; up until now, `TicketStore` has blissfully ignored the existence of threads.
Let's go for it anyway.
## Who holds the lock?
For the whole scheme to work, the lock must be passed to the client that wants to modify the ticket.
The client can then directly modify the ticket (as if they had a `&mut Ticket`) and release the lock when they're done.
This is a bit tricky.
We can't send a `Mutex<Ticket>` over a channel, because `Mutex` is not `Clone` and
we can't move it out of the `TicketStore`. Could we send the `MutexGuard` instead?
Let's test the idea with a small example:
```rust
use std::thread::spawn;
use std::sync::Mutex;
use std::sync::mpsc::sync_channel;

fn main() {
    let lock = Mutex::new(0);
    let (sender, receiver) = sync_channel(1);
    let guard = lock.lock().unwrap();

    spawn(move || {
        receiver.recv().unwrap();;
    });

    // Try to send the guard over the channel
    // to another thread
    sender.send(guard);
}
```
The compiler is not happy with this code:
```text
error[E0277]: `MutexGuard<'_, i32>` cannot be sent between threads safely
--> src/main.rs:10:7
|
10 | spawn(move || {
| _-----_^
| | |
| | required by a bound introduced by this call
11 | | receiver.recv().unwrap();;
12 | | });
| |_^ `MutexGuard<'_, i32>` cannot be sent between threads safely
|
= help: the trait `Send` is not implemented for `MutexGuard<'_, i32>`, which is required by `{closure@src/main.rs:10:7: 10:14}: Send`
= note: required for `std::sync::mpsc::Receiver<MutexGuard<'_, i32>>` to implement `Send`
note: required because it's used within this closure
```
`MutexGuard<'_, i32>` is not `Send`: what does it mean?
## `Send`
`Send` is a marker trait that indicates that a type can be safely transferred from one thread to another.
`Send` is also an auto-trait, just like `Sized`; it's automatically implemented (or not implemented) for your type
by the compiler, based on its definition.
You can also implement `Send` manually for your types, but it requires `unsafe` since you have to guarantee that the
type is indeed safe to send between threads for reasons that the compiler can't automatically verify.
### Channel requirements
`Sender<T>`, `SyncSender<T>` and `Receiver<T>` are `Send` if and only if `T` is `Send`.
That's because they are used to send values between threads, and if the value itself is not `Send`, it would be
unsafe to send it between threads.
### `MutexGuard`
`MutexGuard` is not `Send` because the underlying operating system primitives that `Mutex` uses to implement
the lock require (on some platforms) that the lock must be released by the same thread that acquired it.
If we were to send a `MutexGuard` to another thread, the lock would be released by a different thread, which would
lead to undefined behavior.
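You can ask the compiler whether a type is `Send` with a small generic function. The sketch below is a common idiom rather than something provided by `std`; `assert_send` is a name we picked for it.

```rust
use std::sync::mpsc::Sender;

/// Compiles only when `T` is `Send`; it does nothing at runtime.
/// (A common trick, not a `std` API.)
fn assert_send<T: Send>() {}

fn main() {
    assert_send::<u32>();
    assert_send::<String>();
    // `Sender<T>` is `Send` because `String` is.
    assert_send::<Sender<String>>();

    // Uncommenting either line makes compilation fail:
    // assert_send::<std::rc::Rc<u32>>();
    // assert_send::<std::sync::MutexGuard<'static, u32>>();
}
```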
## Our challenges
Summing it up:
- We can't send a `MutexGuard` over a channel. So we can't lock on the server-side and then modify the ticket on the
client-side.
- We can send a `Mutex` over a channel because it's `Send` as long as the data it protects is `Send`, which is the
case for `Ticket`.
At the same time, we can't move the `Mutex` out of the `TicketStore` nor clone it.
How can we solve this conundrum?
We need to look at the problem from a different angle.
To lock a `Mutex`, we don't need an owned value. A shared reference is enough, since `Mutex` uses interior mutability:
```rust
impl<T> Mutex<T> {
    // `&self`, not `self`!
    pub fn lock(&self) -> LockResult<MutexGuard<'_, T>> {
        // Implementation details
    }
}
```
It is therefore enough to send a shared reference to the client.
We can't do that directly, though, because the reference would have to be `'static` and that's not the case.
In a way, we need an "owned shared reference". It turns out that Rust has a type that fits the bill: `Arc`.
## `Arc` to the rescue
`Arc` stands for **atomic reference counting**.
`Arc` wraps around a value and keeps track of how many references to the value exist.
When the last reference is dropped, the value is deallocated.
The value wrapped in an `Arc` is immutable: you can only get shared references to it.
```rust
use std::sync::Arc;
let data: Arc<u32> = Arc::new(0);
let data_clone = Arc::clone(&data);
// `Arc<T>` implements `Deref<Target = T>`, so you can convert
// a `&Arc<T>` to a `&T` using deref coercion
let data_ref: &u32 = &data;
```
If you're having a déjà vu moment, you're right: `Arc` sounds very similar to `Rc`, the reference-counted pointer we
introduced when talking about interior mutability. The difference is thread-safety: `Rc` is not `Send`, while `Arc` is.
It boils down to the way the reference count is implemented: `Rc` uses a "normal" integer, while `Arc` uses an
**atomic** integer, which can be safely shared and modified across threads.
## `Arc<Mutex<T>>`
If we pair `Arc` with `Mutex`, we finally get a type that:
- Can be sent between threads, because:
  - `Arc<Mutex<T>>` is `Send` if `Mutex<T>` is both `Send` and `Sync`
    (`Sync` marks types that can be safely shared between threads by reference), and
  - `Mutex<T>` is both `Send` and `Sync` if `T` is `Send`.
  - `T` is `Ticket`, which is `Send`.
- Can be cloned, because `Arc` is `Clone` no matter what `T` is.
  Cloning an `Arc` increments the reference count, the data is not copied.
- Can be used to modify the data it wraps, because `Arc` lets you get a shared
  reference to `Mutex<T>` which can in turn be used to acquire a lock.
We have all the pieces we need to implement the locking strategy for our ticket store.
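Before applying it to tickets, here is the combination at work on a plain counter. This is a generic sketch; `parallel_count` is a name invented for the example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `n_threads` threads that each add 1 to a shared counter
/// `per_thread` times, then returns the final value.
fn parallel_count(n_threads: u32, per_thread: u32) -> u32 {
    let counter = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..n_threads)
        .map(|_| {
            // Cloning the `Arc` bumps the reference count;
            // every clone points to the same `Mutex`.
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("{}", parallel_count(4, 1_000));
}
```

No increment is lost, because each `+= 1` happens while the lock is held.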
## Further reading
- We won't be covering the details of atomic operations in this course, but you can find more information
[in the `std` documentation](https://doc.rust-lang.org/std/sync/atomic/index.html) as well as in the
["Rust atomics and locks" book](https://marabos.nl/atomics/).

@@ -0,0 +1,45 @@
# Readers and writers
Our new `TicketStore` works, but its read performance is not great: there can only be one client at a time
reading a specific ticket, because `Mutex<T>` doesn't distinguish between readers and writers.
We can solve the issue by using a different locking primitive: `RwLock<T>`.
`RwLock<T>` stands for **read-write lock**. It allows **multiple readers** to access the data simultaneously,
but only one writer at a time.
`RwLock<T>` has two methods to acquire a lock: `read` and `write`.
`read` returns a guard that allows you to read the data, while `write` returns a guard that allows you to modify it.
```rust
use std::sync::RwLock;
// An integer protected by a read-write lock
let lock = RwLock::new(0);
// Acquire a read lock on the RwLock
let guard1 = lock.read().unwrap();
// Acquire a **second** read lock
// while the first one is still active
let guard2 = lock.read().unwrap();
```
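The write side can be sketched in the same spirit. The example below assumes two hypothetical helpers, `writer_is_blocked_by_reader` and `bump`. It uses `try_write` (the non-blocking variant of `write`) to show that a writer cannot get in while a read guard is alive:

```rust
use std::sync::RwLock;

// Returns `true` if a write lock could NOT be acquired
// while a read guard is still alive.
fn writer_is_blocked_by_reader(lock: &RwLock<u32>) -> bool {
    let _reader = lock.read().unwrap();
    // `try_write` fails immediately instead of blocking.
    let blocked = lock.try_write().is_err();
    blocked
}

// Acquires the write lock, modifies the value and returns the new value.
fn bump(lock: &RwLock<u32>) -> u32 {
    let mut guard = lock.write().unwrap();
    *guard += 1;
    *guard
}

fn main() {
    let lock = RwLock::new(0);
    assert!(writer_is_blocked_by_reader(&lock));
    // The read guard has been dropped, so writing succeeds now.
    assert_eq!(bump(&lock), 1);
}
```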
## Trade-offs
On the surface, `RwLock<T>` seems like a no-brainer: it provides a superset of the functionality of `Mutex<T>`.
Why would you ever use `Mutex<T>` if you can use `RwLock<T>` instead?
There are two key reasons:
- Locking a `RwLock<T>` is more expensive than locking a `Mutex<T>`.
This is because `RwLock<T>` has to keep track of the number of active readers and writers, while `Mutex<T>`
only has to keep track of whether the lock is held or not.
This overhead is usually negligible when reads far outnumber writes, but if the workload
is write-heavy `Mutex<T>` might be a better choice.
- `RwLock<T>` can cause **writer starvation**.
If there are always readers waiting to acquire the lock, writers might never get a chance to run.
`RwLock<T>` doesn't provide any guarantees about the order in which readers and writers are granted access to the lock.
It depends on the policy implemented by the underlying OS, which might not be fair to writers.
In our case, we can expect the workload to be read-heavy (since most clients will be reading tickets, not modifying them),
so `RwLock<T>` is a good choice.

@@ -0,0 +1,54 @@
# Design review
Let's take a moment to review the journey we've been through.
## Lockless with channel serialization
Our first implementation of a multithreaded ticket store used:
- a single long-lived thread (server), to hold the shared state
- multiple clients sending requests to it via channels from their own threads.
No locking of the state was necessary, since the server was the only one modifying the state. That's because
the "inbox" channel naturally **serialized** incoming requests: the server would process them one by one.
We've already discussed the limitations of this approach when it comes to patching behaviour, but we didn't
discuss the performance implications of the original design: the server could only process one request at a time,
including reads.
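As a reminder, that design can be sketched roughly as follows. This is a simplified illustration, not the code from the earlier exercises: the `Command` enum, `spawn_server` and the use of plain `String` titles instead of tickets are all assumptions made for brevity.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical command type: each request carries its own reply channel.
enum Command {
    Insert { title: String, reply: mpsc::Sender<usize> },
    Get { id: usize, reply: mpsc::Sender<Option<String>> },
}

// Spawns the server thread and returns the sender half of its "inbox".
fn spawn_server() -> mpsc::Sender<Command> {
    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        // The state is owned by the server thread alone: no lock needed.
        let mut titles: Vec<String> = Vec::new();
        // Requests are handled strictly one at a time, reads included.
        for command in receiver {
            match command {
                Command::Insert { title, reply } => {
                    titles.push(title);
                    let _ = reply.send(titles.len() - 1);
                }
                Command::Get { id, reply } => {
                    let _ = reply.send(titles.get(id).cloned());
                }
            }
        }
    });
    sender
}

fn main() {
    let server = spawn_server();

    let (reply, inbox) = mpsc::channel();
    server
        .send(Command::Insert { title: "Fix bug".to_string(), reply })
        .unwrap();
    let id = inbox.recv().unwrap();

    let (reply, inbox) = mpsc::channel();
    server.send(Command::Get { id, reply }).unwrap();
    assert_eq!(inbox.recv().unwrap(), Some("Fix bug".to_string()));
}
```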
## Fine-grained locking
We then moved to a more sophisticated design, where each ticket was protected by its own lock and
clients could independently decide if they wanted to read or atomically modify a ticket, acquiring the appropriate lock.
This design allows for better parallelism (i.e. multiple clients can read tickets at the same time), but it is
still fundamentally **serial**: the server processes commands one by one. In particular, it hands out locks to clients
one by one.
Could we remove the channels entirely and allow clients to directly access the `TicketStore`, relying exclusively on
locks to synchronize access?
## Removing channels
We have two problems to solve:
- Sharing `TicketStore` across threads
- Synchronizing access to the store
### Sharing `TicketStore` across threads
We want all threads to refer to the same state, otherwise we don't really have a multithreaded system—we're just
running multiple single-threaded systems in parallel.
We've already encountered this problem when we tried to share a lock across threads: we can use an `Arc`.
### Synchronizing access to the store
There is one interaction that's still lockless thanks to the serialization provided by the channels: inserting
(or removing) a ticket from the store.
If we remove the channels, we need to introduce (another) lock to synchronize access to the `TicketStore` itself.
If we use a `Mutex`, then it makes no sense to use an additional `RwLock` for each ticket: the `Mutex` will
already serialize access to the entire store, so we wouldn't be able to read tickets in parallel anyway.
If we use a `RwLock`, instead, we can read tickets in parallel. We just need to pause all reads while inserting
or removing a ticket.
Let's go down this path and see where it leads us.
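Here is a rough sketch of the shape this could take, under a few assumptions: `Ticket` is reduced to a title, the store is backed by a `BTreeMap`, and `insert_from_threads` is a hypothetical driver function. The store in the exercises may well look different.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, RwLock};
use std::thread;

// Hypothetical, simplified stand-ins for the course's types.
#[derive(Debug)]
struct Ticket {
    title: String,
}

#[derive(Default)]
struct TicketStore {
    // Each ticket keeps its own lock, as in the fine-grained design.
    tickets: BTreeMap<u64, Arc<RwLock<Ticket>>>,
    next_id: u64,
}

impl TicketStore {
    fn insert(&mut self, title: &str) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        let ticket = Ticket { title: title.to_string() };
        self.tickets.insert(id, Arc::new(RwLock::new(ticket)));
        id
    }

    fn get(&self, id: u64) -> Option<Arc<RwLock<Ticket>>> {
        self.tickets.get(&id).cloned()
    }
}

// No channels: every thread talks to the same store through `Arc<RwLock<_>>`.
fn insert_from_threads(titles: Vec<String>) -> Arc<RwLock<TicketStore>> {
    let store = Arc::new(RwLock::new(TicketStore::default()));
    let handles: Vec<_> = titles
        .into_iter()
        .map(|title| {
            let store = Arc::clone(&store);
            thread::spawn(move || {
                // Write lock: pauses all readers of the store while inserting.
                store.write().unwrap().insert(&title);
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    store
}

fn main() {
    let store = insert_from_threads(vec!["a".to_string(), "b".to_string()]);
    // Read lock on the store: many threads could do this at the same time.
    let ticket = store.read().unwrap().get(0).unwrap();
    // Per-ticket lock: modifying one ticket doesn't block the others.
    ticket.write().unwrap().title = "patched".to_string();
    assert_eq!(ticket.read().unwrap().title, "patched");
}
```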

@@ -0,0 +1,28 @@
# `Sync`
Before we wrap up this chapter, let's talk about another key trait in Rust's standard library: `Sync`.
`Sync` is an auto trait, just like `Send`.
It is automatically implemented by all types that can be safely **shared** between threads.
In other words: `T: Sync` means that `&T` is `Send`.
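
A small sketch can make this concrete. It assumes scoped threads (`std::thread::scope`) so that plain references can be handed to other threads. `assert_send`, `assert_sync` and `shared_sum` are hypothetical helpers written for this example:

```rust
use std::sync::Mutex;
use std::thread;

// Compile-time checks: these only compile if the bound holds.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

// Sums `values` by sharing a `&Mutex<u32>` with one thread per value.
fn shared_sum(values: &[u32]) -> u32 {
    let total = Mutex::new(0);
    thread::scope(|scope| {
        for value in values {
            // A shared reference crosses the thread boundary:
            // allowed because `Mutex<u32>` is `Sync`.
            let total_ref = &total;
            scope.spawn(move || {
                *total_ref.lock().unwrap() += *value;
            });
        }
    });
    total.into_inner().unwrap()
}

fn main() {
    assert_sync::<Mutex<u32>>();
    // `Mutex<u32>: Sync`, therefore `&Mutex<u32>: Send`.
    assert_send::<&Mutex<u32>>();
    assert_eq!(shared_sum(&[1, 2, 3]), 6);
}
```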
## `Sync` doesn't imply `Send`
It's important to note that `Sync` doesn't imply `Send`.
For example: `MutexGuard` is not `Send`, but it is `Sync`.
It isn't `Send` because the lock must be released on the same thread that acquired it, therefore we don't
want `MutexGuard` to be dropped on a different thread.
But it is `Sync` (as long as `T` is `Sync`), because handing a `&MutexGuard` to another thread has no impact on where the lock is released.
## `Send` doesn't imply `Sync`
The opposite is also true: `Send` doesn't imply `Sync`.
For example: `RefCell<T>` is `Send` (if `T` is `Send`), but it is not `Sync`.
`RefCell<T>` performs runtime borrow checking, but the counters it uses to track borrows are not thread-safe.
Therefore, having multiple threads holding a `&RefCell` would lead to a data race, with potentially
multiple threads obtaining mutable references to the same data. Hence `RefCell` is not `Sync`.
`Send` is fine, instead, because when we send a `RefCell` to another thread we're not
leaving behind any references to the data it contains, hence no risk of concurrent mutable access.
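
Both directions can be checked with a short sketch. The `assert_send` and `assert_sync` helpers and the `bump_on_another_thread` function are hypothetical names used for illustration. The lines that would fail to compile are left commented out.

```rust
use std::cell::RefCell;
use std::sync::MutexGuard;
use std::thread;

// Compile-time checks: these only compile if the bound holds.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

// Moves the whole `RefCell` to another thread: nothing is left behind,
// so there is no risk of concurrent access to its borrow counters.
fn bump_on_another_thread(cell: RefCell<u32>) -> u32 {
    let handle = thread::spawn(move || {
        *cell.borrow_mut() += 1;
        cell.into_inner()
    });
    handle.join().unwrap()
}

fn main() {
    // `Sync` without `Send`.
    assert_sync::<MutexGuard<'static, u32>>();
    // assert_send::<MutexGuard<'static, u32>>(); // does not compile

    // `Send` without `Sync`.
    assert_send::<RefCell<u32>>();
    // assert_sync::<RefCell<u32>>(); // does not compile

    assert_eq!(bump_on_another_thread(RefCell::new(41)), 42);
}
```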