Chapter 07 · making it fast (and not slow)
Rust gives you C++-class performance with no GC and no JIT warm-up. Most "Rust is slow" stories are a C# reflex applied where it doesn't belong: copying data to dodge the borrow checker, or blocking the async runtime. Here's where the surprises hide.
Before the gotchas, it's worth knowing what Rust already does well, so you don't spend effort optimizing things that are fast to begin with. There is no garbage collector, so there are no GC pauses and no GC-induced tail-latency spikes. Generics are monomorphized (ch. 5), so value types never box. Iterators are lazy and typically compile down to the same loop you'd write by hand. And there's no JIT: the optimizer has already run by the time the binary exists.
Think of every Rust value as living on the stack or inline in a tightly-packed buffer by default, the way a C# struct does, except that's true for your types too, not just primitives. Heap allocation is opt-in and visible: it's spelled Box, Vec, String, Rc/Arc, or a collection built on the same idea, such as HashMap. If you don't see one of those (or a call that returns one), nothing allocated.
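A minimal sketch of what "inline by default" looks like, assuming two hypothetical types, Point and Segment, invented here for illustration. Only std::mem::size_of comes from the standard library. The helpers report that a Segment is just its four f64 fields, while a Box<Segment> is a single pointer to heap data.

```rust
use std::mem::size_of;

// Hypothetical types for illustration; they are not from this chapter's repo.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

// Two Points stored inline: no pointers, no heap allocation.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Segment {
    start: Point,
    end: Point,
}

/// Size of a Segment held by value: just its four f64 fields.
fn inline_segment_size() -> usize {
    size_of::<Segment>()
}

/// Size of the Box<Segment> handle: one pointer; the data lives on the heap.
fn boxed_segment_handle_size() -> usize {
    size_of::<Box<Segment>>()
}

/// Sums x over a slice of Points. In a Vec<Point> the Points sit back to back
/// in one heap buffer, so this is a linear walk over contiguous memory.
fn sum_x(points: &[Point]) -> f64 {
    points.iter().map(|p| p.x).sum()
}
```

The C# comparison would be a struct containing two structs: one flat block of memory, with the difference that Rust gives every user-defined type this layout unless you opt into Box.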
.clone() to silence the borrow checker
This is the most common self-inflicted slowdown for newcomers. The compiler complains about a move or a borrow, you sprinkle .clone() until it's happy, and now you're deep-copying a Vec or String inside a hot loop. In C# the GC made copying feel free; in Rust a clone is an explicit copy that often allocates on the heap.
// before: takes ownership, then copies the whole Vec anyway
fn total(items: Vec<Item>) -> u64 {
    let copy = items.clone(); // whole vec, again
    copy.iter().map(|i| i.qty).sum()
}
// after: borrow a slice and read without owning
fn total(items: &[Item]) -> u64 {
    items.iter().map(|i| i.qty).sum()
}
When the compiler fights you, the answer is almost never .clone(); it's to borrow instead. Accept &T or &[T] (a slice ≈ ReadOnlySpan<T>) in function signatures. Reserve .clone() for when you genuinely need a second independent owner, and even then prefer Rc/Arc (cloning one is a cheap reference-count bump) for large shared, read-only data.
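To make the Rc/Arc point concrete, here is a small sketch. The Item type and the share helper are assumptions made up for illustration; Arc::clone, Arc::ptr_eq, and Arc::strong_count are standard-library calls. Cloning the Arc yields a second owner of the same Vec without copying its elements.

```rust
use std::sync::Arc;

// Hypothetical item type, mirroring the `qty` field used above.
#[derive(Clone, Debug)]
struct Item {
    qty: u64,
}

/// Borrowing version: reads the slice without owning it.
fn total(items: &[Item]) -> u64 {
    items.iter().map(|i| i.qty).sum()
}

/// Hands out a second owner of the same data. Arc::clone bumps a reference
/// count; it does not copy the Vec or its elements.
fn share(items: &Arc<Vec<Item>>) -> Arc<Vec<Item>> {
    Arc::clone(items)
}
```

Calling total(&shared) works directly on an Arc<Vec<Item>>, because the reference deref-coerces down to &[Item]. After share, Arc::ptr_eq on the two handles returns true, which is the evidence that no deep copy happened.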
String vs &str churn
Every non-empty String, every format!(), and every .to_string() allocates. C# hides this behind an immutable, GC-managed string (with literals interned); Rust splits the concept in two. &str is a borrowed view (no allocation), String is an owned, growable buffer (heap). Building strings in a loop with + or format! allocates or reallocates repeatedly.
// allocates a fresh String every iteration — avoid in hot paths
let mut out = String::new();
for w in words { out = format!("{out}{w} "); }
// reuse one buffer — like StringBuilder.Append
let mut out = String::with_capacity(256);
for w in words { out.push_str(w); out.push(' '); }
with_capacity(256) just pre-allocates room for 256 bytes so the typical case needs no reallocation — it is not a maximum. Push past it and the String grows on its own: it allocates a larger buffer (capacity grows geometrically, so roughly doubling), copies the existing bytes across, and carries on. That's the same thing a StringBuilder or List<T> does when it outgrows its initial capacity. So sizing it right just spares you those mid-loop reallocations; guessing low costs a reallocation or two, never correctness.
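A short sketch of the "hint, not a limit" behaviour. The join_words helper and its capacity_hint parameter are hypothetical names chosen for illustration. The same loop produces the same text whether the hint is too small or generous; only the number of reallocations differs.

```rust
/// Joins words, each followed by a space, into one reused buffer.
/// `capacity_hint` is only a starting size: the String grows past it if needed.
fn join_words(words: &[&str], capacity_hint: usize) -> String {
    let mut out = String::with_capacity(capacity_hint);
    for w in words {
        out.push_str(w);
        out.push(' ');
    }
    out
}
```

With a hint of 4 the buffer has to grow while the loop runs; with 256 it typically never does. Either way the returned String is identical, and capacity() always reports at least len().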
Function parameters should take &str, not String — it's the Rust equivalent of accepting the broadest readable type so callers don't have to allocate to call you. String is your StringBuilder's backing buffer; &str is a read-only slice into existing text.
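As an illustration of why &str is the friendlier parameter type, here is a sketch with two hypothetical helpers, word_count and call_shapes, invented for this example. One signature serves a literal, an owned String, and a sub-slice, and none of the calls allocates.

```rust
/// Counts whitespace-separated words. Taking &str means callers never need
/// to build a String just to call this.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Exercises the three common call shapes against the same function.
fn call_shapes() -> (usize, usize, usize) {
    let literal = "making it fast";
    let owned = String::from("and not slow");
    (
        word_count(literal),     // a string literal is already a &str
        word_count(&owned),      // &String deref-coerces to &str
        word_count(&owned[4..]), // a view into the same buffer, no copy
    )
}
```

Had word_count taken String instead, the first and third calls would each need a .to_string(), which is exactly the churn this section warns about.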
Iterator adapters (.map, .filter, …) are lazy and fuse into a single pass — the LINQ you know, but with zero per-stage allocation. The cost appears the moment you call .collect(), which materializes a new Vec. Collecting into an intermediate vector just to iterate it again is pure waste.
// an intermediate heap Vec for nothing
let a: Vec<_> = xs.iter().filter(|x| x.ok).collect();
let total: u64 = a.iter().map(|x| x.n).sum();
// one fused pass, zero allocation
let total: u64 = xs.iter()
.filter(|x| x.ok).map(|x| x.n).sum();
Treat .collect() like .ToList() in C#: call it only at the boundary where you truly need a stored collection. collect() already uses the iterator's size hint to reserve space when it can; when you fill a Vec or HashMap by hand and know the size, pre-size it with Vec::with_capacity(n) (≈ new List<T>(n)) or HashMap::with_capacity(n) to skip the doubling-reallocation dance.
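The two habits can be sketched side by side. The Row type and the function names below are assumptions for illustration, with Row mirroring the ok and n fields used in the snippet above; Vec::with_capacity and HashMap::with_capacity are the standard-library constructors.

```rust
use std::collections::HashMap;

// Hypothetical record type with the same fields as the example above.
struct Row {
    ok: bool,
    n: u64,
}

/// One fused pass: filter and map run per element and nothing is stored.
fn total_ok(xs: &[Row]) -> u64 {
    xs.iter().filter(|x| x.ok).map(|x| x.n).sum()
}

/// Filling by hand with a known size: reserve once, then push without
/// reallocating along the way.
fn squares(n: usize) -> Vec<u64> {
    let mut out = Vec::with_capacity(n);
    for i in 0..n as u64 {
        out.push(i * i);
    }
    out
}

/// The same idea for a map: size it from the input length up front.
fn index_by_n(xs: &[Row]) -> HashMap<u64, bool> {
    let mut map = HashMap::with_capacity(xs.len());
    for x in xs {
        map.insert(x.n, x.ok);
    }
    map
}
```

Only squares and index_by_n return stored collections, so only they allocate; total_ok is the boundary-free case where no collection is needed at all.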
This is the async-specific trap, and it parallels sync-over-async thread-pool starvation in .NET. Tokio runs your futures on a small pool of worker threads (ch. 6). If you call a blocking operation (synchronous file/network IO, a heavy CPU loop, std::thread::sleep) inside an async fn, you pin a whole worker thread and starve every other task it could have run.
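// `?` needs a Result return type; `path` is passed in by the caller
async fn handler(path: &std::path::Path) -> Result<(), Box<dyn std::error::Error + Send + Sync>> {
    // ❌ blocks a runtime worker and stalls every task scheduled on it
    let data = std::fs::read(path)?;
    // ✅ either await the async version…
    let data = tokio::fs::read(path).await?;
    // ✅ …or push CPU-heavy work off the async pool
    let r = tokio::task::spawn_blocking(|| heavy()).await?;
    Ok(())
}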
spawn_blocking is your Task.Run for offloading sync/CPU work so it doesn't jam the cooperative scheduler. The rule of thumb mirrors .NET: don't do blocking work on the thread that's supposed to be juggling thousands of awaits.
A debug build (cargo run / cargo build) has no optimization and full overflow checks. It can be 10–100× slower than release. People new to Rust measure a debug build, conclude "Rust isn't faster than C#," and move on. The release profile (the one with opt-level = 3 in this repo's Cargo.toml) is the real one.
Always measure with cargo run --release / cargo build --release (≈ dotnet run -c Release / dotnet build -c Release). For real numbers use the criterion crate (a statistics-driven benchmark harness, the rough analogue of BenchmarkDotNet) rather than wall-clock println! timing.