Chapter 07 · making it fast (and not slow)

Fast by default — until you get in the way.

Rust gives you C++-class performance with no GC and no JIT warm-up. Most "Rust is slow" stories are a C# reflex applied where it doesn't belong: copying data to dodge the borrow checker, or blocking the async runtime. Here's where the surprises hide.

7.1 What you get for free

Before the gotchas, it's worth knowing what Rust already does well, so you don't spend effort optimizing things that are fast to begin with. There is no garbage collector, so no GC pauses and no GC-induced tail-latency spikes. Generics are monomorphized (ch. 5), so value types never box. Iterators are lazy and, in an optimized build, typically compile down to the same loop you'd write by hand. And there's no JIT: the optimizer has already run by the time the binary exists.

Equivalence

Think of every Rust value as living on the stack or inline in a tightly packed buffer by default, the way a C# struct does, except that this holds for your own types too, not just primitives. Heap allocation is opt-in and visible in the types: it's spelled Box, Vec, String, Rc/Arc, or another owning collection such as HashMap. If none of those appear in your types or in the calls you make, nothing allocated.
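
To make the "inline by default" claim concrete, here is a minimal sketch. The Point type and both helper functions are hypothetical names invented for this illustration; they are not from the chapter's code.

```rust
use std::mem::size_of;

// Hypothetical type for illustration: two plain fields, no heap pointers.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

/// Size of one Point in bytes: just its two f64 fields, no object header.
fn point_size() -> usize {
    size_of::<Point>()
}

/// Distance in bytes between the first two elements of a slice.
/// Panics if the slice has fewer than two elements.
fn stride_in_bytes(points: &[Point]) -> usize {
    let first = &points[0] as *const Point as usize;
    let second = &points[1] as *const Point as usize;
    second - first
}
```

Whether the slice comes from a stack array or from a Vec<Point>, the stride equals point_size(): the elements sit back to back, with no per-element box or pointer in between.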

7.2 Gotcha #1: .clone() to silence the borrow checker

This is the most common self-inflicted slowdown for newcomers. The compiler complains about a move or a borrow, you sprinkle .clone() until it's happy, and now you're deep-copying a Vec or String inside a hot loop. In C#, passing a reference type around copies only a reference, so handing data to another method feels free; in Rust, .clone() on a Vec or String is an explicit deep copy that usually allocates on the heap.

Reflex (slow)

fn total(items: Vec<Item>) -> u64 {
    let copy = items.clone(); // whole vec, again
    copy.iter().map(|i| i.qty).sum()
}

Idiomatic (free)

fn total(items: &[Item]) -> u64 {
    // borrow — read without owning
    items.iter().map(|i| i.qty).sum()
}

Recommendation

When the compiler fights you, the answer is almost never .clone() — it's borrow instead. Accept &T or &[T] (a slice ≈ ReadOnlySpan<T>) in function signatures. Reserve .clone() for when you genuinely need a second independent owner, and even then prefer Rc/Arc (a cheap reference-count bump) for large shared data.
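
As a rough sketch of the "second owner" case: the Item struct below is a hypothetical stand-in for the chapter's Item, and share is a name made up for this example.

```rust
use std::sync::Arc;

// Hypothetical record type, standing in for the chapter's `Item`.
#[derive(Clone, Debug)]
struct Item {
    qty: u64,
}

/// Borrowing reader: works on any slice of items, no ownership needed.
fn total(items: &[Item]) -> u64 {
    items.iter().map(|i| i.qty).sum()
}

/// Hands out a second owner of the same buffer. `Arc::clone` only bumps
/// the reference count; the `Vec<Item>` behind it is not copied.
fn share(items: &Arc<Vec<Item>>) -> Arc<Vec<Item>> {
    Arc::clone(items)
}
```

Both handles returned this way point at the same allocation, and total(&handle) still works on either one because a reference to the Arc dereferences down to a slice.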

7.3 Gotcha #2: String vs &str churn

Every non-empty String allocates, and so does nearly every format!() and .to_string() call. C# hides this behind a single immutable, GC-managed string type; Rust splits the concept in two. &str is a borrowed view (no allocation), String is an owned, growable buffer (heap). Building strings in a loop with + or format! reallocates repeatedly.

// allocates a fresh String every iteration — avoid in hot paths
let mut out = String::new();
for w in words { out = format!("{out}{w} "); }

// reuse one buffer — like StringBuilder.Append
let mut out = String::with_capacity(256);
for w in words { out.push_str(w); out.push(' '); }

Capacity is a hint, not a cap

with_capacity(256) just pre-allocates room for 256 bytes so the typical case needs no reallocation — it is not a maximum. Push past it and the String grows on its own: it allocates a larger buffer (capacity grows geometrically, so roughly doubling), copies the existing bytes across, and carries on. That's the same thing a StringBuilder or List<T> does when it outgrows its initial capacity. So sizing it right just spares you those mid-loop reallocations; guessing low costs a reallocation or two, never correctness.
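
A small sketch of that behaviour, assuming a hypothetical helper named join_words that wraps the buffer-reuse loop shown earlier:

```rust
/// Joins words with a trailing space after each, using one buffer.
/// `initial_capacity` is only a starting size, not a limit: the String
/// grows by itself if the words need more room.
fn join_words(words: &[&str], initial_capacity: usize) -> String {
    let mut out = String::with_capacity(initial_capacity);
    for w in words {
        out.push_str(w);
        out.push(' ');
    }
    out
}
```

Calling it with a deliberately small initial_capacity still produces the full string; the only cost of the low guess is the extra reallocation along the way.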

Equivalence

Function parameters should take &str, not String — it's the Rust equivalent of accepting the broadest readable type so callers don't have to allocate to call you. String is your StringBuilder's backing buffer; &str is a read-only slice into existing text.
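
For instance, a function written against &str can be called with a literal, a borrowed String, or a sub-slice, none of which forces the caller to allocate. The name word_count is an assumption made for this sketch.

```rust
/// Counts whitespace-separated words. Taking `&str` lets callers pass a
/// literal, a borrowed `String`, or a sub-slice without allocating.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}
```

Had the parameter been String, a caller holding only a literal or a slice would need a .to_string() just to make the call.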

7.4 Gotcha #3: collecting when you could chain

Iterator adapters (.map, .filter, …) are lazy and fuse into a single pass — the LINQ you know, but with zero per-stage allocation. The cost appears the moment you call .collect(), which materializes a new Vec. Collecting into an intermediate vector just to iterate it again is pure waste.

// an intermediate heap Vec for nothing
let a: Vec<_> = xs.iter().filter(|x| x.ok).collect();
let total: u64 = a.iter().map(|x| x.n).sum();

// one fused pass, zero allocation
let total: u64 = xs.iter()
    .filter(|x| x.ok).map(|x| x.n).sum();

Recommendation

Treat .collect() like .ToList() in C#: only at the boundary where you truly need a stored collection. collect() already uses the iterator's size hint to reserve space where it can. When you instead fill a Vec or HashMap by hand and know the size, pre-size it with Vec::with_capacity(n) or HashMap::with_capacity(n) (≈ new List<T>(n)) to skip the doubling-reallocation dance.
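
A minimal sketch of pre-sizing, next to the collect() form of the same computation. Both function names are hypothetical.

```rust
/// Builds the first `n` squares by hand, reserving space once up front
/// so the pushes inside the loop never have to reallocate.
fn squares(n: usize) -> Vec<u64> {
    let mut out = Vec::with_capacity(n);
    for i in 0..n as u64 {
        out.push(i * i);
    }
    out
}

/// Same result via `collect()`. A mapped range reports its exact length
/// through `size_hint`, which `collect()` can use to reserve space.
fn squares_collected(n: usize) -> Vec<u64> {
    (0..n as u64).map(|i| i * i).collect()
}
```

The hand-written version is only worth it when the loop body doesn't fit an iterator chain; when it does, the collect() form is shorter and gets the same kind of up-front reservation.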

7.5 Gotcha #4: blocking the async runtime

This is the async-specific trap, and it parallels the thread-pool starvation that sync-over-async code causes in .NET. Tokio runs your futures on a small pool of worker threads (ch. 6); by default the multi-threaded runtime uses one per CPU core. If you call a blocking operation (synchronous file/network IO, a heavy CPU loop, std::thread::sleep) inside an async fn, you pin a whole worker thread and starve every other task it could have run.

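// `heavy()` stands for any CPU-bound function; it is not defined here.
async fn handler(path: &std::path::Path) -> Result<(), Box<dyn std::error::Error>> {
    // ❌ blocks a runtime worker and stalls the server
    let data = std::fs::read(path)?;

    // ✅ either await the async version…
    let data = tokio::fs::read(path).await?;
    // ✅ …or push CPU-heavy work off the async pool
    let r = tokio::task::spawn_blocking(|| heavy()).await?;
    Ok(()) // `?` needs a Result return type, so the handler returns one
}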

Equivalence

spawn_blocking is your Task.Run for offloading sync/CPU work so it doesn't jam the cooperative scheduler. The rule of thumb mirrors .NET: don't do blocking work on the thread that's supposed to be juggling thousands of awaits.

7.6 The one that bites everyone: benchmarking a debug build

A debug build (cargo run / cargo build) has no optimization and full overflow checks. It can be 10–100× slower than release. People new to Rust measure a debug build, conclude "Rust isn't faster than C#," and move on. The release profile (the one with opt-level = 3 in this repo's Cargo.toml) is the real one.

Recommendation

Always measure with cargo run --release / cargo build --release (≈ dotnet run -c Release / dotnet build -c Release). For real numbers use the criterion crate, a statistics-driven benchmark harness and the rough analogue of BenchmarkDotNet, rather than wall-clock println! timing.
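
One cheap safeguard is to have timing code refuse to run in a debug build. The sketch below relies on debug_assertions being on in the default dev profile and off in the default release profile; a custom profile in Cargo.toml can change that, so treat it as a heuristic. Both function names are hypothetical.

```rust
/// True when `debug_assertions` is enabled, which is the default for
/// `cargo build` / `cargo run` and off by default under `--release`.
fn is_debug_build() -> bool {
    cfg!(debug_assertions)
}

/// Returns an error instead of letting a benchmark run in a debug build.
fn ensure_release() -> Result<(), &'static str> {
    if is_debug_build() {
        Err("debug build detected: re-run with --release before measuring")
    } else {
        Ok(())
    }
}
```

Calling ensure_release() at the top of any ad hoc timing harness turns the "measured a debug build" mistake into an immediate, visible failure.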