Entropy always increases: July 2022

I've been playing with Rust again recently (see my previous post on the topic) and have some more thoughts. It's a mix of wonder and horror.

First, the good stuff. As a reminder, Rust restricts the references you can make: at any time, you can have either a single mutable reference to a variable OR any number of shared (immutable) references. When I first saw that, my immediate thought was that it's great for concurrency, because that's exactly what you need to avoid data races, but just an inconvenience for single-threaded code. However, I've since realised that it has several advantages even there:

Iterators. A common problem in several languages is "iterator invalidation", which occurs when the container being iterated over is modified during iteration. In other languages you get an exception at best (e.g. Java's ConcurrentModificationException) and undefined behaviour at worst (e.g. C++). In Rust, the iterator holds a shared reference to the container, which makes it impossible to mutate the container for as long as the iterator is alive. This does have a downside, though: with ordinary references, it's impossible to keep a handle to an element in a collection while still allowing new elements to be added, even if the collection doesn't need to reallocate to do so (e.g. a linked list).
Aliasing and side effects. A function that takes a mutable output parameter is guaranteed that it doesn't alias any of the inputs, so it can be optimised accordingly. And a function A that calls a function B can be sure that B won't modify anything A holds a reference to, unless A passes that reference along or the type uses interior mutability (Cell, RefCell and the like).
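Both points in the list above can be illustrated with a short sketch. The names with_odd_multiples and add_twice are hypothetical, chosen only for illustration. The first shows that mutating a Vec inside a loop over it is rejected, and one way around it (collect first, mutate afterwards). The second shows that a caller can't pass the same value as both the &mut output and a & input; whether rustc actually exploits that no-alias guarantee in a given case is up to the compiler.

```rust
// Sketch 1 (iterators): a `for` loop over `&v` holds a shared borrow of `v`,
// so calling `v.push(...)` inside the loop body is a compile error.
// Hypothetical helper: gather the new elements first, then mutate `v`
// once no iterator is alive any more.
fn with_odd_multiples(mut v: Vec<i32>) -> Vec<i32> {
    // for x in &v { if *x % 2 == 1 { v.push(*x * 10); } } // rejected
    let extra: Vec<i32> = v
        .iter()
        .filter(|&&x| x % 2 == 1)
        .map(|&x| x * 10)
        .collect();
    v.extend(extra); // the iterator is gone, so mutation is allowed again
    v
}

// Sketch 2 (aliasing): `out` cannot alias `input`, because a caller can't
// hold `&mut a` and `&a` at the same time.
fn add_twice(out: &mut i32, input: &i32) {
    *out += *input;
    *out += *input; // `*input` is known not to have changed in between
}

fn main() {
    println!("{:?}", with_odd_multiples(vec![1, 2, 3])); // [1, 2, 3, 10, 30]

    let mut a = 1;
    let b = 2;
    add_twice(&mut a, &b);
    // add_twice(&mut a, &a); // rejected: `a` borrowed mutably and immutably
    println!("{}", a); // 5
}
```

In C, the equivalent function would have to re-read *input after the first write, in case out and input point to the same int.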

The bad stuff is that Rust has too much magic, and not all of it is specified in the documentation. The particular case I ran into is described in my Stack Overflow question. In short, writing a function with a generic type parameter is not the same as writing it with the concrete type, because the compiler gives special treatment to arguments passed to parameters that are explicitly declared as &mut. The reference manual doesn't mention this, and it says basically nothing about how generics are instantiated. What's more, the special behaviour that's triggered is itself undocumented and pretty arcane (references are, in some manner, "reborrowed"). One of the side effects is that the identity function isn't actually a no-op.
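The code from the Stack Overflow question isn't reproduced here; the following is only a minimal sketch of the general effect, using hypothetical functions concrete and identity. Passing r to a parameter declared as &mut i32 implicitly reborrows it (as if you had written &mut *r), so r stays usable. Passing it to a generic T instead moves it, so r can't be used again unless you reborrow explicitly.

```rust
// Concrete parameter type: the argument is implicitly reborrowed.
fn concrete(x: &mut i32) -> &mut i32 {
    x
}

// Generic identity: T is inferred as `&mut i32` and the argument is moved.
fn identity<T>(x: T) -> T {
    x
}

fn main() {
    let mut n = 0;
    let r = &mut n;

    *concrete(r) += 1; // implicit reborrow: `r` is not moved
    *r += 1;           // so this is fine

    // Writing `*identity(r) += 1;` here and then using `r` again would fail
    // with "use of moved value", because no implicit reborrow happens.
    // An explicit reborrow restores the concrete-type behaviour:
    *identity(&mut *r) += 1;
    *r += 1;

    println!("{}", n); // 4
}
```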

The borrow checker also tries to infer the relationship between a function's inputs and outputs from its lifetime annotations, rather than letting you state it explicitly. Consider this code:

fn inc_ret<'a>(x: &'a mut i32) -> &'a i32 {
    *x += 1;
    &*x
}

fn main() {
    let mut x = 1;
    let y = inc_ret(&mut x);
    // Rejected by the borrow checker: x is still mutably borrowed while y is in use.
    println!("{} {}", x, *y);
}

The inc_ret function takes a mutable reference to an integer, increments it, and returns an immutable reference. This code should be perfectly safe: y is only an immutable reference to x, so reading x while y is alive can't cause any harm. However, the borrow checker simply relates the output to the input via the common lifetime ('a), so it considers x to be mutably borrowed for as long as y is in use. It can't tell that the mutability doesn't pass through to the output, so it refuses to compile this code.
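One way to see that the problem is the length of the mutable borrow, not the function itself: if y is no longer used by the time x is read, the same call compiles. This sketch assumes a compiler with non-lexical lifetimes (which current compilers use); copying the value out of y is just one possible workaround.

```rust
fn inc_ret<'a>(x: &'a mut i32) -> &'a i32 {
    *x += 1;
    &*x
}

fn main() {
    let mut x = 1;
    let y = inc_ret(&mut x);
    // Copy the value out; y is never used after this line, so the
    // mutable borrow of x that it carries ends here.
    let after = *y;
    println!("{} {}", x, after); // 2 2
}
```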

Another piece of magic I'm not totally happy with is the dot operator. Unlike C, Rust has no -> operator; instead, the compiler automatically dereferences references for you. That might be okay if reference types couldn't have methods of their own (via traits), but they can, so it can be ambiguous whether you want the method on the reference or on what the reference points to. The Rustonomicon has a horrifying example of this:

fn do_stuff<T: Clone>(value: &T) {
    let cloned = value.clone();
}

That's a function that takes a reference to a cloneable value and clones the value. But if you forget to specify the : Clone bound, it still compiles, and clones the reference instead (even if T happens to be cloneable), because shared references are always Clone themselves. So restricting the types accepted by your function has actually changed its semantics!
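To make the difference visible, here is a sketch with two hypothetical variants of the Rustonomicon function, with_bound and without_bound, which return the clone so that its type shows up in the signature. Both compile, but they return different things.

```rust
// With the bound, `value.clone()` resolves to `<T as Clone>::clone`
// and produces an owned T.
fn with_bound<T: Clone>(value: &T) -> T {
    value.clone()
}

// Without the bound, `T::clone` isn't available, so method resolution
// falls through to `<&T as Clone>::clone` and returns another `&T`.
// (Recent compilers warn about this pattern; the lint is silenced here.)
#[allow(noop_method_call)]
fn without_bound<T>(value: &T) -> &T {
    value.clone()
}

fn main() {
    let s = String::from("hello");
    let owned: String = with_bound(&s);
    let copied_ref: &String = without_bound(&s);

    // `owned` is a new String; `copied_ref` still points at `s`.
    assert!(!std::ptr::eq(&owned, &s));
    assert!(std::ptr::eq(copied_ref, &s));
    println!("{} {}", owned, copied_ref);
}
```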

Saturday, July 23, 2022

More thoughts on Rust