..

Approaching problems

When solving a problem, I tend to decompose it along as few axes as it will take to solve the problem, in a way that feels right to me.

For example, if I need to buy a new sofa covering because my dog’s nails and dirty body rolling around destroyed the last one, I am likely to view the problem along the axes of:

  • Fabric durability
  • Aesthetic cohesion w/ the living room

However, someone else (who might be my spouse) might decompose the problem by additionally considering:

  • Dog grooming and maintenance to elongate the life of any covering
  • Cost of the covering

The approach you take to solve the problem then depends on the axes to which you’re attending.

Anchoring

Obviously, I’m not usually quite to explicit in enumerating my approaches to solving problems. I typically just start trudging toward an answer. I’ve been trying to be better about this when solving algorithmic problems, though.

A signal that I’ve begun tending to in attempt to elucidate my own unconscious biases, are the “anchoring variables” I choose to solve the problem. By this I mean the kind of variables I initialize at the top of a function that end up being or information the result, e.g.

fn sum(nums: &[usize]) -> usize {
    // We've anchored the solution around finding the sum.
    let mut sum = 0;
    ...
}

Few people are likely to choose anything other than sum to anchor the solution to this problem, so let’s look at something more ambiguous.

/// How I might solve the sofa covering problem
fn find_sofa_covering(wayfair_search_results: &[Coverings]) -> Covering {
    let mut fabric = FabricDurabilityHeap::default();
    let mut aesthetics = AestheticCohesionHeap::default();
    ...
}
/// How my spouse might think about this problem
fn find_sofa_covering_spouse(covering: &[Coverings], groomers: &[Groomers]) -> Plan {
    let mut cheapest_cover = coverings[0];
    let mut groomers = GroomerConvenienceHeap::default();
    ...
}

Because my approach is not in any way anchored around price, I am unlikely to see a solution that involves the cheapest cover. Further, because I am only looking at sofa coverings, I am not going to come up with an answer that involves anything more complex than just buying a new covering.

The point here is not to determine which of these approaches is better, but instead to elucidate that we often make choices about which axes to anchor our solutions around, and those initial choices cascade into the how we even begin solving the problem.

Praxis

Imagine you are given this question.

tl;dr Write an implementation of the File trait for BufferedFile. The one caveat is that all writes must pass through the buffer.

// A simple, infallible write method
trait File {
    fn write(&mut self, data: &[u8]);
}

// A ZST for a trivial [`File`] impl.
struct PrintLn;

impl File for PrintLn {
    fn write(&mut self, data: &[u8]) {
        println!("{:?}", data);
    }
}

// An object that buffers writes before sending them to its embedded file
struct BufferedFile<F: File> {
    buffer: Vec<u8>,
    file: F,
    ptr: usize,
}

impl<F: File> BufferedFile<F> {
    fn new(file: F, capacity: usize) -> Self {
        // Initialize BufferedFile with given file and buffer capacity
        Self {
            file,
            buffer: vec![0u8; capacity],
            ptr: 0usize,
        }
    }

    fn flush(&mut self) {
        if self.ptr > 0 {
            // Write all buffered data to the underlying file
            self.file.write(&self.buffer[..self.ptr]);
            self.ptr = 0;
        }
    }
}

impl<F: File> File for BufferedFile<F> {
    fn write(&mut self, data: &[u8]) {
        todo!("implement this")
    }
}

The importance of perspective

Imagine solving this by anchoring your solution around the buffered file’s state. You might:

  • Write as many byte as you can into the buffer’s remaining capacity because you don’t know what the last write to the buffer was.
  • Enter a loop to write the remaining bytes from the file, writing as many bytes as you can.
  • Ensure that if you exit the loop with a totally full buffer, you should write it.

Another way to think of it is like:

buffer: πŸŸ₯πŸŸ₯πŸŸ₯⬜⬜⬜⬜⬜
file:   🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩
  1. First fill up the buffer with as many bytes as you can:
    buffer: πŸŸ₯πŸŸ₯πŸŸ₯🟩🟩🟩🟩🟩
    file:   🟩🟩🟩🟩🟩|🟩🟩🟩🟩🟩🟩🟩🟩
    
  2. Now write chunks as wide as possible, knowing that whenever you enter this loop you need to flush:
    buffer: ⬜⬜⬜⬜⬜⬜⬜⬜
    file:   🟩🟩🟩🟩🟩|🟩🟩🟩🟩🟩🟩🟩🟩
    
    buffer: 🟩🟩🟩🟩🟩🟩🟩🟩
    file:   🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩|
    
  3. Then, if you exit the loop with a full buffer you also need to flush:
    buffer: ⬜⬜⬜⬜⬜⬜⬜⬜
    file:   🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩|
    

This is little complex, but represents the state changes that you would expect:

impl<F: File + Debug> File for BufferedFile<F> {
    fn write(&mut self, data: &[u8]) {
        // Anchor the solution by determining the space reamining in the buffer
        let remaining_bytes = self.buffer.len() - self.ptr;

        // If there are fewer bytes than you have space, make sure you account
        // for that.
        let initial_data = remaining_bytes.min(data.len());

        // Do the initial write.
        self.buffer[self.ptr..self.ptr + initial_data].copy_from_slice(&data[..initial_data]);
        self.ptr += initial_data;

        let mut total_bytes = initial_data;
        while total_bytes < data.len() {
            // We know that at the beginning of each loop we need to flush.
            self.flush();

            // Determine how many bytes to write in this iteration; might be
            // fewer than the total length of the buffer.
            let bytes_to_write = self.buffer.len().min(data.len() - total_bytes);
            self.buffer[self.ptr..self.ptr + bytes_to_write]
                .copy_from_slice(&data[total_bytes..total_bytes + bytes_to_write]);

            // Adjust counts
            self.ptr += bytes_to_write;
            total_bytes += bytes_to_write;
        }

        // Eagerly flush
        if self.ptr == self.buffer.len() {
            self.flush();
        }
    }
}

The crux of this implementation and thinking is kind of sympathy with the BufferedFile, or anchored around the state of its buffer. If I had to do this work myself––e.g. putting boxes on shelves as they came in–I’d probably think it this way.

However, it’s also possible to anchor the solution by ensuring that all of the bytes in data are handled.

fn write(&mut self, data: &[u8]) {
    // Anchor the solution around the number of bytes we've handled from `data`.
    let mut bytes_handled = 0;
    // We need to write all of the bytes.
    while bytes_handled < data.len() {
        // We can write as many bytes as the buffer fits, or that the data
        // has left unprocessed, whichever is smaller.
        let bytes_to_write =
            std::cmp::min(self.buffer.len() - self.ptr, data.len() - bytes_handled);
        // Write them.
        self.buffer[self.ptr..self.ptr + bytes_to_write]
            .copy_from_slice(&data[bytes_handled..bytes_handled + bytes_to_write]);
        self.ptr += bytes_to_write;
        bytes_handled += bytes_to_write;
        // Buffer is full
        if self.ptr == self.buffer.len() {
            self.flush();
        }
    }
}

The difference between the code in these two approaches isn’t so dramatic; we’ve really just optimized away the initial write, and made the flush less special-cased.

Choosing anchors

I’ve been thinking about why some anchors work better than others. In the case above, the notable difference is anchoring the problem around state vs. process.

Algorithms are meant to be transformations of input to output, which is a more inherently process-oriented approach, so maybe focusing on the process is more likely to yield better outcomes.

However, it’s also possible that the process-oriented anchor is better because it forces us to focus on the problem we actually want to solve. The write method isn’t asking you to manage the buffer––it’s asking you to write the data to a file. It seems natural then that focusing on the task at hand involves less cruft than trying to focus on an element that exists mainly as support.

Are there better heuristics for choosing anchors? This one seems like a step in the right direction.