Blog

Notes on machine learning, systems, mathematics, and the occasional tangent.

November 15, 2025 · 12 min read
Understanding Transformer Attention: From Scratch to Flash Attention
A deep dive into attention mechanisms — from the original scaled dot-product formulation to modern Flash Attention and its hardware-aware algorithmic design.
machine-learning, transformers, systems

September 20, 2025 · 15 min read
Rust's Memory Model: What Systems Programmers Actually Need to Know
Beyond the borrow checker — understanding Rust's memory model, unsafe abstractions, and how they enable zero-cost concurrency.
rust, systems, programming-languages

July 10, 2025 · 10 min read
Bayesian Optimization for Hyperparameter Tuning: A Practical Guide
How Gaussian processes and acquisition functions can replace grid search — with implementation notes for real-world ML pipelines.
machine-learning, optimization, statistics

April 22, 2025 · 8 min read
Time Is an Illusion: Logical Clocks in Distributed Systems
Why you can't trust wall clocks in distributed systems, and how Lamport timestamps, vector clocks, and hybrid logical clocks establish event ordering without them.
distributed-systems, systems, theory

February 1, 2025 · 14 min read
Information Geometry: The Shape of Probability
How differential geometry provides a natural framework for understanding statistical models, with connections to natural gradient descent.
mathematics, machine-learning, statistics