Blog
Notes on machine learning, systems, mathematics, and the occasional tangent.
- Understanding Transformer Attention: From Scratch to Flash Attention (12 min read)
A deep dive into attention mechanisms — from the original scaled dot-product formulation to modern Flash Attention and its hardware-aware algorithmic design.
Tags: machine-learning, transformers, systems
- Rust's Memory Model: What Systems Programmers Actually Need to Know (15 min read)
Beyond the borrow checker — understanding Rust's memory model, unsafe abstractions, and how they enable zero-cost concurrency.
Tags: rust, systems, programming-languages
- Bayesian Optimization for Hyperparameter Tuning: A Practical Guide (10 min read)
How Gaussian processes and acquisition functions can replace grid search — with implementation notes for real-world ML pipelines.
Tags: machine-learning, optimization, statistics
- Time Is an Illusion: Logical Clocks in Distributed Systems (8 min read)
Why you can't trust wall clocks in distributed systems, and how Lamport timestamps, vector clocks, and hybrid logical clocks solve the problem of ordering events.
Tags: distributed-systems, systems, theory
- Information Geometry: The Shape of Probability (14 min read)
How differential geometry provides a natural framework for understanding statistical models, with connections to natural gradient descent.
Tags: mathematics, machine-learning, statistics