Research

Publications, preprints, and ongoing projects at the intersection of machine learning, systems, and science.

NeurIPS · Accepted · 2025
Efficient Sparse Mixture-of-Experts for Sub-Quadratic Inference on Long Contexts

N. Meters, A. Chen, R. Kapoor

We present a sparse mixture-of-experts architecture that achieves sub-quadratic inference cost on sequences exceeding 128k tokens. By introducing a locality-sensitive routing mechanism that exploits the low-rank structure of attention patterns, our method reduces peak memory by 3.8x while maintaining 98.2% of dense model quality across standard long-context benchmarks. We provide theoretical guarantees on routing stability and demonstrate wall-clock speedups on commodity hardware.

Tags: transformers, mixture-of-experts, efficient-inference, long-context

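A minimal sketch of the locality-sensitive routing idea: tokens whose hidden states fall on the same side of a set of hyperplanes share a hash bucket, and each bucket maps to one expert, so nearby tokens are routed together. The hyperplanes, dimensions, and function names here are illustrative, not the paper's implementation.

```python
dim, n_experts = 8, 4
# Fixed hyperplanes for determinism; a real router would sample or learn them.
planes = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, -1, 0, 0, 0, 0, 0, 0],
]

def lsh_route(token, hyperplanes, num_experts):
    """Sign-hash the token against each hyperplane, then bucket to an expert."""
    bits = 0
    for h in hyperplanes:
        dot = sum(t * w for t, w in zip(token, h))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits % num_experts

tok_a = [0.9, 0.1, 0, 0, 0, 0, 0, 0]    # nearly parallel to tok_b
tok_b = [0.85, 0.15, 0, 0, 0, 0, 0, 0]
tok_c = [0.1, 0.9, 0, 0, 0, 0, 0, 0]    # points in a different direction
print(lsh_route(tok_a, planes, n_experts),
      lsh_route(tok_b, planes, n_experts),
      lsh_route(tok_c, planes, n_experts))
```

Because the hash depends only on which side of each hyperplane a token lies, two near-parallel tokens land on the same expert while a token in a different direction can land elsewhere.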
ICLR · In Review · 2025
Differentiable Navier-Stokes Solvers for Turbulence-Aware Neural Surrogate Models

N. Meters, J. Lindqvist

We develop a fully differentiable spectral Navier-Stokes solver that enables end-to-end training of neural surrogate models for turbulent flows. Our approach embeds physical conservation laws directly into the computational graph, allowing gradient-based optimization to respect divergence-free constraints without projection steps. On the Kolmogorov flow benchmark, the resulting surrogates achieve 12x speedup over classical solvers at Reynolds numbers up to 10,000 with bounded error accumulation over 500 rollout steps.

Tags: differentiable-physics, turbulence, neural-surrogates, fluid-dynamics

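One way to respect a divergence-free constraint without a projection step, as the abstract describes, is to parameterize velocity through a streamfunction ψ, so that u = ∂ψ/∂y and v = -∂ψ/∂x are divergence-free by construction. A toy finite-difference sketch (not the paper's spectral solver; grid size and ψ are illustrative):

```python
import math

N = 16                                   # periodic grid, unit spacing
psi = [[math.sin(2 * math.pi * i / N) * math.cos(2 * math.pi * j / N)
        for j in range(N)] for i in range(N)]

def ddx(f, i, j):                        # central difference, periodic in x
    return (f[(i + 1) % N][j] - f[(i - 1) % N][j]) / 2.0

def ddy(f, i, j):                        # central difference, periodic in y
    return (f[i][(j + 1) % N] - f[i][(j - 1) % N]) / 2.0

u = [[ ddy(psi, i, j) for j in range(N)] for i in range(N)]   # u =  dpsi/dy
v = [[-ddx(psi, i, j) for j in range(N)] for i in range(N)]   # v = -dpsi/dx

# The discrete divergence du/dx + dv/dy vanishes because the central
# differences commute: d/dx(dpsi/dy) - d/dy(dpsi/dx) = 0.
div = max(abs(ddx(u, i, j) + ddy(v, i, j))
          for i in range(N) for j in range(N))
print(div)   # zero up to floating-point rounding
```

Because the constraint holds identically in the discretization, gradients flowing through u and v never need to be projected back onto the divergence-free subspace.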
OSDI · Published · 2024
Zero-Copy Distributed KV-Cache for Disaggregated LLM Serving

N. Meters, P. Okonkwo, M. Tanaka

We propose a zero-copy distributed key-value cache architecture for serving large language models across disaggregated GPU clusters. By leveraging RDMA-based memory transfers and a novel page-table abstraction for attention state, our system eliminates serialization overhead during prefill-decode handoffs. Evaluations on a 64-GPU cluster show 2.1x improvement in time-to-first-token and 40% higher throughput compared to state-of-the-art serving frameworks under production trace workloads.

Tags: systems, llm-serving, distributed-systems, gpu

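A hypothetical sketch of the page-table abstraction for attention state: the prefill worker fills fixed-size pages, and the decode worker receives page handles (views) rather than serialized copies. In the real system the transfer would go over RDMA; here `memoryview` stands in for the zero-copy handoff, and all names and sizes are illustrative.

```python
PAGE_BYTES = 32        # toy page size; real pages would hold K/V tensors

class KVPageTable:
    def __init__(self):
        self.pages = []            # backing buffers, one per page
        self.table = {}            # seq_id -> list of page indices

    def append(self, seq_id, kv_bytes):
        """Store one page of K/V state and record it for seq_id."""
        self.pages.append(bytearray(kv_bytes))
        self.table.setdefault(seq_id, []).append(len(self.pages) - 1)

    def handoff(self, seq_id):
        """Return zero-copy views of a sequence's pages (no serialization)."""
        return [memoryview(self.pages[p]) for p in self.table[seq_id]]

kv = KVPageTable()
kv.append("req-1", bytes(PAGE_BYTES))    # prefill worker writes a page
views = kv.handoff("req-1")              # decode worker takes over

# Mutating the backing page is visible through the view: no copy was made.
kv.pages[0][0] = 42
print(views[0][0])
```

The point of the indirection is that a prefill-to-decode handoff only moves page indices, never the attention state itself.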
ICML · Published · 2024
Persistent Homology Features for Robust 3D Point Cloud Classification

N. Meters, S. Hoffmann

We introduce a pipeline that extracts persistent homology descriptors from 3D point clouds and integrates them as auxiliary features into standard classification architectures. The topological features capture global shape properties invariant to noise, occlusion, and sampling density. On ModelNet40 and ScanObjectNN, augmenting PointNet++ with our descriptors improves robustness to 60% point dropout by 8.3 percentage points with negligible computational overhead.

Tags: topological-data-analysis, point-clouds, 3d-vision, persistent-homology

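To make the idea of persistence descriptors concrete, here is a minimal sketch of 0-dimensional persistence only (connected components), not the paper's full pipeline: as a distance threshold grows, clusters of points merge, and the thresholds at which merges happen are the H0 death times, computable with a union-find over edges sorted by length.

```python
import math
from itertools import combinations

def h0_deaths(points):
    """Death times of 0-dim persistence classes via single-linkage merges."""
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                # two components merge at threshold d
            parent[ri] = rj
            deaths.append(d)
    return deaths                   # n - 1 deaths for n points

# Two well-separated pairs: small deaths inside each pair, one large
# death when the pairs finally connect -- a noise-robust global summary.
pts = [(0.0, 0.0), (0.1, 0.0), (5.0, 0.0), (5.1, 0.0)]
deaths = h0_deaths(pts)
print(deaths)
```

The large gap between the intra-pair deaths and the final merge is the kind of global shape signal that survives point dropout, which is what the auxiliary features exploit.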
arXiv · Preprint · 2024
Adaptive Mixed-Precision Training via Gradient Noise Estimation

N. Meters, L. Vasquez, A. Chen

We present an adaptive algorithm that dynamically selects per-layer numerical precision during training by estimating the signal-to-noise ratio of gradient updates. Layers with high gradient noise tolerance are cast to FP8, while sensitive layers retain BF16, with transitions governed by an exponential moving average of gradient variance. Applied to GPT-scale models, the method reduces training FLOPS by 28% with no measurable degradation in validation loss.

Tags: mixed-precision, training-efficiency, numerical-methods

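A hypothetical sketch of the precision policy: keep an exponential moving average of per-layer gradient statistics and cast layers whose gradient signal-to-noise ratio stays high (i.e., layers tolerant of extra quantization noise) to the low-precision format. The threshold, decay, and class names are illustrative, not the paper's.

```python
class PrecisionController:
    def __init__(self, decay=0.99, snr_threshold=4.0):
        self.decay = decay
        self.snr_threshold = snr_threshold
        self.ema_mean = {}          # per-layer EMA of gradient mean
        self.ema_var = {}           # per-layer EMA of gradient variance

    def update(self, layer, grads):
        m = sum(grads) / len(grads)
        v = sum((g - m) ** 2 for g in grads) / len(grads)
        d = self.decay
        self.ema_mean[layer] = d * self.ema_mean.get(layer, m) + (1 - d) * m
        self.ema_var[layer] = d * self.ema_var.get(layer, v) + (1 - d) * v

    def precision(self, layer):
        """High SNR -> quantization noise is relatively harmless -> FP8."""
        snr = self.ema_mean[layer] ** 2 / (self.ema_var[layer] + 1e-12)
        return "fp8" if snr >= self.snr_threshold else "bf16"

ctl = PrecisionController()
ctl.update("mlp.0", [1.0, 1.1, 0.9, 1.0])          # strong, consistent signal
ctl.update("attn.0", [0.01, -0.02, 0.03, -0.01])   # noisy, near-zero mean
print(ctl.precision("mlp.0"), ctl.precision("attn.0"))
```

Gating transitions on the EMA rather than raw minibatch variance keeps a layer from flip-flopping between formats on individual noisy steps.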
Physical Review X · Published · 2023
Graph Neural Network Decoders for Surface Code Quantum Error Correction

N. Meters, D. Petrov, K. Yamamoto

We design a message-passing graph neural network decoder for the rotated surface code that operates in O(n) time per syndrome measurement round. The decoder is trained on synthetic noise models and generalizes to hardware-calibrated noise without fine-tuning. At code distances 5 through 21, it achieves logical error rates within a factor of 1.3 of minimum-weight perfect matching while running 50x faster, enabling real-time decoding at the repetition rates required by near-term superconducting quantum processors.

Tags: quantum-computing, error-correction, graph-neural-networks, surface-codes

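An illustrative sketch of the O(n)-per-round pattern the decoder relies on: one message-passing step over a syndrome graph, where each stabilizer node aggregates features from its neighbors. The graph, features, and update rule below are toy values, not a trained decoder.

```python
def message_pass(features, adjacency):
    """One round: new feature = own feature + mean of neighbor features."""
    out = {}
    for node, feat in features.items():
        nbrs = adjacency[node]
        agg = sum(features[n] for n in nbrs) / len(nbrs) if nbrs else 0.0
        out[node] = feat + agg
    return out

# A 4-node cycle of stabilizers with one fired syndrome bit (feature 1.0).
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
feats = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}
feats = message_pass(feats, adj)
print(feats)   # the fired bit's evidence spreads to its neighbors
```

Each round touches every edge once, so for the bounded-degree graphs of a surface code the cost per syndrome round is linear in the number of stabilizers.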