-
FlashAttention 3 - A Worklog [WIP]
Implementing FlashAttention 3 from scratch
-
CuTe DSL - Notes
My notes on CuTe DSL and FA3 Dissection
-
CUTLASS WGMMA on Hopper - Notes
My notes on WGMMA internals from the Colfax Research CUTLASS Hopper GEMM blog
-
Investigating Flaky `test_eagle_dp` — Batch Invariance Failure on L4 GPUs
Investigative notes and fixes for `test_eagle_dp` CI tests in vLLM
-
GEMM Kernel Optimization Notes
My notes from Simon Boehm's CUDA GEMM optimization blog