I'm Sabari. I work mostly on making AI models run faster on hardware - LLM inference, profiling, quantization, that kind of thing. I like Python and Go, and I tend to have a side project going at any given time. This is just a place to keep track of what I've been doing.
Experience
Multicoreware Inc, Jun 2025 - Present
Software Engineer
Got a Vision-Language-Action model running 5.7× faster on AI accelerators (4000ms → 700ms) with profiling and quantization. Wrote a Go installer with rollback handling for the Linux kernel and AMD driver setup on NPU/iGPU. Currently building an agentic AI system that reasons and calls tools over simulation environments.
Multicoreware Inc, Dec 2024 - Jun 2025
Junior Software Engineer (Intern)
Implemented Heavy Hitter Oracle in vLLM - a sparse KV-cache pruning method that gave 20-30% more throughput at the same sparsity levels.
Finequs, Jan 2024 - Mar 2024
Software Developer Intern
Built a Selenium automation that cut data entry from hours to minutes, and integrated the Tata Telecom API into the product dashboard.
Projects
Heavy Hitter Oracle in vLLM
A sparse KV-cache mechanism for vLLM. It keeps the high-attention keys during decoding and drops the rest, which works out to 20-30% more throughput at the same sparsity.
vLLM, LLM Inference, Sparse Attention
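A rough sketch of the eviction idea in plain Python, to show what "keep the high-attention keys and drop the rest" means in practice. This is not the vLLM integration: the function names, the list-based cache, and the split into a heavy-hitter budget plus a recent-token window are assumptions made for illustration.

```python
def accumulate(acc_scores, attn_weights):
    """Add the newest decode step's attention weights to the running totals.

    Both arguments are lists with one entry per cached position
    (hypothetical layout, chosen for this sketch).
    """
    return [a + w for a, w in zip(acc_scores, attn_weights)]


def h2o_keep_indices(acc_scores, heavy_budget, recent_budget):
    """Return the sorted cache positions to keep; everything else is evicted.

    Keeps the last `recent_budget` positions unconditionally, plus the
    `heavy_budget` older positions with the highest accumulated attention.
    """
    n = len(acc_scores)
    if n <= heavy_budget + recent_budget:
        return list(range(n))  # cache still fits, nothing to evict

    recent_start = n - recent_budget
    recent = set(range(recent_start, n))

    # Rank only the older positions; the recent window is kept regardless.
    older = range(recent_start)
    heavy = sorted(older, key=lambda i: acc_scores[i], reverse=True)[:heavy_budget]

    return sorted(recent.union(heavy))


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, 0.05, 0.2, 0.3]
    print(h2o_keep_indices(scores, heavy_budget=2, recent_budget=2))
    # prints [0, 2, 4, 5]
```

In a real serving engine the same selection would run per attention head on tensors and would have to cooperate with the paged KV-cache allocator, which this sketch leaves out.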
Decentralized EHR System
A blockchain-based health records system built at the PLI Hackathon (Sathyabama Institute). Won 50k XDC tokens.
Blockchain, XDC, Hackathon
Decentralized VPN - Hackverse Finalist
A VPN protocol where regular user devices act as relays and earn incentives for doing so. Reached the finals at Hackverse 2024.