A concrete breakdown of GPU memory during diffusion model inference, using FLUX.2 klein 4B as a worked example.
Hi, I'm Prateek Shrivastava, an R&D Software Engineer.
"Before someone builds a super-bad robot, someone has to build a mildly bad robot, and before that a not-so-bad robot."
— Rodney A. Brooks
The road to creating AI is the most fascinating journey of our generation. Here I write about diffusion models, GPU systems, and ML infrastructure.
2026
When you can't afford 50 transformer passes per image. A practical guide to choosing between step distillation (6-12x) and guidance distillation (2x), no special dataset needed.
12 interactive animations explaining caching, parallelism, and quantization techniques for diffusion models
2018
The four core Python metaphors (decorators, generators, context managers, metaclasses) explained from the ground up with examples. Based on the talk by James Powell.
Pandas distinct values by column
2017
How to quickly deploy a MongoDB container and avoid paying ransom
Using custom aggregate functions with pandas apply
2016
Building a match outcome predictor using hero picks and match statistics with logistic regression
How Docker solved dependency hell for building TTS and OCR models with Android NDK and TensorFlow at Indus OS
Porting SPH fluid simulation from an IIT Madras thesis to Python and Kivy, and the performance tricks needed to make it interactive
Using Scala and Spark for user analytics at Indus OS, processing millions of events from budget Android phones on 2G networks
Replacing HMM-based TTS with LSTM on Android NDK in C for Hindi, Tamil, Telugu, Malayalam, Bengali, and Marathi
CNN-based character recognition for Hindi and Devanagari on devices with no GPU and under 512MB RAM
How we froze TF graphs, quantized weights, and shipped CNN/LSTM models on budget Android phones at Indus OS
Log into B as b from A as a without a password
2015
Here is How to Download/Use This Theme