Gemma 4 · Quantization-Aware Training

QAT Visualizer

Watch precision drop from bf16 to int4: the model shrinks, the lights stay on. Same quality, about 40% less disk, fully local on your machine.

Weight grid · gemma4:e2b-it-qat · params: ~2B

Precision compression: bf16 (16-bit weights) → int8 → int6 → int4 · QAT
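As a rough back-of-the-envelope check on what each precision step means for size, the sketch below multiplies a parameter count by bits per weight. It assumes exactly 2 billion parameters (the page only says "~2B") and ignores quantization scales, file metadata, and any tensors kept at higher precision, so real files on disk will differ. The function name `weight_gb` is made up for illustration.

```python
def weight_gb(num_params, bits_per_weight):
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

# Assumption: "~2B" is treated as exactly 2e9 parameters for illustration.
NUM_PARAMS = 2e9

for name, bits in [("bf16", 16), ("int8", 8), ("int6", 6), ("int4", 4)]:
    print(f"{name}: {weight_gb(NUM_PARAMS, bits):.1f} GB of raw weights")
```

Under these assumptions, each halving of the bit width halves the raw weight storage.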

bf16 — fat, glowing, expensive

int4 — small, crisp, on-device
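Quantization-aware training generally works by simulating low-precision rounding during training, so the weights learn to tolerate it. The sketch below shows only that rounding step, using a simple symmetric per-tensor scheme on random stand-in weights. Whether Gemma's QAT recipe uses this exact scheme is not stated here; the helper names and the Gaussian weights are assumptions for illustration.

```python
import random

def fake_quantize(weights, bits):
    """Symmetric per-tensor quantize-then-dequantize (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1              # 127 for int8, 31 for int6, 7 for int4
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    out = []
    for w in weights:
        q = round(w / scale)                # snap to the nearest integer level
        q = max(-qmax - 1, min(qmax, q))    # clamp to the signed integer range
        out.append(q * scale)               # map back to floating point
    return out

def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

# Assumption: small Gaussian values stand in for real model weights.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

errors = {bits: mean_abs_error(weights, fake_quantize(weights, bits))
          for bits in (8, 6, 4)}
for bits, err in errors.items():
    print(f"int{bits}: mean abs rounding error = {err:.6f}")
```

The rounding error grows as the bit width shrinks. QAT aims to recover the quality that naive rounding of this kind would lose.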

The payoff · M2 MacBook Air

Model sizes: E2B (~2B) · E4B (~4B) · 12B · 26B a4b · 31B

Disk · QAT int4: 4.3 GB (▼ 40% smaller)

Original Q4: 7.2 GB on disk before QAT
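The "40% smaller" figure follows directly from the two disk sizes quoted above. A quick check, where `percent_smaller` is a hypothetical helper name:

```python
def percent_smaller(before_gb, after_gb):
    """Relative size reduction, as a percentage of the original size."""
    return (before_gb - after_gb) / before_gb * 100

saving = percent_smaller(7.2, 4.3)   # sizes quoted on this page
print(f"{saving:.1f}% smaller")      # → 40.3% smaller
```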

Inference speed: 12–31% faster

Quality kept: matched / exceeded Q4

Quality matched & often exceeded: QAT int4 holds the line against the non-QAT Q4 baseline.

Everyday queries that run great locally

88.7% of day-to-day prompts never need the cloud.