Llama-3.2-1B-Instruct

M1 (8c) · 8 GB · 4bit · 2026-03-09

Performance

Context 4k tokens

PP tok/s 800.9

TG tok/s 45.2

TTFT (ms) 5116

Peak mem (GB) 1.4
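As a rough sanity check on these figures, time to first token can be estimated as prompt length divided by the prompt-processing rate. The sketch below assumes that "4k" means 4,096 prompt tokens (as the Context entry under Software suggests) and that TTFT is dominated by prompt processing. Neither assumption is stated on the page, and `estimate_ttft_ms` is a hypothetical helper, not part of oMLX.

```python
def estimate_ttft_ms(prompt_tokens: int, pp_tok_per_s: float) -> float:
    """Estimate time to first token in milliseconds.

    Assumes prefill dominates TTFT; scheduling and sampling overhead are ignored.
    """
    return prompt_tokens / pp_tok_per_s * 1000.0


# Values taken from this page: 4,096-token context, 800.9 PP tok/s.
estimated_ms = estimate_ttft_ms(4096, 800.9)
reported_ms = 5116  # TTFT reported above

print(f"estimated TTFT: {estimated_ms:.0f} ms, reported: {reported_ms} ms")
```

Under these assumptions the estimate lands within a few milliseconds of the reported value, which suggests the PP and TTFT figures are mutually consistent.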

Hardware

Chip M1 (8c)

Memory 8 GB

GPU Cores 8

Software

oMLX v0.2.6

macOS 26.3.1

Context 4,096

Performance by Context Length

Context	PP tok/s	TG tok/s	Peak Mem
1k	823.0	65.9	1.3 GB
4k	800.9	45.2	1.4 GB

Batching Results

Batch Size	TG tok/s	Speedup
1×	65.9	1.00×
2×	110.4	1.68×
4×	117.9	1.79×
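
The Speedup column appears to be the aggregate TG rate at each batch size divided by the rate at batch size 1. The 1× rate matches the 1k row of the context table, which suggests batching was measured at 1k context, though the page does not say so. A minimal sketch that reproduces the column from the table values follows. The per-request rate it also computes is a derived quantity, not something the page reports, and `batching_summary` is a hypothetical helper name.

```python
# Aggregate token-generation rates (tok/s) copied from the Batching Results table.
tg_by_batch = {1: 65.9, 2: 110.4, 4: 117.9}


def batching_summary(rates):
    """Return {batch: (speedup, per_request_tok_s)} relative to batch size 1.

    per_request_tok_s assumes throughput is shared evenly across the batch.
    """
    baseline = rates[1]
    return {
        batch: (round(rate / baseline, 2), round(rate / batch, 1))
        for batch, rate in rates.items()
    }


for batch, (speedup, per_request) in batching_summary(tg_by_batch).items():
    print(f"{batch}x: speedup {speedup:.2f}x, about {per_request} tok/s per request")
```

Read this way, going from 2× to 4× adds little aggregate throughput while each individual request gets noticeably slower.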