
Meta-Llama-3.1-8B-Instruct

M4 (8c) · 16 GB · 4-bit · 2026-03-10

Performance

Context 4k tokens

PP tok/s 160.1

TG tok/s 19.9

TTFT (ms) 25585

Peak mem (GB) 5.3
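The TTFT figure can be sanity-checked against the prompt-processing rate. The sketch below assumes that TTFT is dominated by prefill and that the 4k run used 4,096 prompt tokens (the Context value listed on this page); both are assumptions, not something the page states. The function name `estimate_ttft_ms` is made up for illustration.

```python
def estimate_ttft_ms(prompt_tokens: int, pp_tok_s: float) -> float:
    """Estimate time to first token as pure prefill time, in milliseconds.

    Assumption: TTFT is roughly prompt_tokens / prompt-processing rate,
    ignoring model load, tokenization, and sampling overhead.
    """
    return prompt_tokens / pp_tok_s * 1000.0


# Values taken from the Performance section of this page.
reported_ttft_ms = 25585
estimate_ms = estimate_ttft_ms(prompt_tokens=4096, pp_tok_s=160.1)

print(f"estimated {estimate_ms:.0f} ms, reported {reported_ttft_ms} ms")
print(f"difference {abs(estimate_ms - reported_ttft_ms):.1f} ms")
```

Under these assumptions the estimate lands within about a millisecond of the reported value, which suggests the PP and TTFT numbers describe the same prefill pass.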

Hardware

Chip M4 (8c)

Memory 16 GB

GPU Cores 8

Software

oMLX v0.2.7

macOS 26.3

Context 4,096

Performance by Context Length

Context	PP tok/s	TG tok/s	Peak Mem
1k	168.3	21.7	4.9 GB
4k	160.1	19.9	5.3 GB

Batching Results

Batch Size	TG tok/s	Speedup
1×	21.7	1.00×
2×	38.3	1.76×
4×	37.5	1.73×
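The Speedup column can be reproduced from the TG tok/s column. The sketch below assumes the batched TG figures are aggregate throughput across all concurrent requests, so the per-request rate is the aggregate divided by the batch size; the page does not say this explicitly. Note that the 1× baseline (21.7) matches the 1k-context row rather than the 4k row. The names `tg_tok_s`, `speedup`, and `per_request_tok_s` are illustrative.

```python
# TG tok/s by batch size, copied from the Batching Results table.
tg_tok_s = {1: 21.7, 2: 38.3, 4: 37.5}


def speedup(batch_size: int, baseline: int = 1) -> float:
    """Throughput at batch_size relative to the baseline batch size."""
    return tg_tok_s[batch_size] / tg_tok_s[baseline]


def per_request_tok_s(batch_size: int) -> float:
    """Rate seen by each request, ASSUMING the table reports aggregate tok/s."""
    return tg_tok_s[batch_size] / batch_size


for batch_size in tg_tok_s:
    print(
        f"{batch_size}x: speedup {speedup(batch_size):.2f}x, "
        f"per-request {per_request_tok_s(batch_size):.1f} tok/s"
    )
```

Read this way, throughput stops improving between batch sizes 2 and 4, so each request at 4× would generate at roughly half the 2× per-request rate.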