
Meta-Llama-3.1-8B-Instruct

M4 (8c) · 16 GB · 4bit · 2026-03-10

Performance

Context 1k tokens

PP tok/s 168.3

TG tok/s 21.7

TTFT (ms) 6091

Peak mem (GB) 4.9
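
As a rough sanity check on the figures above, time to first token can be compared with the time needed to process the prompt at the reported PP rate. The sketch below assumes the prompt is exactly 1,024 tokens (the context listed for this run) and that TTFT is dominated by prompt processing; neither assumption is stated by the benchmark, and the function name is illustrative only.

```python
def estimated_prefill_ms(prompt_tokens: int, pp_tok_s: float) -> float:
    """Estimate prompt-processing time in milliseconds from a throughput figure."""
    return prompt_tokens / pp_tok_s * 1000.0


# Values copied from this benchmark run; the 1,024-token prompt length is an assumption.
estimate = estimated_prefill_ms(1024, 168.3)
reported_ttft_ms = 6091

print(f"estimated prefill: {estimate:.0f} ms, reported TTFT: {reported_ttft_ms} ms")
```

Under those assumptions the estimate lands within a few milliseconds of the reported TTFT, which suggests the two numbers describe the same phase of the run.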

Hardware

Chip M4 (8c)

Memory 16 GB

GPU Cores 8

Software

oMLX v0.2.7

macOS 26.3

Context 1,024

Performance by Context Length

Context	PP tok/s	TG tok/s	Peak Mem
1k	168.3	21.7	4.9 GB
4k	160.1	19.9	5.3 GB

Batching Results

Batch Size	TG tok/s	Speedup
1×	21.7	1.00×
2×	38.3	1.76×
4×	37.5	1.73×
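
The Speedup column appears to be each batch size's TG rate divided by the batch-size-1 rate. A minimal sketch of that arithmetic follows, using the values from the table. It assumes the TG tok/s figures are aggregate throughput across all requests in the batch, which the page does not state; the per-request figure is derived under that assumption, and both function names are illustrative.

```python
# TG tok/s per batch size, copied from the Batching Results table.
tg_tok_s = {1: 21.7, 2: 38.3, 4: 37.5}


def speedup(batch_size: int) -> float:
    """Throughput relative to batch size 1."""
    return tg_tok_s[batch_size] / tg_tok_s[1]


def per_request_tok_s(batch_size: int) -> float:
    """Per-request rate, ASSUMING the table reports aggregate throughput."""
    return tg_tok_s[batch_size] / batch_size


for b in sorted(tg_tok_s):
    print(f"{b}x  speedup={speedup(b):.2f}x  per-request={per_request_tok_s(b):.1f} tok/s")
```

Read this way, throughput gains flatten between batch sizes 2 and 4 on this machine, so each additional concurrent request mainly lowers the per-request rate.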