
Qwen2.5-Coder-3B-Instruct

M5 Max (32c) · 36 GB · 4bit · 2026-05-18

Performance

Prompt tokens	PP tok/s	TG tok/s	TTFT (ms)	Peak mem (GB)
1k	3,887	172.8	264	2.5
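
As a rough sanity check on the figures above, time to first token should be close to the prompt length divided by the prompt-processing (PP) throughput. The sketch below assumes the "1k" prompt is exactly 1,024 tokens, matching the Context value listed under Software, and that TTFT is dominated by prompt processing. The function name is illustrative and not part of oMLX.

```python
def estimate_ttft_ms(prompt_tokens: int, pp_tok_per_s: float) -> float:
    """Estimate time to first token in ms as prompt length / prefill throughput."""
    return prompt_tokens / pp_tok_per_s * 1000.0


# Assumption: the "1k" prompt is 1,024 tokens.
estimated = estimate_ttft_ms(1024, 3887.0)
reported = 264.0  # TTFT (ms) as reported on this page

print(f"estimated TTFT: {estimated:.1f} ms, reported: {reported:.0f} ms")
```

Under these assumptions the estimate lands within about a millisecond of the reported TTFT, which suggests the PP and TTFT figures describe the same run.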

Hardware

Chip M5 Max (32c)

Memory 36 GB

GPU Cores 32

Software

oMLX v0.3.9.dev2

macOS 26.4.1

Context 1,024

Batching Results

Batch Size	TG tok/s	Speedup
1×	172.8	1.00×
2×	266.5	1.54×
4×	350.8	2.03×
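
The Speedup column can be reproduced by dividing each row's throughput by the 1× baseline. The sketch below also derives two figures that are not on the page: scaling efficiency (speedup divided by batch size) and per-request throughput. Both rest on the assumption that TG tok/s in this table is the aggregate rate across all concurrent requests, which the page does not state explicitly. The helper name `summarize` is illustrative.

```python
# Rows copied from the Batching Results table: (batch size, TG tok/s).
results = [(1, 172.8), (2, 266.5), (4, 350.8)]


def summarize(rows):
    """Return speedup, scaling efficiency, and per-request rate for each row."""
    baseline = rows[0][1]  # throughput at batch size 1
    summary = []
    for batch, tg in rows:
        speedup = tg / baseline
        summary.append({
            "batch": batch,
            "speedup": speedup,
            # Assumption: tg is aggregate throughput over the whole batch.
            "efficiency": speedup / batch,
            "per_request_tok_s": tg / batch,
        })
    return summary


for row in summarize(results):
    print(
        f"{row['batch']}x  speedup {row['speedup']:.2f}x  "
        f"efficiency {row['efficiency']:.0%}  "
        f"per-request {row['per_request_tok_s']:.2f} tok/s"
    )
```

If the aggregate reading is right, going from 1× to 4× roughly doubles total throughput while each individual request generates at about half its single-request rate.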