StepFun 3.7 Flash - Speed Benchmark on M5 Max
r/LocalLLaMA
Just ran a benchmark using the llama.cpp branch that shipped on day 0. M5 Max, 128 GB, Q4_K_S quant. Memory peaked at 120+ GB, which made the system sluggish but still usable once cmd+tab finally registered. Short context (< 16k) feels fast and very responsive. Speed at 32k-64k is not bad, still usable.
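
For anyone wondering why it got sluggish, here is a rough back-of-the-envelope sketch of the memory headroom. It only uses the two figures from this post (128 GB total, roughly 120 GB peak); the function name `memory_headroom` is made up for illustration, and the real peak was "120+", so actual headroom was probably a bit smaller.

```python
from typing import Tuple


def memory_headroom(total_gb: float, peak_gb: float) -> Tuple[float, float]:
    """Return (free GB left at peak, fraction of total memory in use at peak)."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    headroom_gb = total_gb - peak_gb
    used_fraction = peak_gb / total_gb
    return headroom_gb, used_fraction


if __name__ == "__main__":
    # Figures from the post: 128 GB machine, peak around 120 GB (approximate).
    headroom, used = memory_headroom(total_gb=128, peak_gb=120)
    print(f"headroom: {headroom:.1f} GB, in use at peak: {used:.1%}")
```

With so little left over for macOS and other apps, slow app switching is about what you would expect.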