StepFun 3.7 Flash - Speed Benchmark on M5 Max

r/LocalLLaMA

Just ran a benchmark with the llama.cpp branch that shipped on day 0. M5 Max, 128 GB, Q4_K_S. Memory peaks at 120+ GB, which makes the system sluggish, but it's still usable once cmd+tab finally lands. Short context (< 16k) feels fast and very responsive. Speed at 32k-64k is not bad, still usable.