Benchmarked inference engines for M1 Max 64GB: results & analysis
r/LocalLLaMA
I'm a hobbyist on a budget, using an M1 Max MacBook Pro for local inference with Hermes Agent. I've endlessly researched which inference engine to use, and there's probably no single right answer. This caught my attention today: I ran mlx-chronos (github.com/igurss/mlx-chronos) across rapid-mlx, omlx, mlx-lm, and ollama using Qwen3.5-4B on an M1 Max 64GB. Results are submitted to the mlx-chronos community leaderboard. Full write-up with charts:. Credit to Claude Code for the webpage and analysis. Short version: rapid-mlx leads on speed and memory efficiency.