BeeLlama v0.2.0 – major DFlash update. Single RTX 3090: Qwen 3.6 27B up to 164 tps (4.40x), Gemma 4 31B up to 177.8 tps (4.93x). Prompt processing speed near baseline.
r/LocalLLaMA
BeeLlama v0.2.0 is here! Not quite a pegasus, but close enough.

GitHub | Qwen 3.6 27B Quick Start | Gemma 4 31B Quick Start

Full Gemma 4 31B support, with an efficient DFlash implementation and vision.

Major Qwen 3.6 27B performance update from lower DFlash overhead, cleaner prefill handling, drafter K/V projection caching, and safer CUDA execution.

DFlash GGUFs with upstream architecture are now supported.

Fixes to adaptive profit behavior around baseline probing.

The reduced verifier path is now stricter, with a safer fallback to full logits when grammar, sampler state, or reasoning requires it.
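For anyone who wants to sanity-check the headline numbers, here is a rough sketch of the arithmetic. It assumes each speedup factor is measured against plain decoding (no DFlash) on the same RTX 3090, which the post does not spell out, and the function name `implied_baseline_tps` is made up for illustration. Since the figures are "up to" peaks, treat the result as a ballpark only.

```python
def implied_baseline_tps(boosted_tps: float, speedup: float) -> float:
    """Back out the baseline tokens/sec implied by a peak rate and its speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return boosted_tps / speedup


# Peak tps and speedup factors as quoted in the post title.
headline = {
    "Qwen 3.6 27B": (164.0, 4.40),
    "Gemma 4 31B": (177.8, 4.93),
}

for model, (tps, speedup) in headline.items():
    baseline = implied_baseline_tps(tps, speedup)
    print(f"{model}: ~{baseline:.1f} tps implied baseline")
```

Under that assumption, both models land in the mid-to-high 30s tps without speculative decoding, which is a useful reference point when comparing against your own baseline runs.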