Bringing Scientific Rigor to LLM Comparison
Dev.to AI
Generative AI
AI Safety
A CLI tool for comparing LLMs with bootstrap confidence intervals, McNemar's test, hallucination detection, and cost tracking. It supports 8 providers and installs with a single pip install.
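To make the two statistical methods concrete, here is a minimal sketch of a paired bootstrap confidence interval and an exact McNemar's test for two models scored on the same items. It is not the tool's own code: the function names `paired_bootstrap_ci` and `mcnemar_exact`, the 0/1 correctness lists, and the defaults are assumptions chosen for illustration, using only the Python standard library.

```python
import math
import random


def paired_bootstrap_ci(correct_a, correct_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(A) - accuracy(B) on paired items.

    correct_a / correct_b are equal-length lists of 0/1 flags, one per test
    item (hypothetical input format, not taken from the tool).
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty lists of equal length")
    rng = random.Random(seed)
    n = len(correct_a)
    diffs = []
    for _ in range(n_resamples):
        # Resample item indices with replacement, keeping each pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact (binomial) McNemar p-value from the discordant pairs."""
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = only_a + only_b
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence either way
    k = min(only_a, only_b)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Toy data made up for this example: model A is right on 5 items B misses.
    a = [1, 1, 1, 1, 1, 1, 0, 1, 0, 1]
    b = [0, 0, 0, 0, 0, 1, 0, 1, 0, 1]
    print("95% CI for accuracy difference:", paired_bootstrap_ci(a, b))
    print("McNemar exact p-value:", mcnemar_exact(a, b))
```

Both functions treat the evaluation as paired, since each model answers the same prompts. McNemar's test looks only at the items where the two models disagree, which is why a large headline accuracy gap on a small test set can still fail to reach significance.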