Bringing Scientific Rigor to LLM Comparison

Dev.to AI
Generative AI, AI Safety

A CLI tool for comparing LLMs with bootstrap confidence intervals, McNemar's test, hallucination detection, and cost tracking. It supports 8 providers and installs with a single pip install.
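
To make the two statistical checks named above concrete, here is a minimal sketch of a paired bootstrap confidence interval and an exact McNemar's test, assuming each model's results are a list of per-question 0/1 correctness scores on the same questions. The function names `bootstrap_ci` and `mcnemar_exact` are hypothetical and chosen for illustration; this is not the tool's own API or implementation.

```python
import math
import random


def bootstrap_ci(a, b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b) over paired items.

    Hypothetical helper: a and b are per-question scores for two models,
    aligned so that a[i] and b[i] refer to the same question.
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and the same length")
    rng = random.Random(seed)
    n = len(a)
    diffs = []
    for _ in range(n_resamples):
        # Resample question indices with replacement, keeping pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(a[i] - b[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_resamples)]
    hi = diffs[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi


def mcnemar_exact(a, b):
    """Two-sided exact McNemar p-value from the discordant pairs only."""
    a_only = sum(1 for x, y in zip(a, b) if x and not y)  # A right, B wrong
    b_only = sum(1 for x, y in zip(a, b) if y and not x)  # B right, A wrong
    n = a_only + b_only
    if n == 0:
        return 1.0  # the models never disagree
    k = min(a_only, b_only)
    # Binomial(n, 0.5) lower tail, doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    model_a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
    model_b = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
    print("95% CI for accuracy difference:", bootstrap_ci(model_a, model_b))
    print("McNemar exact p-value:", mcnemar_exact(model_a, model_b))
```

Resampling whole questions keeps each pair of answers together, so the interval reflects how much the accuracy gap depends on which questions happened to be in the test set. McNemar's test looks only at the questions where the two models disagree, which is why it suits paired right/wrong comparisons.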