Why this repo matters
No release captured, 10 developer signals, 6 package/install signals

Benchmark LLM accuracy, latency, cost, and hallucination rates across models with this open-source evaluation suite (0 stars, 0 forks, Python, 8 AI signals, 6 developer signals).