mightbesaad/llm-reliability-evals

RDR67, Python, 0 stars, +0 stars / 7d, Open source

Why this repo matters

Latest release 0d ago, 6 developer signals, 2 package/install signals

Reproducible evals for LLM reliability failures in agentic and knowledge work — 8-mode taxonomy, deterministic graders, and a trajectory harness with scripted tools. Orthogonal to capability and safety evals. (0 stars, 0 forks, Python, fresh release, 5 AI signals, 3 developer signals). Latest release: v0.2.0.
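To make "deterministic graders" and "scripted tools" concrete, here is a minimal sketch of the general idea. It is not taken from the repository: `ScriptedTool`, `run_trajectory`, `grade_no_retry_on_error`, and the two policies are hypothetical names, and the check shown (do not repeat a call that already errored) is an illustrative failure condition, not necessarily one of the 8 modes in the taxonomy.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """One recorded step of a trajectory (hypothetical structure)."""
    name: str
    query: str
    result: str


@dataclass
class ScriptedTool:
    """A tool that returns canned responses, so every run sees identical outputs."""
    name: str
    responses: dict  # maps a query string to a fixed result

    def __call__(self, query: str) -> str:
        return self.responses.get(query, "ERROR: not found")


def run_trajectory(policy, tools):
    """Drive a policy (a stand-in for an agent) and record each tool call."""
    trajectory = []
    for name, query in policy():
        result = tools[name](query)
        trajectory.append(ToolCall(name, query, result))
    return trajectory


def grade_no_retry_on_error(trajectory):
    """Deterministic grader: pass only if no errored call is repeated verbatim."""
    failed = set()
    for call in trajectory:
        key = (call.name, call.query)
        if key in failed:
            return False
        if call.result.startswith("ERROR"):
            failed.add(key)
    return True


def careful_policy():
    # Tries a missing document once, then moves on to a different query.
    yield ("search", "missing-doc")
    yield ("search", "known-doc")


def looping_policy():
    # Repeats the identical failing call, which the grader should flag.
    yield ("search", "missing-doc")
    yield ("search", "missing-doc")


tools = {"search": ScriptedTool("search", {"known-doc": "contents"})}
careful_passed = grade_no_retry_on_error(run_trajectory(careful_policy, tools))
looping_passed = grade_no_retry_on_error(run_trajectory(looping_policy, tools))
```

Because the tool outputs are fixed and the grader is a pure function of the recorded trajectory, the same policy always receives the same verdict, which is what makes an eval of this shape reproducible.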