mightbesaad/llm-reliability-evals: Reproducible evals for LLM reliability failures in agentic and knowledge work, with an 8-mode taxonomy, deterministic graders, and a trajectory harness with scripted tools. Orthogonal to capability and safety evals.
Source-linked topic cluster with 1 signal across related articles, projects, models, papers, and source updates.
RDR 54 | Developer Tools | Momentum 74 | Last seen Jul 2, 2026
Source mix: GITHUB:github-ai-on-radar (1)
Signals