Why this repo matters
No release captured, 9 developer signals, 1 package/install signal
LLM evaluation harness and classical NLP baseline: agent quality scoring, failure classification, and automated correction generation (0 stars, 0 forks, Python, 7 AI signals, 5 developer signals).