Why this repo matters
Latest release today, 8 developer signals, 2 package/install signals
A verifiable-reward agentic benchmark: does an LLM correctly allocate verification when orchestrating a fallible biology foundation model? Python; 0 stars, 0 forks; 7 AI signals, 5 developer signals. Latest release: v0.1.0 (fresh release).