Builder topic

AI benchmark radar

Benchmark and evaluation updates that keep marketing claims separate from independent results.

Items: 9 (published only)
Avg RDR: 86 (current set)
Sources: 2 (linked domains)
Updated: May 27, 2026
Open Source AI | GitHub trend signal | May 27, 2026

LangChain: The Agent Engineering Platform for LLM Applications

LangChain is an open-source Python framework for building and deploying LLM-powered applications and agents. It provides tools for chaining interoperable components and integrating with data sources and models, and it supports both rapid prototyping and production features such as monitoring and debugging.

RDR 88

AI Coding | GitHub trend signal | May 24, 2026

Origin: A Local-First Rust Daemon for AI Agent Memory and Context Management

Origin is a local-first Rust daemon that manages AI agent memory and context. It features Git-versioned memories and distilled wiki pages, and it supports sessions for AI clients such as Claude Code, Cursor, and Codex, with the aim of providing persistent context across AI workflows.

RDR 88

Agents | GitHub trend signal | May 24, 2026

CUA: Open-Source Infrastructure for Desktop-Controlling AI Agents

CUA is an open-source project providing infrastructure for developing, training, and evaluating AI agents capable of controlling full desktop environments across macOS, Linux, and Windows. It includes sandboxes, SDKs, and benchmarks to facilitate the creation of computer-use agents.

RDR 88

Agents | GitHub trend signal | May 26, 2026

Google's ADK-Python: An Open-Source Toolkit for AI Agent Development

Google has released ADK-Python, an open-source, code-first Python toolkit designed for building, evaluating, and deploying AI agents. The toolkit, currently at version 2.1.0, emphasizes flexibility and control in agent development and includes a graph-based execution engine for workflows and a structured Task API for agent-to-agent delegation.

RDR 87

AI Tools | GitHub trend signal | May 26, 2026

Hermes Katana: A Defense-in-Depth Security Toolkit for LLM Agents

Hermes Katana is a Python-based security toolkit designed for LLM agents, offering defense-in-depth capabilities including taint tracking, a proxy secret guard, a policy engine, and red-team benchmarking. It aims to protect AI agents from various attacks like prompt injection and unauthorized command execution.

RDR 87

Agents | GitHub trend signal | May 25, 2026

wshobson/agents: A Multi-Harness Agentic Plugin Marketplace for AI Code Assistants

The wshobson/agents GitHub repository is a multi-harness agentic plugin marketplace for AI code assistants, including Claude Code, Codex CLI, Cursor, OpenCode, and Gemini CLI. It offers plugins, agents, skills, and commands defined in a single Markdown source, from which it generates native artifacts for each supported harness.

RDR 86

AI Coding | GitHub trend signal | May 24, 2026

Swarm Orchestrator v10.0.0: AI-Generated PR Audit and Merge Gate

Swarm Orchestrator v10.0.0 introduces `swarm audit`, a new subcommand and GitHub Action designed to audit pull-request diffs for ten categories of AI-coding-agent 'cheat patterns'. It can block merges if blocking findings are detected and generates hash-chained audit ledgers and AI-BOM artifacts.
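For readers unfamiliar with the term, the general idea behind a hash-chained ledger can be sketched in a few lines of Python. This is a generic illustration of the technique, not Swarm Orchestrator's actual ledger format or code: the field names (`prev`, `finding`, `hash`), the `GENESIS` value, the category labels, and all function names are assumptions made up for this example.

```python
import hashlib
import json

# Hypothetical starting value for the chain (an assumption, not the tool's format).
GENESIS = "0" * 64


def entry_hash(prev_hash, finding):
    """Hash a finding together with the hash of the previous entry."""
    payload = json.dumps({"prev": prev_hash, "finding": finding}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(ledger, finding):
    """Append a finding, linking it to the last entry's hash."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS
    ledger.append(
        {"prev": prev_hash, "finding": finding, "hash": entry_hash(prev_hash, finding)}
    )
    return ledger


def verify(ledger):
    """Recompute every hash in order; any edited entry breaks the chain."""
    prev_hash = GENESIS
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["finding"]):
            return False
        prev_hash = entry["hash"]
    return True


def should_block(ledger):
    """Illustrative merge gate: block if any recorded finding is marked blocking."""
    return any(entry["finding"].get("blocking", False) for entry in ledger)


# Placeholder findings; the category labels are invented for the example.
ledger = []
append_entry(ledger, {"category": "example-category-a", "blocking": True})
append_entry(ledger, {"category": "example-category-b", "blocking": False})
```

Because each entry's hash covers the previous entry's hash, changing an earlier finding after the fact makes `verify` fail unless every later entry is rewritten as well, which is what makes such a ledger tamper-evident.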

RDR 85

Research Papers | research signal | May 27, 2026

LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding

LocateAnything is a new framework for vision-language grounding and detection that uses Parallel Box Decoding (PBD) to improve both speed and accuracy. Unlike traditional methods that decode 2D boxes token by token, PBD decodes geometric elements as atomic units in a single step, enhancing parallelism and preserving geometric coherence. The framework is supported by LocateAnything-Data, a large dataset with over 138 million training samples.

RDR 83

Benchmarks | research signal | May 27, 2026

SpatialBench: A New Benchmark for Spatial Foundation Models

Researchers have introduced SpatialBench, a new benchmark designed to holistically assess the generalization capabilities of spatial foundation models across diverse tasks, viewpoints, scene domains, and input densities. The benchmark evaluates 41 models across 19 datasets and 546 scenes, revealing that current models are not yet "all-round players" and highlighting the importance of domain alignment and data quality over simple dataset scaling.

RDR 81