metauto-ai / agent-as-a-judge

⚖️ The First Coding Agent-as-a-Judge

☆593

Alternatives and similar repositories for agent-as-a-judge

Users who are interested in agent-as-a-judge compare it to the libraries listed below.

theworldofagents / Agentic-Reasoning
free and open OpenAI Deep Research
☆651 Updated 5 months ago
camel-ai / crab
🦀️ CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents. https://crab.camel-ai.org/
☆364 Updated last month
CharlesQ9 / Alita
☆761 Updated last month
OpenBMB / IoA
An open-source framework for collaborative AI agents, enabling diverse, distributed agents to team up and tackle complex tasks through in…
☆745 Updated 9 months ago
xingyaoww / code-act
Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan…
☆1,317 Updated last year
SalesforceAIResearch / xLAM
xLAM: A Family of Large Action Models to Empower AI Agent Systems
☆513 Updated this week
facebookresearch / swe-rl
Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"
☆573 Updated 4 months ago
agiresearch / A-mem
A-MEM: Agentic Memory for LLM Agents
☆511 Updated last month
TheAgentCompany / TheAgentCompany
An agent benchmark with tasks in a simulated software company.
☆509 Updated this week
ranpox / awesome-computer-use
This is a collection of resources for computer-use GUI agents, including videos, blogs, papers, and projects.
☆412 Updated 2 months ago
BytedTsinghua-SIA / MemAgent
A MemAgent framework that can be extrapolated to 3.5M, along with a training framework for RL training of any agent workflow.
☆548 Updated this week
SalesforceAIResearch / AgentLite
☆618 Updated 6 months ago
multi-agent-systems-failure-taxonomy / MAST
☆248 Updated last week
gersteinlab / LocAgent
[ACL 2025] Graph-guided agentic framework for code localization https://arxiv.org/abs/2503.09089
☆488 Updated 3 months ago
langchain-ai / langgraph-codeact
☆571 Updated 2 months ago
DavidZWZ / Awesome-Deep-Research
[Up-to-date] Awesome Agentic Deep Research Resources
☆354 Updated 2 weeks ago
sunnynexus / Search-o1
Search-o1: Agentic Search-Enhanced Large Reasoning Models
☆995 Updated 2 months ago
sileix / chain-of-draft
Code and data for the Chain-of-Draft (CoD) paper
☆313 Updated 4 months ago
zorazrw / agent-workflow-memory
AWM: Agent Workflow Memory
☆297 Updated 6 months ago
GenseeAI / cognify
Multi-Faceted AI Agent and Workflow Autotuning. Automatically optimizes LangChain, LangGraph, DSPy programs for better quality, lower exe…
☆246 Updated 2 months ago
Terry-Xu-666 / NodeRAG
The official repository of NodeRAG
☆332 Updated 4 months ago
microsoft / WindowsAgentArena
Windows Agent Arena (WAA) 🪟 is a scalable OS platform for testing and benchmarking of multi-modal AI agents.
☆745 Updated 3 months ago
YangLing0818 / buffer-of-thought-llm
[NeurIPS 2024 Spotlight] Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
☆649 Updated last month
qixucen / atom
Atom of Thoughts for Markov LLM Test-Time Scaling
☆580 Updated last month
ai-agents-2030 / awesome-deep-research-agent
☆263 Updated last month
rungalileo / agent-leaderboard
Ranking LLMs on agentic tasks
☆176 Updated 2 weeks ago
agent-husky / Husky-v1
Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular and …
☆345 Updated last year
principia-ai / WriteHERE
An Open-Source AI Writing Project.
☆340 Updated 3 weeks ago
plageon / HtmlRAG
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieval Results in RAG Systems (WWW 2025)
☆433 Updated last month
thunlp / ProactiveAgent
An LLM-based agent that predicts its tasks proactively.
☆401 Updated 2 months ago