👩‍⚖️ Agent-as-a-Judge: The Magic for Open-Endedness
☆729 · May 14, 2025 · Updated 10 months ago
Alternatives and similar repositories for agent-as-a-judge
Users who are interested in agent-as-a-judge are comparing it to the libraries listed below.
- 🐝 The First Self-Improving Agentic Solution · ☆1,008 · Feb 5, 2026 · Updated last month
- Agentless 🐱: an agentless approach to automatically solve software development problems · ☆2,019 · Dec 22, 2024 · Updated last year
- ☆28 · Nov 10, 2025 · Updated 4 months ago
- Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering" · ☆19 · Oct 4, 2022 · Updated 3 years ago
- [NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" · ☆678 · Mar 16, 2025 · Updated last year
- TextGrad: Automatic ''Differentiation'' via Text -- using large language models to backpropagate textual gradients. Published in Nature. · ☆3,424 · Jul 25, 2025 · Updated 7 months ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" · ☆11 · Oct 11, 2024 · Updated last year
- This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software E… · ☆1,439 · Jul 18, 2025 · Updated 8 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] · ☆650 · Jul 29, 2025 · Updated 7 months ago
- Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team. · ☆21,189 · Mar 11, 2025 · Updated last year
- An Open Large Reasoning Model for Real-World Solutions · ☆1,539 · Feb 13, 2026 · Updated last month
- ☆1,033 · Dec 17, 2024 · Updated last year
- [ICLR 2025] Automated Design of Agentic Systems · ☆1,534 · Jan 28, 2025 · Updated last year
- AG2 (formerly AutoGen): The Open-Source AgentOS. Join us at: https://discord.gg/sNGSwQME3x · ☆4,258 · Updated this week
- An open source code of the GitHub Copilot Workspace · ☆12 · Jun 8, 2024 · Updated last year
- MLGym: A New Framework and Benchmark for Advancing AI Research Agents · ☆587 · Aug 10, 2025 · Updated 7 months ago
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering · ☆1,381 · Updated this week
- 🙌 OpenHands: AI-Driven Development · ☆69,254 · Updated this week
- Code and Data for Tau-Bench · ☆1,130 · Aug 28, 2025 · Updated 6 months ago
- DSPy: The framework for programming—not prompting—language models · ☆32,853 · Updated this week
- 🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org · ☆16,392 · Updated this week
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24) · ☆3,238 · Feb 8, 2026 · Updated last month
- 🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming · ☆65,185 · Jan 21, 2026 · Updated last month
- 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT) · ☆6,805 · Jul 4, 2025 · Updated 8 months ago
- Flexible and powerful framework for managing multiple AI agents and handling complex conversations · ☆7,514 · Feb 11, 2026 · Updated last month
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning & ReCall: Learning to Reason with Tool Call for LLMs via Rei… · ☆1,343 · May 16, 2025 · Updated 10 months ago
- A programming framework for agentic AI · ☆55,559 · Mar 11, 2026 · Updated last week
- O1 Replication Journey · ☆1,999 · Jan 14, 2025 · Updated last year
- ☆12 · Sep 14, 2023 · Updated 2 years ago
- ☆64 · Apr 9, 2024 · Updated last year
- Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including C… · ☆5,378 · Oct 30, 2025 · Updated 4 months ago
- SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersec… · ☆18,730 · Mar 9, 2026 · Updated last week
- A modular graph-based Retrieval-Augmented Generation (RAG) system · ☆31,474 · Updated this week
- SWE-bench: Can Language Models Resolve Real-world GitHub Issues? · ☆4,478 · Updated this week
- Universal memory layer for AI Agents · ☆50,147 · Updated this week
- [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents · ☆597 · Updated this week
- RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments. · ☆2,539 · Mar 12, 2026 · Updated last week
- Official Repo for ICML 2024 paper "Executable Code Actions Elicit Better LLM Agents" by Xingyao Wang, Yangyi Chen, Lifan Yuan, Yizhe Zhan… · ☆1,630 · May 23, 2024 · Updated last year
- Enhancing AI Software Engineering with Repository-level Code Graph · ☆259 · Apr 1, 2025 · Updated 11 months ago