👩‍⚖️ Agent-as-a-Judge: The Magic for Open-Endedness
☆787 · Mar 28, 2026 · Updated 3 months ago
Alternatives and similar repositories for agent-as-a-judge
Users who are interested in agent-as-a-judge are comparing it to the libraries listed below.
- 🐝 The First Self-Improving agents with RL / Prompting Optimization ☆1,015 · Feb 5, 2026 · Updated 4 months ago
- Agentless🐱: an agentless approach to automatically solve software development problems ☆2,074 · Dec 22, 2024 · Updated last year
- ☆28 · Jun 2, 2026 · Updated 3 weeks ago
- Code for paper "Point and Ask: Incorporating Pointing into Visual Question Answering" ☆19 · Oct 4, 2022 · Updated 3 years ago
- [NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution" ☆704 · Mar 16, 2025 · Updated last year
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" ☆11 · Oct 11, 2024 · Updated last year
- This repo contains the dataset and code for the paper "SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software E… ☆1,438 · Jul 18, 2025 · Updated 11 months ago
- TextGrad: Automatic ''Differentiation'' via Text -- using large language models to backpropagate textual gradients. Published in Nature. ☆3,625 · Jul 25, 2025 · Updated 11 months ago
- Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025] ☆696 · Jul 29, 2025 · Updated 11 months ago
- Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team. ☆21,725 · Apr 15, 2026 · Updated 2 months ago
- ☆1,034 · Dec 17, 2024 · Updated last year
- An Open Large Reasoning Model for Real-World Solutions ☆1,540 · Jun 17, 2026 · Updated last week
- [ICLR 2025] Automated Design of Agentic Systems ☆1,598 · Jan 28, 2025 · Updated last year
- An open source code of the GitHub Copilot Workspace ☆13 · Jun 8, 2024 · Updated 2 years ago
- AG2 (formerly AutoGen): The Open-Source AgentOS. Join us at: https://discord.gg/sNGSwQME3x ☆4,710 · Updated this week
- MLGym A New Framework and Benchmark for Advancing AI Research Agents ☆607 · Aug 10, 2025 · Updated 10 months ago
- Code and Data for Tau-Bench ☆1,292 · Mar 18, 2026 · Updated 3 months ago
- MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering ☆1,594 · Apr 24, 2026 · Updated 2 months ago
- 🙌 OpenHands: AI-Driven Development ☆78,644 · Updated this week
- DSPy: The framework for programming—not prompting—language models ☆35,605 · Updated this week
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24) ☆3,519 · Feb 8, 2026 · Updated 4 months ago
- 🐫 CAMEL: The first and the best multi-agent framework. Finding the Scaling Law of Agents. https://www.camel-ai.org ☆17,253 · Updated this week
- 🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming ☆69,068 · Jan 21, 2026 · Updated 5 months ago
- 🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT) ☆6,878 · Jul 4, 2025 · Updated 11 months ago
- Flexible and powerful framework for managing multiple AI agents and handling complex conversations ☆7,668 · Updated this week
- A programming framework for agentic AI ☆59,261 · Apr 15, 2026 · Updated 2 months ago