metauto-ai / agent-as-a-judge
🤠 Agent-as-a-Judge and DevAI dataset
⭐ 192 · Updated this week
Related projects
Alternatives and complementary repositories for agent-as-a-judge
- Environments, tools, and benchmarks for general computer agents ⭐ 172 · Updated 3 weeks ago
- FireAct: Toward Language Agent Fine-tuning ⭐ 255 · Updated last year
- Building Open LLM Web Agents with Self-Evolving Online Curriculum RL ⭐ 204 · Updated this week
- ⭐ 287 · Updated 2 months ago
- AWM: Agent Workflow Memory ⭐ 205 · Updated last month
- This is the official repo for "PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization". PromptAgen… ⭐ 204 · Updated 3 months ago
- ⭐ 116 · Updated 5 months ago
- The official implementation of paper "ToolGen: Unified Tool Retrieval and Calling via Generation" ⭐ 99 · Updated 3 weeks ago
- Official repo for "Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale" ⭐ 191 · Updated last month
- Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular and … ⭐ 328 · Updated 5 months ago
- Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs". ⭐ 192 · Updated 2 months ago
- A repo with an automated prompt engineering workflow from scratch. It leverages the OPRO technique. ⭐ 156 · Updated 2 months ago
- Expert Specialized Fine-Tuning ⭐ 145 · Updated last month
- Code and implementations for the paper "AgentGym: Evolving Large Language Model-based Agents across Diverse Environments" by Zhiheng Xi e… ⭐ 352 · Updated 2 months ago
- Reformatted Alignment ⭐ 112 · Updated last month
- This repository contains the paper list for the paper: Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reaso… ⭐ 342 · Updated 11 months ago
- [EMNLP 2024: Demo Oral] RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation ⭐ 254 · Updated last month
- Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs" ⭐ 175 · Updated 2 weeks ago
- ⭐ 103 · Updated 3 months ago
- An Analytical Evaluation Board of Multi-turn LLM Agents ⭐ 250 · Updated 6 months ago
- Code for our paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate" ⭐ 234 · Updated last month
- Generative Judge for Evaluating Alignment ⭐ 217 · Updated 10 months ago
- Official implementation of "DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning" in ICML'24 ⭐ 128 · Updated 2 weeks ago
- KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents ⭐ 172 · Updated last month
- OS-ATLAS: A Foundation Action Model For Generalist GUI Agents ⭐ 166 · Updated this week
- ⭐ 316 · Updated last month
- ⭐ 226 · Updated this week
- ⭐ 152 · Updated 2 months ago
- Official implementation of paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038) ⭐ 170 · Updated last month
- Official repository for "Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing". Your efficient and high-quality s… ⭐ 491 · Updated 2 weeks ago