Code for our paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate"
☆327 Oct 19, 2024 Updated last year
Alternatives and similar repositories for ChatEval
Users who are interested in ChatEval are comparing it to the libraries listed below
- ICML 2024: Improving Factuality and Reasoning in Language Models through Multiagent Debate ☆510 Apr 24, 2025 Updated 10 months ago
- MAD: The first work to explore Multi-Agent Debate with Large Language Models :D ☆522 Dec 16, 2025 Updated 2 months ago
- 🤖 AgentVerse 🪐 is designed to facilitate the deployment of multiple LLM-based agents in various applications, which primarily provides … ☆4,947 Sep 9, 2024 Updated last year
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] ☆396 May 20, 2024 Updated last year
- "TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks" [TMLR 2024] ☆32 Dec 21, 2024 Updated last year
- [IJCAI 2024] Generate different roles for GPTs to form a collaborative entity for complex tasks. ☆1,466 Sep 9, 2025 Updated 5 months ago
- Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System" ☆69 Nov 14, 2024 Updated last year
- Official Implementation of Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization ☆196 May 16, 2024 Updated last year
- ☆330 Jun 19, 2024 Updated last year
- ☆2,882 Feb 20, 2025 Updated last year
- ☆51 Jun 14, 2024 Updated last year
- Must-read Papers on LLM Agents. ☆2,897 Jan 15, 2026 Updated last month
- ☆75 Dec 5, 2024 Updated last year
- Data and code for ACL 2023 paper "RobuT: A Systematic Study of Table QA Robustness Against Human-Annotated Adversarial Perturbations" ☆15 Feb 8, 2024 Updated 2 years ago
- LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively. ☆784 Oct 4, 2024 Updated last year
- ☆144 Sep 10, 2023 Updated 2 years ago
- A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24) ☆3,187 Feb 8, 2026 Updated 3 weeks ago
- ReCross: Unsupervised Cross-Task Generalization via Retrieval Augmentation ☆24 May 1, 2022 Updated 3 years ago
- Source Code of Paper "GPTScore: Evaluate as You Desire" ☆258 Feb 21, 2023 Updated 3 years ago
- ChatArena (or Chat Arena) is a Multi-Agent Language Game Environments for LLMs. The goal is to develop communication and collaboration ca… ☆1,539 Aug 11, 2025 Updated 6 months ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆148 Nov 26, 2024 Updated last year
- LLM Agora, debating between open-source LLMs to refine the answers ☆85 Sep 28, 2023 Updated 2 years ago
- Self-Alignment with Principle-Following Reward Models ☆169 Sep 18, 2025 Updated 5 months ago
- A Bilingual Role Evaluation Benchmark for Large Language Models ☆43 Jan 9, 2024 Updated 2 years ago
- MemoChat: Tuning LLMs to Use Memos for Consistent Long-Range Open-Domain Conversation ☆28 Apr 18, 2024 Updated last year
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. ☆1,953 Aug 9, 2025 Updated 6 months ago
- [ACL 2024 Demo] Official GitHub repo for UltraEval: An open source framework for evaluating foundation models. ☆256 Oct 30, 2024 Updated last year
- Scalable Meta-Evaluation of LLMs as Evaluators ☆43 Feb 15, 2024 Updated 2 years ago
- Repo for paper "Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration" ☆349 May 8, 2024 Updated last year
- ☆313 Jun 9, 2024 Updated last year
- Code of the COLING22 paper "uChecker: Masked Pretrained Language Models as Unsupervised Chinese Spelling Checkers" ☆19 Aug 17, 2022 Updated 3 years ago
- R-Judge: Benchmarking Safety Risk Awareness for LLM Agents (EMNLP Findings 2024) ☆99 Jan 11, 2026 Updated last month
- An Open-source Framework for Data-centric, Self-evolving Autonomous Language Agents ☆5,876 Sep 26, 2024 Updated last year
- On Transferability of Prompt Tuning for Natural Language Processing ☆101 May 3, 2024 Updated last year
- [COLM 2024] OpenAgents: An Open Platform for Language Agents in the Wild ☆4,716 Nov 18, 2024 Updated last year
- The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". ☆1,589 Jun 3, 2025 Updated 9 months ago
- [ACL 2024] AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning ☆234 Jan 13, 2025 Updated last year
- Code for Arxiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback ☆208 May 24, 2023 Updated 2 years ago
- Awesome things about LLM-powered agents. Papers / Repos / Blogs / ... ☆2,203 Apr 30, 2025 Updated 10 months ago