THUDM / AgentBench
A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)
☆2,529 · Updated 3 months ago
Alternatives and similar repositories for AgentBench:
Users who are interested in AgentBench are comparing it to the libraries listed below.
- AgentTuning: Enabling Generalized Agent Abilities for LLMs ☆1,431 · Updated last year
- An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast. ☆1,736 · Updated 4 months ago
- Benchmarking large language models' complex reasoning ability with chain-of-thought prompting ☆2,723 · Updated 9 months ago
- OpenAGI: When LLM Meets Domain Experts ☆2,133 · Updated 5 months ago
- ☆2,783 · Updated 2 months ago
- [NeurIPS 2023] Reflexion: Language Agents with Verbal Reinforcement Learning ☆2,701 · Updated 3 months ago
- Aligning pretrained language models with instruction data generated by themselves. ☆4,359 · Updated 2 years ago
- This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through self-reflection by Akari Asai,… ☆2,061 · Updated 11 months ago
- ☆900 · Updated 9 months ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,731 · Updated last year
- The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". ☆1,518 · Updated last month
- [ACL2023] We introduce LLM-Blender, an innovative ensembling framework to attain consistently superior performance by leveraging the dive… ☆940 · Updated 6 months ago
- A generalized information-seeking agent system with Large Language Models (LLMs). ☆1,154 · Updated 10 months ago
- List of language agents based on paper "Cognitive Architectures for Language Agents" ☆938 · Updated 3 months ago
- [ACL 2023] Reasoning with Language Model Prompting: A Survey ☆952 · Updated 3 weeks ago
- ☆914 · Updated 11 months ago
- 800,000 step-level correctness labels on LLM solutions to MATH problems ☆1,984 · Updated last year
- Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents" ☆977 · Updated 2 months ago
- Codes for "Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models". ☆1,129 · Updated last year
- Instruction Tuning with GPT-4 ☆4,301 · Updated last year
- A library for advanced large language model reasoning ☆2,113 · Updated 3 weeks ago
- ☆749 · Updated 10 months ago
- Official Implementation of "Graph of Thoughts: Solving Elaborate Problems with Large Language Models" ☆2,356 · Updated 4 months ago
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆1,843 · Updated 8 months ago
- YaRN: Efficient Context Window Extension of Large Language Models ☆1,479 · Updated last year
- ☆1,250 · Updated last year
- Code for our ACL 2023 Paper "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models". ☆656 · Updated last year
- MTEB: Massive Text Embedding Benchmark ☆2,469 · Updated this week
- [ICLR'24 spotlight] An open platform for training, serving, and evaluating large language model for tool learning. ☆5,016 · Updated 5 months ago
- Measuring Massive Multitask Language Understanding | ICLR 2021 ☆1,400 · Updated last year