MultiagentBench / MARBLE
Code for MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents (https://www.arxiv.org/pdf/2503.01935)
☆84Updated last week
Alternatives and similar repositories for MARBLE:
Users who are interested in MARBLE are comparing it to the libraries listed below.
- ☆185Updated last month
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts?☆45Updated last month
- Framework and toolkits for building and evaluating collaborative agents that can work together with humans.☆70Updated last month
- ☆111Updated last month
- Benchmark and research code for the paper "SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"☆83Updated last week
- ☆103Updated 2 months ago
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"☆86Updated 5 months ago
- Flow of Reasoning: Training LLMs for Divergent Problem Solving with Minimal Examples☆78Updated 3 weeks ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models☆94Updated last year
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT)☆107Updated last month
- Systematic evaluation framework that automatically rates overthinking behavior in large language models.☆80Updated last month
- Codebase accompanying the Summary of a Haystack paper.☆75Updated 6 months ago
- Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales"☆101Updated 6 months ago
- ☆96Updated 9 months ago
- ☆138Updated 10 months ago
- ☆119Updated 5 months ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems☆75Updated 3 weeks ago
- Complex Function Calling Benchmark.☆85Updated 2 months ago
- [NeurIPS 2024] Agent Planning with World Knowledge Model☆121Updated 3 months ago
- A benchmark list for evaluation of large language models.☆91Updated 2 weeks ago
- ☆53Updated last week
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"☆104Updated 6 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.☆168Updated 2 months ago
- ☆90Updated last week
- CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments☆49Updated last month
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.☆165Updated 3 weeks ago
- Code for EMNLP 2024 paper "Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning"☆53Updated 5 months ago
- Minimal GRPO implementation from scratch☆62Updated 2 weeks ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖☆62Updated 3 months ago
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".☆196Updated this week