ulab-uiuc / MARBLELinks

(ACL 2025 Main) Code for MultiAgentBench : Evaluating the Collaboration and Competition of LLM agents https://www.arxiv.org/pdf/2503.01935

☆137

Alternatives and similar repositories for MARBLE

Users that are interested in MARBLE are comparing it to the libraries listed below

Sorting:

multi-agent-systems-failure-taxonomy / MAST
☆248Updated 2 weeks ago
Ayanami0730 / deep_research_bench
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
☆245Updated this week
SALT-NLP / collaborative-gym
Framework and toolkits for building and evaluating collaborative agents that can work together with humans.
☆91Updated 3 months ago
facebookresearch / sweet_rl
Benchmark and research code for the paper SWEET-RL Training Multi-Turn LLM Agents onCollaborative Reasoning Tasks
☆233Updated 3 months ago
Nardien / agent-distillation
Official Code Repository for the paper "Distilling LLM Agent into Small Models with Retrieval and Code Tools"
☆130Updated this week
LiqiangJing / DSBench
DSBench: How Far are Data Science Agents from Becoming Data Science Experts?
☆65Updated 5 months ago
zhangxjohn / LLM-Agent-Benchmark-List
A banchmark list for evaluation of large language models.
☆134Updated last month
ByteDance-Seed / Agent-R
Resources for our paper: "Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training"
☆154Updated last month
suzgunmirac / dynamic-cheatsheet
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory
☆69Updated 2 months ago
DataArcTech / LLM-as-a-Judge
☆128Updated 4 months ago
THU-KEG / Agentic-Reward-Modeling
[ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
☆99Updated last month
StonyBrookNLP / appworld
🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…
☆232Updated 2 months ago
zorazrw / awesome-tool-llm
☆237Updated 11 months ago
matthewrenze / self-reflection
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
☆78Updated 8 months ago
tianyang-x / SaySelf
Public code repo for paper "SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales"
☆108Updated 10 months ago
rxlqn / awesome-llm-self-reflection
augmented LLM with self reflection
☆129Updated last year
zorazrw / agent-workflow-memory
AWM: Agent Workflow Memory
☆300Updated 6 months ago
vsubramaniam851 / multiagent-ft
☆212Updated 5 months ago
zjunlp / WorfBench
[ICLR 2025] Benchmarking Agentic Workflow Generation
☆117Updated 5 months ago
ai-agents-2030 / awesome-deep-research-agent
☆279Updated last month
diagram-of-thought / diagram-of-thought
Official implementation of paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038)
☆184Updated 4 months ago
hkust-nlp / AgentBoard
An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]
☆335Updated last year
samkhur006 / awesome-llm-planning-reasoning
A curated collection of LLM reasoning and planning resources, including key papers, limitations, benchmarks, and additional learning mate…
☆285Updated 5 months ago
xlang-ai / Spider2-V
[NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
☆129Updated 11 months ago
zai-org / ComplexFuncBench
Complex Function Calling Benchmark.
☆123Updated 6 months ago
facebookresearch / ReasonIR
Official repository for paper "ReasonIR Training Retrievers for Reasoning Tasks".
☆188Updated last month
yale-nlp / MCTS-RAG
☆58Updated last month
AlexCuadron / ThinkingAgent
Systematic evaluation framework that automatically rates overthinking behavior in large language models.
☆91Updated 2 months ago
apple / ToolSandbox
☆194Updated 11 months ago
thunlp / Optima
Code for paper "Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System"
☆59Updated 8 months ago