aws-samples / multiagent-collab-scenario-benchmark
Benchmarking data and script used for LLM multi-agent collaboration systems from AWS Bedrock Agents Science team.
☆17 Updated last year
Alternatives and similar repositories for multiagent-collab-scenario-benchmark
Users who are interested in multiagent-collab-scenario-benchmark are comparing it to the repositories listed below.
- Codes for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems" ☆13 Updated last year
- [ACL 2025] Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL ☆12 Updated 4 months ago
- [ICLR 2025] DSBench: How Far are Data Science Agents from Becoming Data Science Experts? ☆103 Updated 5 months ago
- MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents ☆40 Updated this week
- ☆331 Updated 6 months ago
- ☆84 Updated last year
- InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks (ICML 2024) ☆181 Updated 8 months ago
- A Comprehensive Library for Memory of LLM-based Agents. ☆100 Updated 8 months ago
- AgenTracer: A Lightweight Failure Attributor for Agentic Systems ☆74 Updated 2 months ago
- ☆223 Updated this week
- ☆106 Updated last year
- HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches ☆35 Updated 4 months ago
- PGRAG ☆52 Updated last year
- ☆29 Updated 10 months ago
- ☆78 Updated 4 months ago
- [ACL 2025] Agentic Knowledgeable Self-awareness ☆91 Updated 7 months ago
- Agent Skill Induction: "Inducing Programmatic Skills for Agentic Tasks" ☆37 Updated 9 months ago
- Self-Reflection in LLM Agents: Effects on Problem-Solving Performance ☆93 Updated last year
- ☆18 Updated 3 months ago
- ☆35 Updated 3 weeks ago
- Data and Code for EMNLP 2025 Findings Paper "MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search" ☆86 Updated 3 months ago
- A-MEM: Agentic Memory for LLM Agents ☆260 Updated 2 months ago
- The code for paper: Hierarchical Document Refinement for Long-context Retrieval-augmented Generation [ACL2025 Oral] ☆41 Updated 5 months ago
- LiveMCPBench is a benchmark for evaluating the ability of agents to navigate and utilize a large-scale MCP toolset. It provides a compreh… ☆92 Updated last month
- ☆52 Updated 8 months ago
- AWM: Agent Workflow Memory ☆389 Updated last month
- ☆43 Updated 3 months ago
- ☆16 Updated last year
- Implementation for OAgents: An Empirical Study of Building Effective Agents ☆306 Updated 3 months ago
- [ACL 2025] Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆125 Updated 8 months ago