multi-agent-systems-failure-taxonomy / MASFT
☆53 · Updated last week
Alternatives and similar repositories for MASFT:
Users interested in MASFT are comparing it to the libraries listed below.
- ☆61 · Updated last month
- ☆87 · Updated this week
- Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks ☆83 · Updated last week
- ☆18 · Updated 2 weeks ago
- Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT) ☆107 · Updated last month
- ☆106 · Updated last week
- ☆50 · Updated 4 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆165 · Updated 3 weeks ago
- [ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning? ☆52 · Updated last week
- Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms" ☆86 · Updated 5 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆62 · Updated 3 months ago
- Complex Function Calling Benchmark. ☆85 · Updated 2 months ago
- ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo) ☆74 · Updated last week
- LLM reads a paper and produces a working prototype ☆51 · Updated 2 weeks ago
- Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents ☆122 · Updated 9 months ago
- Simple examples using Argilla tools to build AI ☆53 · Updated 4 months ago
- Official implementation of paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038) ☆177 · Updated 2 weeks ago
- Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems ☆75 · Updated 3 weeks ago
- Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute… ☆49 · Updated 8 months ago
- DSBench: How Far are Data Science Agents from Becoming Data Science Experts? ☆45 · Updated last month
- ☆73 · Updated 2 months ago
- Functional Benchmarks and the Reasoning Gap ☆84 · Updated 5 months ago
- ☆142 · Updated 8 months ago
- SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning ☆48 · Updated last month
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆196 · Updated this week
- ☆90 · Updated 6 months ago
- ☆185 · Updated last month
- The first dense retrieval model that can be prompted like an LM ☆67 · Updated 6 months ago
- Systematic evaluation framework that automatically rates overthinking behavior in large language models. ☆80 · Updated last month
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆109 · Updated 9 months ago