multi-agent-systems-failure-taxonomy / MASFT

☆53

Alternatives and similar repositories for MASFT:

Users who are interested in MASFT are comparing it to the libraries listed below:

LLMSELECTOR / LLMSELECTOR
☆61 Updated last month
ali-bahrainian / RAG_best_practices
☆87 Updated this week
facebookresearch / sweet_rl
Benchmark and research code for the paper SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
☆83 Updated last week
metal-chart-generation / metal
☆18 Updated 2 weeks ago
AgnostiqHQ / multi-agent-llm
Lean implementation of various multi-agent LLM methods, including Iteration of Thought (IoT)
☆107 Updated last month
PrimeIntellect-ai / genesys
☆106 Updated last week
yueqis / API-Based-Agent
☆50 Updated 4 months ago
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆165 Updated 3 weeks ago
google-deepmind / latent-multi-hop-reasoning
[ACL 2024] Do Large Language Models Latently Perform Multi-Hop Reasoning?
☆52 Updated last week
siyuyuan / evoagent
Resources for our paper: "EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms"
☆86 Updated 5 months ago
HishamAlyahya / semantic_backprop
Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖
☆62 Updated 3 months ago
THUDM / ComplexFuncBench
Complex Function Calling Benchmark.
☆85 Updated 2 months ago
zhudotexe / redel
ReDel is a toolkit for researchers and developers to build, iterate on, and analyze recursive multi-agent systems. (EMNLP 2024 Demo)
☆74 Updated last week
phunterlau / paper_without_code
LLM reads a paper and produces a working prototype
☆51 Updated 2 weeks ago
agiresearch / Formal-LLM
Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents
☆122 Updated 9 months ago
argilla-io / argilla-cookbook
Simple examples using Argilla tools to build AI
☆53 Updated 4 months ago
diagram-of-thought / diagram-of-thought
Official implementation of paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038)
☆177 Updated 2 weeks ago
THU-KEG / Agentic-Reward-Modeling
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems
☆75 Updated 3 weeks ago
deshwalmahesh / PHUDGE
Official repo for the paper PHUDGE: Phi-3 as Scalable Judge. Evaluate your LLMs with or without custom rubric, reference answer, absolute…
☆49 Updated 8 months ago
LiqiangJing / DSBench
DSBench: How Far are Data Science Agents from Becoming Data Science Experts?
☆45 Updated last month
aorwall / moatless-tree-search
☆73 Updated 2 months ago
ConsequentAI / fneval
Functional Benchmarks and the Reasoning Gap
☆84 Updated 5 months ago
apple / ml-superposition-prompting
☆142 Updated 8 months ago
zou-group / sirius
SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning
☆48 Updated last month
RulinShao / retrieval-scaling
Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore".
☆196 Updated this week
DeepSoftwareAnalytics / Awesome-Agent4SE
☆90 Updated 6 months ago
vsubramaniam851 / multiagent-ft
☆185 Updated last month
orionw / promptriever
The first dense retrieval model that can be prompted like an LM
☆67 Updated 6 months ago
AlexCuadron / ThinkingAgent
Systematic evaluation framework that automatically rates overthinking behavior in large language models.
☆80 Updated last month
withmartian / routerbench
The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System
☆109 Updated 9 months ago