apple / ToolSandboxLinks

☆194

Alternatives and similar repositories for ToolSandbox

Users that are interested in ToolSandbox are comparing it to the libraries listed below

Sorting:

wang-research-lab / agentinstruct
Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"
☆114Updated 10 months ago
zorazrw / agent-workflow-memory
AWM: Agent Workflow Memory
☆297Updated 6 months ago
zai-org / ComplexFuncBench
Complex Function Calling Benchmark.
☆123Updated 6 months ago
aymeric-roucher / GAIA
Beating the GAIA benchmark with Transformers Agents. 🚀
☆131Updated 5 months ago
withmartian / routerbench
The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System
☆131Updated last year
matthewrenze / self-reflection
Self-Reflection in LLM Agents: Effects on Problem-Solving Performance
☆78Updated 8 months ago
SalesforceAIResearch / xLAM
xLAM: A Family of Large Action Models to Empower AI Agent Systems
☆513Updated this week
StonyBrookNLP / appworld
🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…
☆232Updated 2 months ago
spcl / MRAG
Official Implementation of "Multi-Head RAG: Solving Multi-Aspect Problems with LLMs"
☆222Updated last month
zorazrw / awesome-tool-llm
☆237Updated 11 months ago
Liyan06 / MiniCheck
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024]
☆174Updated 6 months ago
diagram-of-thought / diagram-of-thought
Official implementation of paper "On the Diagram of Thought" (https://arxiv.org/abs/2409.10038)
☆184Updated 4 months ago
TIGER-AI-Lab / LongRAG
Official repo for "LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs".
☆236Updated 11 months ago
Nardien / agent-distillation
Official Code Repository for the paper "Distilling LLM Agent into Small Models with Retrieval and Code Tools"
☆126Updated last week
kohjingyu / search-agents
Code for the paper 🌳 Tree Search for Language Model Agents
☆208Updated last year
hkust-nlp / AgentBoard
An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]
☆332Updated last year
ulab-uiuc / MARBLE
(ACL 2025 Main) Code for MultiAgentBench : Evaluating the Collaboration and Competition of LLM agents https://www.arxiv.org/pdf/2503.019…
☆137Updated this week
sambanova / toolbench
ToolBench, an evaluation suite for LLM tool manipulation capabilities.
☆157Updated last year
aymeric-roucher / agent_reasoning_benchmark
🔧 Compare how Agent systems perform on several benchmarks. 📊🚀
☆99Updated 9 months ago
facebookresearch / CRAG
Comprehensive benchmark for RAG
☆203Updated last month
dwzhu-pku / LongEmbed
LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)
☆140Updated 8 months ago
SALT-NLP / collaborative-gym
Framework and toolkits for building and evaluating collaborative agents that can work together with humans.
☆90Updated 3 months ago
arcee-ai / EvolKit
EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…
☆230Updated 9 months ago
JinjieNi / MixEval
The official evaluation suite and dynamic data release for MixEval.
☆242Updated 8 months ago
aorwall / moatless-tree-search
☆99Updated last month
facebookresearch / ReasonIR
Official repository for paper "ReasonIR Training Retrievers for Reasoning Tasks".
☆187Updated last month
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆175Updated 4 months ago
multi-agent-systems-failure-taxonomy / MAST
☆248Updated last week
microsoft / simulated-trial-and-error
☆122Updated last year
xiaowu0162 / LongMemEval
Benchmarking Chat Assistants on Long-Term Interactive Memory (ICLR 2025)
☆159Updated 3 months ago