apple / ToolSandbox
☆160Updated 7 months ago
Alternatives and similar repositories for ToolSandbox:
Users who are interested in ToolSandbox are comparing it to the libraries listed below:
- AWM: Agent Workflow Memory☆253Updated last month
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"☆104Updated 6 months ago
- EvolKit is an innovative framework designed to automatically enhance the complexity of instructions used for fine-tuning Large Language M…☆207Updated 4 months ago
- ☆374Updated 2 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)☆131Updated 4 months ago
- Code for Husky, an open-source language agent that solves complex, multi-step reasoning tasks. Husky v1 addresses numerical, tabular and …☆336Updated 9 months ago
- Beating the GAIA benchmark with Transformers Agents. 🚀☆103Updated last month
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs"☆463Updated last year
- ☆217Updated 7 months ago
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral]☆291Updated 10 months ago
- Self-Reflection in LLM Agents: Effects on Problem-Solving Performance☆63Updated 3 months ago
- ☆120Updated 9 months ago
- [ACL 2024] AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning☆214Updated 2 months ago
- Benchmark baseline for retrieval QA applications☆103Updated 11 months ago
- A simple unified framework for evaluating LLMs☆204Updated last week
- Codebase accompanying the Summary of a Haystack paper.☆75Updated 6 months ago
- Dense X Retrieval: What Retrieval Granularity Should We Use?☆152Updated last year
- Code and Data for Tau-Bench☆333Updated 2 months ago
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels …☆254Updated last year
- ☆73Updated 2 months ago
- 🔧 Compare how Agent systems perform on several benchmarks. 📊🚀☆89Updated 4 months ago
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models"☆157Updated 3 months ago
- This is the repository for our paper "INTERS: Unlocking the Power of Large Language Models in Search with Instruction Tuning"☆203Updated 3 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…☆162Updated 3 months ago
- ToolBench, an evaluation suite for LLM tool manipulation capabilities.☆150Updated last year
- Comprehensive benchmark for RAG☆144Updated 4 months ago
- LOFT: A 1 Million+ Token Long-Context Benchmark☆180Updated this week
- ☆119Updated 5 months ago
- An implementation of Everything of Thoughts (XoT).☆141Updated last year