salesforce / BOLAA
☆180 Updated 3 months ago
Alternatives and similar repositories for BOLAA:
Users interested in BOLAA are comparing it to the repositories listed below
- An Analytical Evaluation Board of Multi-turn LLM Agents [NeurIPS 2024 Oral] ☆313 Updated 11 months ago
- ☆121 Updated 11 months ago
- [ACL 2024] AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning ☆221 Updated 3 months ago
- SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks ☆307 Updated 6 months ago
- FireAct: Toward Language Agent Fine-tuning ☆275 Updated last year
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆135 Updated 5 months ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆96 Updated last year
- Official Implementation of Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization ☆142 Updated 11 months ago
- ☆227 Updated 8 months ago
- My implementation of "Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models" ☆98 Updated last year
- An implementation of Everything of Thoughts (XoT). ☆142 Updated last year
- Data and Code for Program of Thoughts (TMLR 2023) ☆270 Updated 11 months ago
- ToolBench, an evaluation suite for LLM tool manipulation capabilities. ☆151 Updated last year
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆138 Updated 6 months ago
- Reasoning with Language Model is Planning with World Model ☆164 Updated last year
- ToolQA, a new dataset to evaluate the capabilities of LLMs in answering challenging questions with external tools. It offers two levels … ☆260 Updated last year
- ☆94 Updated 4 months ago
- Official implementation of the paper "Cumulative Reasoning With Large Language Models" (https://arxiv.org/abs/2308.04371) ☆293 Updated 7 months ago
- [NeurIPS 2022] 🛒 WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents ☆339 Updated 8 months ago
- [ICLR 2024] MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use ☆86 Updated last year
- A benchmark list for the evaluation of large language models. ☆103 Updated 2 weeks ago
- Code for our paper "ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate" ☆280 Updated 6 months ago
- ☆173 Updated last year
- Augmented LLM with self-reflection ☆120 Updated last year
- [NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? ☆124 Updated 8 months ago
- [ACL 2024] Exploring Collaboration Mechanisms for LLM Agents: A Social Psychology View ☆116 Updated 11 months ago
- [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following ☆124 Updated 10 months ago
- [ACL'24] Code and data of the paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 Updated last year
- ☆269 Updated 2 years ago
- Benchmark baseline for retrieval QA applications ☆109 Updated last year