microsoft / ToolTalk
Evaluating tool-augmented LLMs in conversation settings
☆77Updated 8 months ago
Alternatives and similar repositories for ToolTalk:
Users who are interested in ToolTalk are comparing it to the libraries listed below:
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024]☆135Updated 3 months ago
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners"☆98Updated 5 months ago
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models☆92Updated last year
- Scalable Meta-Evaluation of LLMs as Evaluators☆42Updated last year
- ☆62Updated 3 weeks ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024)☆76Updated 3 months ago
- A set of utilities for running few-shot prompting experiments on large-language models☆117Updated last year
- ☆120Updated 8 months ago
- ☆31Updated 11 months ago
- Retrieval Augmented Generation Generalized Evaluation Dataset☆51Updated 2 months ago
- Reasoning by Communicating with Agents☆24Updated 4 months ago
- Codebase accompanying the Summary of a Haystack paper.☆74Updated 4 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"☆54Updated 11 months ago
- ☆49Updated last year
- ☆69Updated last year
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆47Updated 2 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…☆141Updated 2 months ago
- Flacuna was developed by fine-tuning Vicuna on Flan-mini, a comprehensive instruction collection encompassing various tasks. Vicuna is al…☆111Updated last year
- ToolBench, an evaluation suite for LLM tool manipulation capabilities.☆150Updated 11 months ago
- ☆153Updated 6 months ago
- Official implementation for 'Extending LLMs’ Context Window with 100 Samples'☆76Updated last year
- ☆36Updated 6 months ago
- [NeurIPS 2024] Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?☆116Updated 5 months ago
- ☆38Updated 6 months ago
- ☆66Updated last year
- Pre-training code for CrystalCoder 7B LLM☆55Updated 9 months ago
- Code of ICLR paper: https://openreview.net/forum?id=-cqvvvb-NkI☆93Updated last year
- Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning.☆60Updated 10 months ago
- [SIGIR 2024 (Demo)] CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models☆22Updated last year
- Code accompanying "How I learned to start worrying about prompt formatting".☆102Updated 4 months ago