microsoft / ToolTalk
Evaluating tool-augmented LLMs in conversation settings
☆76 · Updated 7 months ago
Alternatives and similar repositories for ToolTalk:
Users who are interested in ToolTalk also compare it with the repositories listed below.
- Code repo for "Agent Instructs Large Language Models to be General Zero-Shot Reasoners" ☆98 · Updated 4 months ago
- Official code for "MAmmoTH2: Scaling Instructions from the Web" [NeurIPS 2024] ☆131 · Updated 3 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆74 · Updated 4 months ago
- Code and data for "StructLM: Towards Building Generalist Models for Structured Knowledge Grounding" (COLM 2024) ☆74 · Updated 3 months ago
- Implementation of the paper "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆47 · Updated last month
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models ☆90 · Updated last year
- A set of utilities for running few-shot prompting experiments on large language models ☆116 · Updated last year
- Retrieval Augmented Generation Generalized Evaluation Dataset ☆51 · Updated 2 months ago
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024) ☆128 · Updated 2 months ago
- Data preparation code for CrystalCoder 7B LLM ☆44 · Updated 8 months ago
- Evol augment any dataset online ☆56 · Updated last year
- ToolBench, an evaluation suite for LLM tool manipulation capabilities. ☆148 · Updated 11 months ago
- [ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator" ☆54 · Updated 11 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators ☆42 · Updated 11 months ago
- Dense X Retrieval: What Retrieval Granularity Should We Use? ☆143 · Updated last year
- Pre-training code for CrystalCoder 7B LLM ☆55 · Updated 8 months ago
- ☆51 · Updated 6 months ago
- ☆62 · Updated 6 months ago
- Benchmark baseline for retrieval QA applications ☆96 · Updated 9 months ago
- ☆74 · Updated last year
- My implementation of "Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models" ☆95 · Updated last year
- ☆49 · Updated last year
- Code and data for CoachLM, an automatic instruction revision approach for LLM instruction tuning. ☆60 · Updated 10 months ago
- [SIGIR 2024 (Demo)] CoSearchAgent: A Lightweight Collaborative Search Agent with Large Language Models ☆22 · Updated 11 months ago
- ☆31 · Updated 11 months ago
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆40 · Updated 10 months ago
- LOFT: A 1 Million+ Token Long-Context Benchmark ☆168 · Updated 3 months ago
- Official implementation for "Extending LLMs' Context Window with 100 Samples" ☆76 · Updated last year
- ☆56 · Updated last week
- Evaluating LLMs with CommonGen-Lite ☆88 · Updated 10 months ago