ServiceNow / TapeAgents
TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle
☆124 Updated this week
Related projects
Alternatives and complementary repositories for TapeAgents
- Codebase accompanying the Summary of a Haystack paper. ☆72 Updated 2 months ago
- WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks? ☆127 Updated 3 weeks ago
- ☆39 Updated this week
- Doing simple retrieval from LLM models at various context lengths to measure accuracy ☆97 Updated 7 months ago
- Code and Data for Tau-Bench ☆201 Updated 3 weeks ago
- Functional Benchmarks and the Reasoning Gap ☆78 Updated last month
- Automating enterprise workflows with multimodal agents ☆94 Updated last month
- Automatic Evals for Instruction-Tuned Models ☆45 Updated this week
- ☆112 Updated last month
- Evaluating LLMs with CommonGen-Lite ☆85 Updated 8 months ago
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System ☆92 Updated 5 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆128 Updated 3 weeks ago
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper. ☆130 Updated this week
- AWM: Agent Workflow Memory ☆205 Updated last month
- awesome synthetic (text) datasets ☆242 Updated 3 weeks ago
- A simple unified framework for evaluating LLMs ☆145 Updated last week
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024] ☆103 Updated last month
- Official repository for "Scaling Retrieval-Based Language Models with a Trillion-Token Datastore". ☆129 Updated this week
- Mixing Language Models with Self-Verification and Meta-Verification ☆97 Updated last year
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker ☆106 Updated 3 weeks ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆236 Updated 4 months ago
- Just a bunch of benchmark logs for different LLMs ☆114 Updated 3 months ago
- ☆127 Updated 3 months ago
- Evaluating LLMs with fewer examples ☆134 Updated 7 months ago
- ☆129 Updated 3 weeks ago
- ☆128 Updated this week
- ☆63 Updated 7 months ago
- WorkBench: a Benchmark Dataset for Agents in a Realistic Workplace Setting. ☆33 Updated 3 months ago
- Code for training & evaluating Contextual Document Embedding models ☆117 Updated this week
- Retrieval Augmented Generation Generalized Evaluation Dataset ☆51 Updated this week