ServiceNow / TapeAgents
TapeAgents is a framework that facilitates all stages of the LLM Agent development lifecycle
☆121Updated this week
Related projects
Alternatives and complementary repositories for TapeAgents
- WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?☆124Updated last week
- Doing simple retrieval from LLM models at various context lengths to measure accuracy☆97Updated 7 months ago
- Codebase accompanying the Summary of a Haystack paper.☆72Updated last month
- ☆36Updated this week
- RAGElo is a set of tools that helps you select the best RAG-based LLM agents by using an Elo ranker☆105Updated last week
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.☆119Updated 3 weeks ago
- Attribute (or cite) statements generated by LLMs back to in-context information.☆142Updated last month
- Manage scalable open LLM inference endpoints in Slurm clusters☆236Updated 4 months ago
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System☆91Updated 4 months ago
- AWM: Agent Workflow Memory☆203Updated last month
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers☆122Updated 7 months ago
- Mixing Language Models with Self-Verification and Meta-Verification☆97Updated last year
- awesome synthetic (text) datasets☆239Updated 2 weeks ago
- ☆111Updated last month
- A simple unified framework for evaluating LLMs☆138Updated this week
- Just a bunch of benchmark logs for different LLMs☆114Updated 3 months ago
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…☆106Updated 2 weeks ago
- Red-Teaming Language Models with DSPy☆142Updated 7 months ago
- Vision Document Retrieval (ViDoRe): Benchmark. Evaluation code for the ColPali paper.☆127Updated this week
- MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents [EMNLP 2024]☆101Updated 3 weeks ago
- Functional Benchmarks and the Reasoning Gap☆78Updated last month
- Retrieval Augmented Generation Generalized Evaluation Dataset☆51Updated last month
- LongEmbed: Extending Embedding Models for Long Context Retrieval (EMNLP 2024)☆114Updated this week
- Code accompanying "How I learned to start worrying about prompt formatting".☆92Updated last month
- ☆92Updated last month
- Automating enterprise workflows with multimodal agents☆94Updated last month
- ARAGOG: Advanced RAG Output Grading. Exploring and comparing various Retrieval-Augmented Generation (RAG) techniques on AI research paper…☆96Updated 6 months ago
- Google DeepMind's PromptBreeder for automated prompt engineering, implemented in LangChain Expression Language.☆63Updated 3 months ago
- Initiative to evaluate and rank the most popular LLMs across common task types based on their propensity to hallucinate.☆100Updated 2 months ago
- Evaluating LLMs with CommonGen-Lite☆84Updated 7 months ago