arthur-ai / bench
A tool for evaluating LLMs
☆424 · Updated last year
Alternatives and similar repositories for bench
Users who are interested in bench compare it to the libraries listed below.
- Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning) ☆396 · Updated last year
- Python SDK for running evaluations on LLM generated responses ☆289 · Updated last month
- Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications. ☆311 · Updated last month
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆431 · Updated last year
- Continuous Integration for LLM powered applications ☆245 · Updated last year
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆323 · Updated 8 months ago
- 🦜💯 Flex those feathers! ☆250 · Updated 8 months ago
- Fiddler Auditor is a tool to evaluate language models. ☆183 · Updated last year
- 🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring sa… ☆925 · Updated 7 months ago
- ☆460 · Updated last year
- 🍰 PromptLayer - Maintain a log of your prompts and OpenAI API requests. Track, debug, and replay old completions. ☆635 · Updated this week
- An Awesome list of curated DSPy resources. ☆372 · Updated 4 months ago
- Data cleaning and curation for unstructured text ☆327 · Updated 11 months ago
- Tuning and Evaluation of RAG pipeline. (Automated optimization to be added soon) ☆264 · Updated last year
- wandbot is a technical support bot for Weights & Biases' AI developer tools that can run in Discord, Slack, ChatGPT and Zendesk ☆301 · Updated this week
- Data-Driven Evaluation for LLM-Powered Applications ☆500 · Updated 5 months ago
- VectorFlow is a high volume vector embedding pipeline that ingests raw data, transforms it into vectors and writes it to a vector DB of y… ☆694 · Updated last year
- Automatically evaluate your LLMs in Google Colab ☆646 · Updated last year
- OpenTelemetry Instrumentation for AI Observability ☆491 · Updated this week
- ☆503 · Updated 10 months ago
- Scale LLM Engine public repository ☆808 · Updated last week
- Automated Evaluation of RAG Systems ☆622 · Updated 3 months ago
- Tutorial for building LLM router ☆214 · Updated 11 months ago
- ☆185 · Updated last year
- ☆464 · Updated last year
- Open-Source Implementation of WizardLM to turn documents into Q:A pairs for LLM fine-tuning ☆311 · Updated 8 months ago
- Task-based Agentic Framework using StrictJSON as the core ☆453 · Updated last week
- The Rule-based Retrieval package is a Python package that enables you to create and manage Retrieval Augmented Generation (RAG) applicati… ☆243 · Updated 9 months ago
- 🤖🌊 aiFlows: The building blocks of your collaborative AI ☆259 · Updated last year
- Fine-Tuning Embedding for RAG with Synthetic Data ☆503 · Updated last year