arthur-ai / bench
A tool for evaluating LLMs
☆425 Updated last year
Alternatives and similar repositories for bench
Users who are interested in bench are comparing it to the libraries listed below.
- Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning) ☆397 Updated last year
- Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications. ☆319 Updated 4 months ago
- Python SDK for running evaluations on LLM generated responses ☆293 Updated 5 months ago
- Fiddler Auditor is a tool to evaluate language models. ☆188 Updated last year
- 🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring sa… ☆957 Updated 11 months ago
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆330 Updated last year
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆442 Updated last year
- 🍰 PromptLayer - Maintain a log of your prompts and OpenAI API requests. Track, debug, and replay old completions. ☆691 Updated 2 weeks ago
- ☆465 Updated last year
- Continuous Integration for LLM powered applications ☆254 Updated 2 years ago
- 🦜💯 Flex those feathers! ☆252 Updated last year
- The Rule-based Retrieval package is a Python package that enables you to create and manage Retrieval Augmented Generation (RAG) applicati… ☆247 Updated last year
- Scale LLM Engine public repository ☆814 Updated this week
- Tuning and Evaluation of RAG pipeline. (Automated optimization to be added soon) ☆262 Updated last year
- data cleaning and curation for unstructured text ☆328 Updated last year
- ☆774 Updated 4 months ago
- ☆186 Updated 2 years ago
- Build robust LLM applications with true composability 🔗 ☆421 Updated last year
- Automated Evaluation of RAG Systems ☆667 Updated 7 months ago
- This repo contains data and code for the paper "Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Da… ☆492 Updated last year
- An Awesome list of curated DSPy resources. ☆469 Updated last month
- 🤖🌊 aiFlows: The building blocks of your collaborative AI ☆272 Updated last year
- Data-Driven Evaluation for LLM-Powered Applications ☆509 Updated 9 months ago
- Open-Source Implementation of WizardLM to turn documents into Q:A pairs for LLM fine-tuning ☆308 Updated last year
- ☆506 Updated last year
- Task-based Agentic Framework using StrictJSON as the core ☆459 Updated this week
- An LLM-powered advanced RAG pipeline built from scratch ☆855 Updated last year
- LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR t… ☆495 Updated 9 months ago
- Fine-Tuning Embedding for RAG with Synthetic Data ☆515 Updated 2 years ago
- ☆472 Updated last year