arthur-ai / bench
A tool for evaluating LLMs
☆419 Updated last year
Alternatives and similar repositories for bench
Users who are interested in bench are comparing it to the libraries listed below.
- Fiddler Auditor is a tool to evaluate language models. ☆180 Updated last year
- Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications. ☆300 Updated last week
- 🦜💯 Flex those feathers! ☆246 Updated 7 months ago
- Tuning and Evaluation of RAG pipeline. (Automated optimization to be added soon) ☆263 Updated last year
- data cleaning and curation for unstructured text ☆327 Updated 9 months ago
- ☆451 Updated last year
- Python SDK for running evaluations on LLM generated responses ☆280 Updated last week
- Automated Evaluation of RAG Systems ☆596 Updated 2 months ago
- 🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring sa… ☆913 Updated 6 months ago
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆321 Updated 6 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆422 Updated last year
- Repository to demonstrate Chain of Table reasoning with multiple tables powered by LangGraph ☆144 Updated last year
- ☆163 Updated last year
- Fine-Tuning Embedding for RAG with Synthetic Data ☆497 Updated last year
- Directly Connecting Python to LLMs via Strongly-Typed Functions, Dataclasses, Interfaces & Generic Types ☆398 Updated 2 months ago
- Data-Driven Evaluation for LLM-Powered Applications ☆493 Updated 4 months ago
- 🍰 PromptLayer - Maintain a log of your prompts and OpenAI API requests. Track, debug, and replay old completions. ☆609 Updated last week
- The Rule-based Retrieval package is a Python package that enables you to create and manage Retrieval Augmented Generation (RAG) applicati… ☆239 Updated 7 months ago
- ☆764 Updated last year
- Create repos and commits with AI. ☆296 Updated last year
- Open-Source Implementation of WizardLM to turn documents into Q:A pairs for LLM fine-tuning ☆306 Updated 7 months ago
- Generate textbook-quality synthetic LLM pretraining data ☆497 Updated last year
- A joint community effort to create one central leaderboard for LLMs. ☆298 Updated 9 months ago
- A tiny library for coding with large language models. ☆1,232 Updated 10 months ago
- Tutorial for building LLM router ☆206 Updated 10 months ago
- ☆194 Updated last year
- This repo contains data and code for the paper "Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Da… ☆487 Updated last year
- ☆185 Updated last year
- Prompt programming with FMs. ☆442 Updated 10 months ago
- Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning) ☆395 Updated last year