arthur-ai / bench
A tool for evaluating LLMs
☆428 · Updated last year
Alternatives and similar repositories for bench
Users who are interested in bench are comparing it to the libraries listed below.
- Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning) ☆398 · Updated 2 years ago
- LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). Extracts signals from prompts & responses, ensuring sa… ☆971 · Updated last year
- Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications. ☆321 · Updated 5 months ago
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆334 · Updated last year
- ☆469 · Updated 2 years ago
- Fiddler Auditor is a tool to evaluate language models. ☆188 · Updated last year
- ☆779 · Updated 6 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆446 · Updated last year
- data cleaning and curation for unstructured text ☆328 · Updated last year
- Flex those feathers! ☆255 · Updated last year
- Python SDK for running evaluations on LLM generated responses ☆294 · Updated 6 months ago
- Tuning and Evaluation of RAG pipeline. (Automated optimization to be added soon) ☆263 · Updated last year
- Fine-Tuning Embedding for RAG with Synthetic Data ☆524 · Updated 2 years ago
- PromptLayer - Maintain a log of your prompts and OpenAI API requests. Track, debug, and replay old completions. ☆717 · Updated this week
- Scale LLM Engine public repository ☆818 · Updated last week
- The Rule-based Retrieval package is a Python package that enables you to create and manage Retrieval Augmented Generation (RAG) applicati… ☆247 · Updated last year
- ☆187 · Updated 2 years ago
- build cognitive systems, pythonic ☆340 · Updated last year
- Build robust LLM applications with true composability ☆422 · Updated last year
- Continuous Integration for LLM powered applications ☆254 · Updated 2 years ago
- Automated Evaluation of RAG Systems ☆681 · Updated 9 months ago
- Automatically evaluate your LLMs in Google Colab ☆677 · Updated last year
- This repo contains data and code for the paper "Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Da… ☆495 · Updated last year
- Guide for fine-tuning Llama/Mistral/CodeLlama models and more ☆641 · Updated 2 months ago
- Open-Source Implementation of WizardLM to turn documents into Q:A pairs for LLM fine-tuning ☆309 · Updated last year
- An LLM-powered advanced RAG pipeline built from scratch ☆855 · Updated last year
- ☆507 · Updated last year
- ☆198 · Updated last year
- ☆474 · Updated last year
- Retrieval Augmented Generation (RAG) framework and context engine powered by Pinecone ☆1,029 · Updated last year