arthur-ai / bench
A tool for evaluating LLMs
☆428 Updated last year
Alternatives and similar repositories for bench
Users who are interested in bench compare it to the libraries listed below.
- 🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring sa… ☆967 Updated last year
- Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning) ☆398 Updated 2 years ago
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆334 Updated last year
- Scale LLM Engine public repository ☆818 Updated this week
- Python SDK for running evaluations on LLM generated responses ☆293 Updated 6 months ago
- Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications. ☆320 Updated 5 months ago
- Fiddler Auditor is a tool to evaluate language models. ☆188 Updated last year
- Automatically evaluate your LLMs in Google Colab ☆675 Updated last year
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆444 Updated last year
- ☆468 Updated last year
- ⛓️ build cognitive systems, pythonic ☆340 Updated last year
- 🦜💯 Flex those feathers! ☆255 Updated last year
- data cleaning and curation for unstructured text ☆328 Updated last year
- 🤖🌊 aiFlows: The building blocks of your collaborative AI ☆272 Updated last year
- ☆778 Updated 5 months ago
- Toolkit for fine-tuning, ablating and unit-testing open-source LLMs. ☆865 Updated last year
- wandbot is a technical support bot for Weights & Biases' AI developer tools that can run in Discord, Slack, ChatGPT and Zendesk ☆309 Updated last month
- 🍰 PromptLayer - Maintain a log of your prompts and OpenAI API requests. Track, debug, and replay old completions. ☆711 Updated 2 weeks ago
- Build robust LLM applications with true composability 🔗 ☆422 Updated last year
- Fine-Tuning Embedding for RAG with Synthetic Data ☆520 Updated 2 years ago
- Generate textbook-quality synthetic LLM pretraining data ☆507 Updated 2 years ago
- Continuous Integration for LLM powered applications ☆254 Updated 2 years ago
- Automated Evaluation of RAG Systems ☆678 Updated 8 months ago
- Open-Source Implementation of WizardLM to turn documents into Q:A pairs for LLM fine-tuning ☆308 Updated last year
- Tuning and Evaluation of RAG pipeline. (Automated optimization to be added soon) ☆263 Updated last year
- Data-Driven Evaluation for LLM-Powered Applications ☆515 Updated 10 months ago
- A joint community effort to create one central leaderboard for LLMs. ☆308 Updated last year
- Lightweight chat AI platform featuring custom knowledge, open-source LLMs, prompt-engineering, retrieval analysis. Highly customizable. F… ☆218 Updated last year
- Guide for fine-tuning Llama/Mistral/CodeLlama models and more ☆640 Updated 2 months ago
- ☆186 Updated 2 years ago