arthur-ai / bench
A tool for evaluating LLMs
☆418 Updated 11 months ago
Alternatives and similar repositories for bench:
Users who are interested in bench are comparing it to the libraries listed below.
- ☆767 Updated last year
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆421 Updated last year
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆320 Updated 5 months ago
- Fiddler Auditor is a tool to evaluate language models. ☆179 Updated last year
- Python SDK for running evaluations on LLM generated responses ☆278 Updated last week
- Automated Evaluation of RAG Systems ☆582 Updated last month
- Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications. ☆297 Updated 5 months ago
- 🍰 PromptLayer - Maintain a log of your prompts and OpenAI API requests. Track, debug, and replay old completions. ☆582 Updated this week
- HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels ☆530 Updated 4 months ago
- data cleaning and curation for unstructured text ☆329 Updated 9 months ago
- Fine-Tuning Embedding for RAG with Synthetic Data ☆494 Updated last year
- LLM Prompt Injection Detector ☆1,264 Updated 8 months ago
- ☆185 Updated last year
- 🔍 LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). 📚 Extracts signals from prompts & responses, ensuring sa… ☆907 Updated 5 months ago
- ☆195 Updated last year
- Data-Driven Evaluation for LLM-Powered Applications ☆489 Updated 3 months ago
- The Rule-based Retrieval package is a Python package that enables you to create and manage Retrieval Augmented Generation (RAG) applicati… ☆237 Updated 7 months ago
- Get 100% uptime, reliability from OpenAI. Handle Rate Limit, Timeout, API, Keys Errors ☆653 Updated last year
- ☆163 Updated last year
- Promptimize is a prompt engineering evaluation and testing toolkit. ☆460 Updated last month
- 🦜💯 Flex those feathers! ☆245 Updated 6 months ago
- Tuning and Evaluation of RAG pipeline. (Automated optimization to be added soon) ☆263 Updated last year
- 📚 Datasets and models for instruction-tuning ☆238 Updated last year
- ⛓️ build cognitive systems, pythonic ☆336 Updated 5 months ago
- Create repos and commits with AI. ☆293 Updated last year
- An Awesome list of curated DSPy resources. ☆311 Updated 2 months ago
- Open-Source Implementation of WizardLM to turn documents into Q:A pairs for LLM fine-tuning ☆305 Updated 6 months ago
- ☆259 Updated last year
- Super performant RAG pipelines for AI apps. Summarization, Retrieve/Rerank and Code Interpreters in one simple API. ☆369 Updated last year
- Generate textbook-quality synthetic LLM pretraining data ☆498 Updated last year