arthur-ai / bench
A tool for evaluating LLMs
☆392 · Updated 6 months ago
Related projects
Alternatives and complementary repositories for bench
- Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications. ☆260 · Updated this week
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆385 · Updated 9 months ago
- 🦜💯 Flex those feathers! ☆234 · Updated 3 weeks ago
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆311 · Updated last week
- Python SDK for running evaluations on LLM generated responses ☆221 · Updated this week
- OpenTelemetry Instrumentation for AI Observability ☆218 · Updated this week
- Automatically evaluate your LLMs in Google Colab ☆559 · Updated 6 months ago
- data cleaning and curation for unstructured text ☆327 · Updated 3 months ago
- ☆144 · Updated 10 months ago
- ☆179 · Updated last year
- Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning) ☆390 · Updated 11 months ago
- Automated Evaluation of RAG Systems ☆484 · Updated 2 weeks ago
- Repository to demonstrate Chain of Table reasoning with multiple tables powered by LangGraph ☆146 · Updated 7 months ago
- Fiddler Auditor is a tool to evaluate language models. ☆171 · Updated 8 months ago
- LangSmith Client SDK Implementations ☆419 · Updated this week
- Tuning and Evaluation of RAG pipeline. (Automated optimization to be added soon) ☆262 · Updated 8 months ago
- ☆744 · Updated 10 months ago
- Tutorial for building LLM router ☆163 · Updated 4 months ago
- Fine-Tuning Embedding for RAG with Synthetic Data ☆469 · Updated last year
- Build robust LLM applications with true composability 🔗 ☆416 · Updated 10 months ago
- 🍰 PromptLayer - Maintain a log of your prompts and OpenAI API requests. Track, debug, and replay old completions. ☆521 · Updated this week
- The Rule-based Retrieval package is a Python package that enables you to create and manage Retrieval Augmented Generation (RAG) applicati… ☆222 · Updated last month
- ☆433 · Updated 10 months ago
- Generate textbook-quality synthetic LLM pretraining data ☆488 · Updated last year
- Sample notebooks and prompts for LLM evaluation ☆114 · Updated this week
- An Awesome list of curated DSPy resources. ☆226 · Updated 2 months ago
- LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR t… ☆322 · Updated last month
- ☆267 · Updated 2 weeks ago
- ☆430 · Updated 10 months ago