arthur-ai / bench
A tool for evaluating LLMs
☆389 · Updated 5 months ago
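For context, a minimal sketch of how bench is typically used, loosely following the arthur-bench Python quickstart; the suite name, scorer, and example strings below are illustrative assumptions, so check them against the repo's docs:

```python
# Minimal sketch (assumes `pip install arthur-bench`). The TestSuite API
# shown here follows the arthur-bench quickstart, but names and arguments
# are assumptions, not guaranteed by this listing.
from arthur_bench.run.testsuite import TestSuite

# Create a test suite with reference inputs/outputs and a built-in scorer.
suite = TestSuite(
    "bench_quickstart",   # suite name (arbitrary)
    "exact_match",        # scoring method
    input_text_list=["What year was FDR elected president?"],
    reference_output_list=["1932"],
)

# Score a set of candidate model outputs against the references.
suite.run("quickstart_run", candidate_output_list=["1932"])
```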
Related projects
Alternatives and complementary repositories for bench
- Metrics to evaluate the quality of responses from your Retrieval Augmented Generation (RAG) applications. ☆257 · Updated last month
- Tuning and evaluation of RAG pipelines (automated optimization to be added soon). ☆262 · Updated 7 months ago
- 🦜💯 Flex those feathers! ☆234 · Updated 2 weeks ago
- Python SDK for running evaluations on LLM-generated responses. ☆215 · Updated this week
- Automated Evaluation of RAG Systems. ☆479 · Updated this week
- Data cleaning and curation for unstructured text. ☆327 · Updated 3 months ago
- Repository demonstrating Chain-of-Table reasoning over multiple tables, powered by LangGraph. ☆144 · Updated 7 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆383 · Updated 8 months ago
- Fine-Tuning Embeddings for RAG with Synthetic Data. ☆468 · Updated last year
- Generate textbook-quality synthetic LLM pretraining data. ☆488 · Updated last year
- Rule-based Retrieval is a Python package for creating and managing Retrieval Augmented Generation (RAG) applications. ☆221 · Updated last month
- Automatically evaluate your LLMs in Google Colab. ☆556 · Updated 6 months ago
- Tutorial for building an LLM router. ☆157 · Updated 3 months ago
- Domain Adapted Language Modeling Toolkit: end-to-end RAG. ☆309 · Updated this week
- An awesome list of curated DSPy resources. ☆223 · Updated last month
- 🤖🌊 aiFlows: The building blocks of your collaborative AI. ☆238 · Updated 6 months ago
- FastAPI wrapper around DSPy. ☆212 · Updated 7 months ago
- OpenTelemetry Instrumentation for AI Observability