arthur-ai / bench
A tool for evaluating LLMs
☆397 Updated 8 months ago
Alternatives and similar repositories for bench:
Users who are interested in bench are comparing it to the libraries listed below:
- Python SDK for running evaluations on LLM generated responses ☆253 Updated last week
- Fiddler Auditor is a tool to evaluate language models. ☆174 Updated 10 months ago
- Flex those feathers! ☆236 Updated 2 months ago
- Domain Adapted Language Modeling Toolkit - E2E RAG ☆313 Updated 2 months ago
- Automated Evaluation of RAG Systems ☆526 Updated 2 months ago
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆401 Updated 11 months ago
- ☆754 Updated last year
- Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications. ☆276 Updated 2 months ago
- data cleaning and curation for unstructured text ☆328 Updated 5 months ago
- A joint community effort to create one central leaderboard for LLMs. ☆288 Updated 4 months ago
- A framework for event based autonomous multi-agent systems. ☆299 Updated 4 months ago
- An Awesome list of curated DSPy resources. ☆262 Updated 4 months ago
- ☆184 Updated last year
- PromptLayer - Maintain a log of your prompts and OpenAI API requests. Track, debug, and replay old completions. ☆540 Updated this week
- LangKit: An open-source toolkit for monitoring Large Language Models (LLMs). Extracts signals from prompts & responses, ensuring sa… ☆866 Updated last month
- ☆154 Updated last year
- Fine-Tuning Embedding for RAG with Synthetic Data ☆477 Updated last year
- ☆440 Updated last year
- Tutorial for building LLM router ☆170 Updated 5 months ago
- OpenTelemetry Instrumentation for AI Observability ☆254 Updated this week
- Tuning and Evaluation of RAG pipeline. (Automated optimization to be added soon) ☆262 Updated 9 months ago
- build cognitive systems, pythonic ☆328 Updated last month
- Sample notebooks and prompts for LLM evaluation ☆119 Updated last month
- The Rule-based Retrieval package is a Python package that enables you to create and manage Retrieval Augmented Generation (RAG) applicati… ☆233 Updated 3 months ago
- This repo contains data and code for the paper "Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Da… ☆484 Updated 9 months ago
- LangSmith Client SDK Implementations ☆457 Updated this week
- Data-Driven Evaluation for LLM-Powered Applications ☆463 Updated last week
- Create repos and commits with AI. ☆293 Updated last year
- Fast & more realistic evaluation of chat language models. Includes leaderboard. ☆183 Updated last year