arthur-ai / bench
A tool for evaluating LLMs
★400 · Updated 9 months ago
Alternatives and similar repositories for bench:
Users who are interested in bench are comparing it to the libraries listed below:
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ★406 · Updated last year
- Automated Evaluation of RAG Systems ★547 · Updated 3 months ago
- 🍰 PromptLayer - Maintain a log of your prompts and OpenAI API requests. Track, debug, and replay old completions. ★557 · Updated this week
- Fiddler Auditor is a tool to evaluate language models. ★175 · Updated 11 months ago
- Python SDK for running evaluations on LLM generated responses ★266 · Updated this week
- data cleaning and curation for unstructured text ★329 · Updated 6 months ago
- Domain Adapted Language Modeling Toolkit - E2E RAG ★313 · Updated 3 months ago
- 🦜🎯 Flex those feathers! ★239 · Updated 3 months ago
- ★446 · Updated last year
- Fine-Tuning Embedding for RAG with Synthetic Data ★483 · Updated last year
- Generate textbook-quality synthetic LLM pretraining data ★495 · Updated last year
- Data-Driven Evaluation for LLM-Powered Applications ★476 · Updated 3 weeks ago
- Repository to demonstrate Chain of Table reasoning with multiple tables powered by LangGraph ★145 · Updated 10 months ago
- Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) applications. ★283 · Updated 3 months ago
- Automatically evaluate your LLMs in Google Colab ★590 · Updated 9 months ago
- ★157 · Updated last year
- Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning) ★389 · Updated last year
- build cognitive systems, pythonic ★331 · Updated 3 months ago
- ★756 · Updated last year
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models ★492 · Updated 7 months ago
- Tuning and Evaluation of RAG pipeline. (Automated optimization to be added soon) ★262 · Updated 11 months ago
- OpenTelemetry Instrumentation for AI Observability ★297 · Updated this week
- Directly Connecting Python to LLMs via Strongly-Typed Functions, Dataclasses, Interfaces & Generic Types ★393 · Updated last month
- Datasets and models for instruction-tuning ★234 · Updated last year
- AgentSearch is a framework for powering search agents and enabling customizable local search. ★470 · Updated 9 months ago
- A framework for event based autonomous multi-agent systems. ★300 · Updated 5 months ago
- Fast & more realistic evaluation of chat language models. Includes leaderboard. ★183 · Updated last year
- Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs" ★462 · Updated 11 months ago
- The Rule-based Retrieval package is a Python package that enables you to create and manage Retrieval Augmented Generation (RAG) applicati… ★236 · Updated 4 months ago
- ★448 · Updated last year