UKGovernmentBEIS / inspect_ai
Inspect: A framework for large language model evaluations
☆812 Updated this week
Alternatives and similar repositories for inspect_ai:
Users who are interested in inspect_ai are comparing it to the libraries listed below.
- A library for making RepE control vectors ☆557 Updated 2 months ago
- METR Task Standard ☆143 Updated last month
- Automatically evaluate your LLMs in Google Colab ☆602 Updated 10 months ago
- Evaluate your LLM's response with Prometheus and GPT4 💯 ☆879 Updated 2 months ago
- A benchmark to evaluate language models on questions I've previously asked them to solve. ☆978 Updated last month
- Collection of evals for Inspect AI ☆88 Updated this week
- TextGrad: Automatic ''Differentiation'' via Text -- using large language models to backpropagate textual gradients. ☆2,129 Updated this week
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,279 Updated this week
- Training Sparse Autoencoders on Language Models ☆649 Updated this week
- Data-Driven Evaluation for LLM-Powered Applications ☆480 Updated last month
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,049 Updated this week
- LLM Comparator is an interactive data visualization tool for evaluating and analyzing LLM responses side-by-side, developed by the PAIR t… ☆390 Updated last month
- Automated Evaluation of RAG Systems ☆560 Updated 4 months ago
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,327 Updated 3 weeks ago
- Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard a… ☆1,068 Updated 2 months ago
- A tool for evaluating LLMs ☆406 Updated 10 months ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks. ☆2,291 Updated last week
- Utilities for decoding deep representations (like sentence embeddings) back to text ☆773 Updated last month
- DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤 ☆979 Updated last month
- Code and Data for Tau-Bench ☆326 Updated last month
- End-to-end Generative Optimization for AI Agents ☆511 Updated this week
- AIDE: AI-Driven Exploration in the Space of Code. State-of-the-art machine learning engineering agent that automates AI R&D. ☆789 Updated 2 weeks ago
- Weave is a toolkit for developing AI-powered applications, built by Weights & Biases. ☆839 Updated this week
- Extract full next-token probabilities via language model APIs ☆231 Updated last year
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,546 Updated this week
- Vivaria is METR's tool for running evaluations and conducting agent elicitation research. ☆80 Updated this week
- The nnsight package enables interpreting and manipulating the internals of deep learned models. ☆515 Updated this week
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆410 Updated last year
- A framework-less approach to robust agent development. ☆156 Updated last week