☆336 · Jul 2, 2024 · Updated last year
Alternatives and similar repositories for evals
Users that are interested in evals are comparing it to the libraries listed below.
- Hypercorn is an ASGI and WSGI Server based on Hyper libraries and inspired by Gunicorn. ☆14 · Jan 12, 2026 · Updated 2 months ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,832 · Jun 17, 2025 · Updated 9 months ago
- ☆253 · Dec 21, 2022 · Updated 3 years ago
- ☆26 · Sep 5, 2024 · Updated last year
- (Model-written) LLM evals library ☆18 · Dec 13, 2024 · Updated last year
- ☆282 · Mar 2, 2024 · Updated 2 years ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆217 · Mar 16, 2026 · Updated last week
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆109 · Oct 25, 2023 · Updated 2 years ago
- ☆22 · Sep 9, 2021 · Updated 4 years ago
- ☆1,073 · Mar 6, 2024 · Updated 2 years ago
- ControlArena is a collection of settings, model organisms and protocols - for running control experiments. ☆164 · Updated this week
- A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations ☆209 · Dec 22, 2021 · Updated 4 years ago
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆140 · Sep 14, 2022 · Updated 3 years ago
- Evaluating the Moral Beliefs Encoded in LLMs ☆32 · Dec 17, 2024 · Updated last year
- METR Task Standard ☆178 · Feb 3, 2025 · Updated last year
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆103 · Sep 21, 2023 · Updated 2 years ago
- ☆94 · Mar 4, 2024 · Updated 2 years ago
- Measuring the situational awareness of language models ☆40 · Feb 12, 2024 · Updated 2 years ago
- A library for mechanistic interpretability of GPT-style language models ☆3,223 · Updated this week
- ☆12 · Oct 23, 2022 · Updated 3 years ago
- Implementation of Influence Function approximations for differently sized ML models, using PyTorch ☆16 · Sep 15, 2023 · Updated 2 years ago
- Representation Engineering: A Top-Down Approach to AI Transparency ☆965 · Aug 14, 2024 · Updated last year
- ☆121 · Jan 19, 2026 · Updated 2 months ago
- Interactive Composition Explorer: a debugger for compositional language model programs ☆567 · Jan 5, 2026 · Updated 2 months ago
- ☆27 · Mar 13, 2024 · Updated 2 years ago
- Mechanistic Interpretability for Transformer Models ☆53 · Jun 1, 2022 · Updated 3 years ago
- ☆39 · Feb 11, 2025 · Updated last year
- Tools for understanding how transformer predictions are built layer-by-layer ☆576 · Aug 7, 2025 · Updated 7 months ago
- Used for adaptive human in the loop evaluation of language and embedding models. ☆307 · Mar 1, 2023 · Updated 3 years ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆28 · May 23, 2024 · Updated last year
- ☆20 · Nov 15, 2024 · Updated last year
- The AI that helps you achieve your goals ☆11 · Feb 4, 2024 · Updated 2 years ago
- ☆2,550 · May 19, 2024 · Updated last year
- ☆30 · Jun 19, 2023 · Updated 2 years ago
- ☆558 · Feb 5, 2024 · Updated 2 years ago
- Inspect: A framework for large language model evaluations ☆1,841 · Updated this week
- Scripts for generating synthetic finetuning data for reducing sycophancy. ☆121 · Aug 16, 2023 · Updated 2 years ago
- Erasing concepts from neural representations with provable guarantees ☆245 · Jan 27, 2025 · Updated last year
- A dataset of alignment research and code to reproduce it ☆78 · Jun 22, 2023 · Updated 2 years ago