☆364 Jul 2, 2024 Updated last year
Alternatives and similar repositories for evals
Users that are interested in evals are comparing it to the libraries listed below.
- Hypercorn is an ASGI and WSGI Server based on Hyper libraries and inspired by Gunicorn. ☆15 Jan 12, 2026 Updated 3 months ago
- Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback" ☆1,837 Jun 17, 2025 Updated 9 months ago
- ☆258 Dec 21, 2022 Updated 3 years ago
- ☆28 Sep 5, 2024 Updated last year
- (Model-written) LLM evals library ☆18 Dec 13, 2024 Updated last year
- ☆283 Mar 2, 2024 Updated 2 years ago
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆217 Apr 6, 2026 Updated last week
- ☆14 Jan 21, 2025 Updated last year
- ☆22 Sep 9, 2021 Updated 4 years ago
- datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆111 Oct 25, 2023 Updated 2 years ago
- ☆1,073 Mar 6, 2024 Updated 2 years ago
- ControlArena is a collection of settings, model organisms and protocols - for running control experiments. ☆177 Updated this week
- A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations ☆214 Dec 22, 2021 Updated 4 years ago
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆145 Sep 14, 2022 Updated 3 years ago
- Evaluating the Moral Beliefs Encoded in LLMs ☆34 Dec 17, 2024 Updated last year
- METR Task Standard ☆179 Feb 3, 2025 Updated last year
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆104 Sep 21, 2023 Updated 2 years ago
- ☆96 Mar 4, 2024 Updated 2 years ago
- ☆10 Feb 2, 2024 Updated 2 years ago
- Measuring the situational awareness of language models ☆40 Feb 12, 2024 Updated 2 years ago
- A library for mechanistic interpretability of GPT-style language models ☆3,304 Updated this week
- ☆12 Oct 23, 2022 Updated 3 years ago
- Fast, correct Python JSON library supporting dataclasses, datetimes, and numpy ☆49 Feb 8, 2026 Updated 2 months ago
- Implementation of Influence Function approximations for differently sized ML models, using PyTorch ☆16 Sep 15, 2023 Updated 2 years ago
- Representation Engineering: A Top-Down Approach to AI Transparency ☆978 Aug 14, 2024 Updated last year
- ☆119 Jan 19, 2026 Updated 2 months ago
- Interactive Composition Explorer: a debugger for compositional language model programs ☆566 Apr 6, 2026 Updated last week
- ☆27 Mar 13, 2024 Updated 2 years ago
- Mechanistic Interpretability for Transformer Models ☆53 Jun 1, 2022 Updated 3 years ago
- ☆40 Feb 11, 2025 Updated last year
- Tools for understanding how transformer predictions are built layer-by-layer ☆584 Aug 7, 2025 Updated 8 months ago
- Used for adaptive human in the loop evaluation of language and embedding models. ☆307 Mar 1, 2023 Updated 3 years ago
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆29 May 23, 2024 Updated last year
- ☆20 Nov 15, 2024 Updated last year
- ☆14 Jul 5, 2024 Updated last year
- The AI that helps you achieve your goals ☆11 Feb 4, 2024 Updated 2 years ago
- ☆30 Jun 19, 2023 Updated 2 years ago
- ☆2,552 May 19, 2024 Updated last year
- ☆560 Feb 5, 2024 Updated 2 years ago