Machine Learning for Alignment Bootcamp
☆82 Apr 27, 2022 Updated 3 years ago
Alternatives and similar repositories for mlab
Users that are interested in mlab are comparing it to the libraries listed below.
- Machine Learning for Alignment Bootcamp (MLAB). ☆33 Jan 24, 2022 Updated 4 years ago
- Resources for skilling up in AI alignment research engineering. Covers basics of deep learning, mechanistic interpretability, and RL. ☆241 Aug 11, 2025 Updated 7 months ago
- A collection of different ways to implement accessing and modifying internal model activations for LLMs ☆20 Oct 18, 2024 Updated last year
- A curated list of awesome resources for Artificial Intelligence Alignment research ☆81 Jul 14, 2023 Updated 2 years ago
- (Model-written) LLM evals library ☆18 Dec 13, 2024 Updated last year
- Keeping language models honest by directly eliciting knowledge encoded in their activations. ☆217 Updated this week
- ☆66 Feb 16, 2023 Updated 3 years ago
- The Happy Faces Benchmark ☆15 Jul 20, 2023 Updated 2 years ago
- Improving Steering Vectors by Targeting Sparse Autoencoder Features ☆27 Nov 20, 2024 Updated last year
- Language model alignment-focused deep learning curriculum ☆1,556 Aug 19, 2024 Updated last year
- PyTorch and NNsight implementation of AtP* (Kramar et al 2024, DeepMind) ☆20 Jan 19, 2025 Updated last year
- James' cookbook of evaluations and finetuning experiments ☆23 Feb 19, 2026 Updated last month
- we got you bro ☆38 Jul 29, 2024 Updated last year
- Repository with sample code using Apollo's suggested engineering practices ☆15 Dec 16, 2024 Updated last year
- Mechanistic Interpretability Visualizations using React ☆332 Dec 18, 2024 Updated last year
- A library for bridging Python and HTML/Javascript (via Svelte) for creating interactive visualizations ☆211 Dec 22, 2021 Updated 4 years ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆103 Sep 21, 2023 Updated 2 years ago
- Contains random samples referenced in the paper "Sleeper Agents: Training Robustly Deceptive LLMs that Persist Through Safety Training". ☆139 Mar 9, 2024 Updated 2 years ago
- Implementation of Influence Function approximations for differently sized ML models, using PyTorch ☆16 Sep 15, 2023 Updated 2 years ago
- ☆1,010 Updated this week
- ☆13 Jul 12, 2024 Updated last year
- Code for Preventing Language Models From Hiding Their Reasoning, which evaluates defenses against LLM steganography. ☆25 Jan 26, 2024 Updated 2 years ago
- ☆25 Nov 11, 2025 Updated 4 months ago
- Real News Headlines + Fake Financial Predictions = St0nks ☆24 May 22, 2023 Updated 2 years ago
- model.yaml is an open standard for defining crossplatform, composable AI models ☆55 Sep 9, 2025 Updated 6 months ago
- Improved version of the technical workshops for the 10-day ML4G camp on safety of AI systems ☆19 Mar 7, 2026 Updated 3 weeks ago
- ☆278 Oct 1, 2024 Updated last year
- Inspect: A framework for large language model evaluations ☆1,851 Updated this week
- Tools for understanding how transformer predictions are built layer-by-layer ☆576 Aug 7, 2025 Updated 7 months ago
- Inference API for many LLMs and other useful tools for empirical research ☆112 Updated this week
- ☆60 Mar 8, 2022 Updated 4 years ago
- Official Code for our paper: "Language Models Learn to Mislead Humans via RLHF" ☆19 Oct 11, 2024 Updated last year
- The AI that helps you achieve your goals ☆11 Feb 4, 2024 Updated 2 years ago
- METR Task Standard ☆178 Feb 3, 2025 Updated last year
- Import scripts for existing mood tracking app data ☆13 Dec 8, 2022 Updated 3 years ago
- A library for mechanistic interpretability of GPT-style language models ☆3,223 Mar 22, 2026 Updated last week
- Tools for studying developmental interpretability in neural networks. ☆128 Dec 30, 2025 Updated 2 months ago
- Using Python, GPT4 and LangChain to Generate Custom Anki Decks ☆32 Mar 26, 2024 Updated 2 years ago
- Archive of questions from the Cambridge Mathematics Tripos ☆10 Jun 6, 2022 Updated 3 years ago