☆30 · May 4, 2023 · Updated 3 years ago
Alternatives and similar repositories for rep-theory-mech-interp
Users that are interested in rep-theory-mech-interp are comparing it to the libraries listed below.
- Code for "Automatic Circuit Finding and Faithfulness" ☆17 · Jul 11, 2024 · Updated last year
- Subliminal learning in LLMs: language models can transmit hidden preferences through seemingly unrelated training data. ☆24 · Nov 9, 2025 · Updated 7 months ago
- Notebooks accompanying Anthropic's "Toy Models of Superposition" paper ☆153 · Sep 14, 2022 · Updated 3 years ago
- Situational Awareness Dataset ☆51 · Dec 14, 2024 · Updated last year
- ☆12 · Feb 11, 2026 · Updated 4 months ago
- DiWA: Diverse Weight Averaging for Out-of-Distribution Generalization ☆31 · Jan 31, 2023 · Updated 3 years ago
- ☆18 · Jun 8, 2026 · Updated last week
- An exploration of LLM steering ☆26 · Jun 15, 2024 · Updated 2 years ago
- This repository contains the implementation of Label-Free XAI, a new framework to adapt explanation methods to unsupervised models. For m… ☆25 · Sep 21, 2022 · Updated 3 years ago
- Mechanistic Interpretability for Transformer Models ☆53 · Jun 1, 2022 · Updated 4 years ago
- ☆289 · Oct 1, 2024 · Updated last year
- A library for efficient patching and automatic circuit discovery. ☆97 · Dec 31, 2025 · Updated 5 months ago
- ☆33 · Jul 17, 2023 · Updated 2 years ago
- Rust library for working with data from Wikidata. ☆14 · Jul 10, 2025 · Updated 11 months ago
- Code for "Discovering Non-monotonic Autoregressive Orderings with Variational Inference" (paper and code updated from ICLR 2021) ☆12 · Mar 7, 2024 · Updated 2 years ago
- (Model-written) LLM evals library ☆18 · Dec 13, 2024 · Updated last year
- Reimplementation of https://github.com/montemac/algebraic_value_editing in pure PyTorch for efficiency on large models ☆11 · Jun 28, 2023 · Updated 2 years ago
- gpt completions in vscode ☆35 · Mar 24, 2023 · Updated 3 years ago
- Random program generator for Python ☆10 · Jun 20, 2013 · Updated 12 years ago
- Tools for studying developmental interpretability in neural networks. ☆140 · Apr 23, 2026 · Updated last month
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ☆33 · May 1, 2025 · Updated last year
- ☆35 · Mar 27, 2025 · Updated last year
- Official codebase for "Analyzing the Generalization and Reliability of Steering Vectors" ☆22 · Dec 14, 2024 · Updated last year
- ☆14 · Jan 10, 2021 · Updated 5 years ago
- LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces ☆103 · Sep 21, 2023 · Updated 2 years ago
- UKBB MRI semantic segmentation for Abdominal Dixon and other modalities ☆14 · Apr 8, 2026 · Updated 2 months ago
- Visualizing feature measurements on label images in napari ☆10 · Jul 24, 2024 · Updated last year
- Use Napari Viewer to display CZI images ☆11 · May 25, 2022 · Updated 4 years ago
- A collection of different ways to implement accessing and modifying internal model activations for LLMs ☆24 · Oct 18, 2024 · Updated last year
- Ember is a hosted API/SDK that lets you shape AI model behavior by directly controlling a model's internal units of computation, or "feat… ☆50 · Jul 14, 2025 · Updated 11 months ago
- ☆27 · Oct 6, 2024 · Updated last year
- Sparse probing paper full code. ☆68 · Dec 17, 2023 · Updated 2 years ago
- Mechanistic Interpretability Visualizations using React ☆352 · Apr 30, 2026 · Updated last month
- Napari plugin for working with video files. ☆13 · Aug 20, 2025 · Updated 9 months ago
- Extracting minimal DFAs from well-trained RNNs ☆11 · Nov 26, 2018 · Updated 7 years ago
- Bridge Python and TypeScript with ease. Use simple decorators to expose Python functions, methods, and classes to TypeScript with full ty… ☆23 · Mar 7, 2025 · Updated last year
- Read MRC format image files into napari using the mrcfile package from CCP-EM ☆19 · May 17, 2024 · Updated 2 years ago
- Python-based viewer for large multi-dimensional datasets. ☆20 · Jan 28, 2020 · Updated 6 years ago
- Open Source Replication of Anthropic's Alignment Faking Paper ☆58 · Apr 4, 2025 · Updated last year