etredal / openCLT
☆26 · Updated 2 months ago
Alternatives and similar repositories for openCLT
Users who are interested in openCLT are comparing it to the repositories listed below.
- Using sparse coding to find distributed representations used by neural networks. ☆261 · Updated last year
- ☆507 · Updated last year
- Training Sparse Autoencoders on Language Models ☆910 · Updated this week
- ☆326 · Updated 3 weeks ago
- ☆157 · Updated 8 months ago
- This repository collects all relevant resources about interpretability in LLMs ☆368 · Updated 9 months ago
- ☆180 · Updated 8 months ago
- ☆183 · Updated 3 weeks ago
- A curated list of LLM interpretability material: tutorials, libraries, surveys, papers, blogs, etc. ☆263 · Updated 4 months ago
- Sparse Autoencoder for Mechanistic Interpretability ☆258 · Updated last year
- A resource repository for representation engineering in large language models ☆129 · Updated 8 months ago
- Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities. arXiv:2408.07666. ☆497 · Updated this week
- The one-stop repository for large language model (LLM) unlearning. Supports TOFU, MUSE, WMDP, and many unlearning methods. All features: … ☆341 · Updated 3 weeks ago
- awesome SAE papers ☆41 · Updated 2 months ago
- Sparsify transformers with SAEs and transcoders ☆598 · Updated last week
- FeatureAlignment = Alignment + Mechanistic Interpretability ☆29 · Updated 5 months ago
- Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep ☆144 · Updated 3 months ago
- ☆109 · Updated 3 weeks ago
- Code for paper: Aligning Large Language Models with Representation Editing: A Control Perspective ☆32 · Updated 6 months ago
- ☆51 · Updated 8 months ago
- A curated list of resources for activation engineering ☆99 · Updated 2 months ago
- Codebase for reproducing the experiments of the semantic uncertainty paper (short-phrase and sentence-length experiments). ☆352 · Updated last year
- A resource repository for machine unlearning in large language models ☆454 · Updated 3 weeks ago
- ☆235 · Updated last year
- This repository contains a regularly updated paper list for LLMs-reasoning-in-latent-space. ☆149 · Updated this week
- Layer-wise Relevance Propagation for Large Language Models and Vision Transformers [ICML 2024] ☆177 · Updated last month
- For OpenMOSS Mechanistic Interpretability Team's Sparse Autoencoder (SAE) research. ☆142 · Updated this week
- This paper list focuses on the theoretical and empirical analysis of language models, especially large language models (LLMs). The papers… ☆85 · Updated 8 months ago
- An Open Source Implementation of Anthropic's Paper: "Towards Monosemanticity: Decomposing Language Models with Dictionary Learning" ☆48 · Updated last year
- Welcome to the 'In Context Learning Theory' Reading Group ☆29 · Updated 9 months ago