MadryLab / modelcomponents
Decomposing and Editing Predictions by Modeling Model Computation
☆138 · Updated 8 months ago
Alternatives and similar repositories for modelcomponents:
Users interested in modelcomponents are comparing it to the repositories listed below.
- Official PyTorch Implementation of "The Hidden Attention of Mamba Models" ☆212 · Updated 8 months ago
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling ☆186 · Updated 3 weeks ago
- ☆166 · Updated last year
- A curated reading list of research in Adaptive Computation, Inference-Time Computation & Mixture of Experts (MoE) ☆139 · Updated last month
- Code accompanying the paper "Massive Activations in Large Language Models" ☆140 · Updated 11 months ago
- Reading list for research topics in state-space models ☆260 · Updated last month
- A curated list of Model Merging methods ☆89 · Updated 5 months ago
- Optimal Transport in the Big Data Era ☆103 · Updated 3 months ago
- Towards Understanding the Mixture-of-Experts Layer in Deep Learning ☆22 · Updated last year
- Interpretable text embeddings by asking LLMs yes/no questions (NeurIPS 2024) ☆35 · Updated 3 months ago
- Official code for the ICML 2024 paper "The Entropy Enigma: Success and Failure of Entropy Minimization" ☆48 · Updated 8 months ago
- Auto Interpretation Pipeline and many other functionalities for Multimodal SAE Analysis ☆104 · Updated 3 weeks ago
- ☆135 · Updated 8 months ago
- ☆71 · Updated 6 months ago
- Implementation of Griffin from the paper "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models" ☆51 · Updated 3 weeks ago
- [NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models" ☆109 · Updated 11 months ago
- ☆243 · Updated last week
- Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning ☆44 · Updated this week
- Using sparse coding to find distributed representations used by neural networks ☆213 · Updated last year
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in PyTorch ☆156 · Updated last month
- [NeurIPS 2024] Official repository of "The Mamba in the Llama: Distilling and Accelerating Hybrid Models" ☆196 · Updated 3 weeks ago
- Awesome list of papers that extend Mamba to various applications ☆131 · Updated 2 months ago
- LLM-Merging: Building LLMs Efficiently through Merging ☆190 · Updated 4 months ago
- [NeurIPS 2024] Twin-Merging: Dynamic Integration of Modular Expertise in Model Merging ☆48 · Updated 2 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆185 · Updated 8 months ago
- Code for reproducing our paper "Not All Language Model Features Are Linear" ☆68 · Updated 2 months ago
- ViT Prisma is a mechanistic interpretability library for Vision Transformers (ViTs) ☆205 · Updated this week
- A brief and partial summary of RLHF algorithms ☆93 · Updated 2 months ago
- ☆181 · Updated this week
- Official implementation of Phi-Mamba, a MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode… ☆96 · Updated 5 months ago