Phylliida / MambaLens
Mamba support for TransformerLens
☆15 · Updated 6 months ago
Alternatives and similar repositories for MambaLens:
Users interested in MambaLens are comparing it to the libraries listed below.
- Stick-breaking attention ☆49 · Updated 2 weeks ago
- ☆51 · Updated 10 months ago
- Sparse Autoencoder Training Library ☆47 · Updated 5 months ago
- Language models scale reliably with over-training and on downstream tasks ☆96 · Updated 11 months ago
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆71 · Updated 5 months ago
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆26 · Updated 11 months ago
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆71 · Updated 11 months ago
- Stanford NLP Python library for benchmarking the utility of LLM interpretability methods ☆62 · Updated this week
- ☆65 · Updated last month
- ☆13 · Updated last year
- ☆74 · Updated 7 months ago
- ☆60 · Updated 11 months ago
- Xmixers: a collection of SOTA efficient token/channel mixers ☆11 · Updated 4 months ago
- [NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning" ☆47 · Updated 3 weeks ago
- Universal Neurons in GPT2 Language Models ☆27 · Updated 10 months ago
- Code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity… ☆25 · Updated last year
- Open-source replication of Anthropic's Crosscoders for Model Diffing ☆48 · Updated 5 months ago
- Official implementation of the transformer (TF) architecture suggested in the paper "Looped Transformers as Programmable Computers… ☆24 · Updated last year
- ☆47 · Updated last year
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) ☆79 · Updated last year
- ☆30 · Updated last year
- Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e… ☆26 · Updated 10 months ago
- The simplest implementation of recent sparse-attention patterns for efficient LLM inference ☆58 · Updated 2 months ago
- PyTorch and NNsight implementation of AtP* (Kramár et al., 2024, DeepMind) ☆18 · Updated 2 months ago
- Code for reproducing the paper "Not All Language Model Features Are Linear" ☆73 · Updated 4 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs ☆103 · Updated 4 months ago
- Test-time training on nearest neighbors for large language models ☆39 · Updated 11 months ago
- ☆33 · Updated last month
- Official code repository for the paper "Key-value memory in the brain" ☆24 · Updated last month
- Experiments on the impact of depth in transformers and SSMs ☆23 · Updated 4 months ago