Phylliida / MambaLens

Mamba support for transformer lens

☆17

Alternatives and similar repositories for MambaLens

Users who are interested in MambaLens are comparing it to the libraries listed below.

epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆75 Updated 8 months ago
shawntan / stickbreaking-attention
Stick-breaking attention
☆58 Updated 2 weeks ago
JoshEngels / MultiDimensionalFeatures
Code for reproducing our paper "Not All Language Model Features Are Linear"
☆77 Updated 7 months ago
berlino / seq_icl
☆53 Updated last year
JacobPfau / fillerTokens
☆66 Updated last year
RobertCsordas / moeut
☆82 Updated 10 months ago
wesg52 / universal-neurons
Universal Neurons in GPT2 Language Models
☆30 Updated last year
Edward-Sun / gpt-accelera
Simple and efficient pytorch-native transformer training and inference (batched)
☆77 Updated last year
ApolloResearch / e2e_sae
Sparse Autoencoder Training Library
☆53 Updated 2 months ago
PiotrNawrot / nano-sparse-attention
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆78 Updated last month
ScalingIntelligence / large_language_monkeys
☆97 Updated 9 months ago
sustcsonglin / mamba-triton
☆48 Updated last year
mlfoundations / scaling
Language models scale reliably with over-training and on downstream tasks
☆97 Updated last year
dangxingyu / rnn-icrag
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27 Updated last year
dayal-kalra / low-memory-adam
☆11 Updated 4 months ago
qiuzh20 / gated_attention
The official implementation for Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free
☆45 Updated 2 months ago
hughbzhang / o1_inference_scaling_laws
Replicating O1 inference-time scaling laws
☆89 Updated 7 months ago
emalach / LinearLM
Code for the paper: https://arxiv.org/pdf/2309.06979.pdf
☆19 Updated 11 months ago
katiekang1998 / reasoning_generalization
☆33 Updated 6 months ago
ckkissane / crosscoder-model-diff-replication
Open source replication of Anthropic's Crosscoders for Model Diffing
☆57 Updated 8 months ago
srush / mamba-primer
☆37 Updated last year
jopetty / word-problem
Experiments on the impact of depth in transformers and SSMs.
☆32 Updated 8 months ago
sail-sg / SkyLadder
The official repository for SkyLadder: Better and Faster Pretraining via Context Window Scheduling
☆33 Updated 3 months ago
formll / resolving-scaling-law-discrepancies
☆20 Updated last year
insuhan / hyper-attn
☆81 Updated last year
saprmarks / geometry-of-truth
☆87 Updated 11 months ago
EleutherAI / elk-generalization
Investigating the generalization behavior of LM probes trained to predict truth labels: (1) from one annotator to another, and (2) from e…
☆28 Updated last year
EleutherAI / delphi
Delphi was the home of a temple to Phoebus Apollo, which famously had the inscription, 'Know Thyself.' This library lets language models …
☆193 Updated this week
tilde-research / nsa-impl
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆90 Updated 3 weeks ago
princeton-nlp / Edge-Pruning
[NeurIPS 2024 Spotlight] Code and data for the paper "Finding Transformer Circuits with Edge Pruning".
☆59 Updated this week