goombalab / phi-mamba

Official implementation of Phi-Mamba, a MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models)

☆116

Alternatives and similar repositories for phi-mamba

Users who are interested in phi-mamba are comparing it to the repositories listed below.

jxiw / MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
☆231 · Updated last week
jzhang38 / LongMamba
Some preliminary explorations of Mamba's context scaling.
☆216 · Updated last year
RobertCsordas / moeut
☆86 · Updated last year
shawntan / stickbreaking-attention
Stick-breaking attention
☆61 · Updated 3 months ago
lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
☆179 · Updated 4 months ago
kyegomez / Mixture-of-Depths
Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆108 · Updated last week
zhixuan-lin / forgetting-transformer
[ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning
☆131 · Updated last month
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆233 · Updated last month
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆129 · Updated last year
assafbk / DeciMamba
DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025)
☆31 · Updated 6 months ago
goombalab / hydra
Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"
☆161 · Updated 8 months ago
jxiw / M1
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
☆42 · Updated 3 months ago
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆131 · Updated 10 months ago
Cranial-XIX / longhorn
Official PyTorch Implementation of the Longhorn Deep State Space Model
☆55 · Updated 10 months ago
thu-ml / ReMoE
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆97 · Updated 10 months ago
test-time-training / ttt-lm-kernels
Inference Speed Benchmark for Learning to (Learn at Test Time): RNNs with Expressive Hidden States
☆73 · Updated last year
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆113 · Updated 9 months ago
epfml / schedules-and-scaling
Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations"
☆84 · Updated 11 months ago
OpenSparseLLMs / MoM
☆104 · Updated last month
JinjieNi / dlms-are-super-data-learners
The official github repo for "Diffusion Language Models are Super Data Learners".
☆135 · Updated 3 weeks ago
huyphan168 / PEER
Mixture of A Million Experts
☆48 · Updated last year
apple / ml-sigmoid-attention
☆302 · Updated 6 months ago
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆192 · Updated 11 months ago
ScalingIntelligence / large_language_monkeys
☆107 · Updated last year
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆241 · Updated 4 months ago
locuslab / massive-activations
Code accompanying the paper "Massive Activations in Large Language Models"
☆184 · Updated last year
TianjinYellow / SPAM-Optimizer
☆34 · Updated 7 months ago
wmn-231314 / diffusion-data-constraint
Official PyTorch implementation and models for paper "Diffusion Beats Autoregressive in Data-Constrained Settings". We find diffusion mod…
☆101 · Updated last month
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆89 · Updated last year
llm-random / llm-random
☆200 · Updated last month