jxiw/MambaInLlama

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/jxiw/MambaInLlama)

jxiw / MambaInLlama

[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models

☆243

Alternatives and similar repositories for MambaInLlama

Users that are interested in MambaInLlama are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

itsdaniele / speculative_mamba
View on GitHub
☆18Nov 28, 2024Updated last year
goombalab / phi-mamba
View on GitHub
Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…
☆125Sep 13, 2024Updated last year
LPD-EPFL / MVTIL
View on GitHub
Distributed transactions
☆13Sep 19, 2019Updated 6 years ago
nanowell / Q-Sparse-LLM
View on GitHub
My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
☆37Aug 14, 2024Updated last year
jxiw / M1
View on GitHub
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
☆48Jul 17, 2025Updated last year
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
IBM / selective-dense-state-space-model
View on GitHub
Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on …
☆16Sep 18, 2025Updated 10 months ago
emalach / LinearLM
View on GitHub
Code for the paper: https://arxiv.org/pdf/2309.06979.pdf
☆21Jul 29, 2024Updated last year
OpenSparseLLMs / Linearization
View on GitHub
☆71Jul 8, 2025Updated last year
jxiw / BiGS
View on GitHub
Official Repository of Pretraining Without Attention (BiGS), BiGS is the first model to achieve BERT-level transfer learning on the GLUE …
☆119Mar 16, 2024Updated 2 years ago
assafbk / DeciMamba
View on GitHub
DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025)
☆32Apr 9, 2025Updated last year
mit-han-lab / duo-attention
View on GitHub
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
☆539Feb 10, 2025Updated last year
HazyResearch / lolcats
View on GitHub
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆260Jan 31, 2025Updated last year
HazyResearch / based
View on GitHub
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆256Jun 6, 2025Updated last year
RL10x / RetNet
View on GitHub
an implementation of paper"Retentive Network: A Successor to Transformer for Large Language Models" https://arxiv.org/pdf/2307.08621.pdf
☆11Jul 25, 2023Updated 3 years ago
End-to-end encrypted cloud storage - Proton Drive • Ad
Special offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
sjelassi / transformers_ssm_copy
View on GitHub
☆40Feb 26, 2024Updated 2 years ago
OpenNLPLab / HGRN2
View on GitHub
HGRN2: Gated Linear RNNs with State Expansion
☆58Aug 20, 2024Updated last year
itsnamgyu / block-transformer
View on GitHub
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆166Apr 13, 2025Updated last year
Doraemonzzz / xmixers
View on GitHub
Xmixers: A collection of SOTA efficient token/channel mixers
☆29Sep 4, 2025Updated 10 months ago
sustcsonglin / mamba-triton
View on GitHub
☆52Jan 28, 2024Updated 2 years ago
Doraemonzzz / hgru-pytorch
View on GitHub
☆29Jul 9, 2024Updated 2 years ago
BBuf / flash-rwkv
View on GitHub
☆32May 26, 2024Updated 2 years ago
JunwenBai / DAPC
View on GitHub
Deep Autoencoding Predictive Components
☆10Mar 4, 2021Updated 5 years ago
orionw / promptriever
View on GitHub
The first dense retrieval model that can be prompted like an LM
☆93May 8, 2025Updated last year
Managed Database hosting by DigitalOcean • Ad
PostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
FreedomIntelligence / LongLLaVA
View on GitHub
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture
☆211Jan 6, 2025Updated last year
GATECH-EIC / Linearized-LLM
View on GitHub
[ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models
☆35Jun 12, 2024Updated 2 years ago
Zyphra / Zamba2
View on GitHub
PyTorch implementation of models from the Zamba2 series.
☆193Jan 23, 2025Updated last year
AvivBick / awesome-ssm-ml
View on GitHub
Reading list for research topics in state-space models
☆367May 18, 2026Updated 2 months ago
HanGuo97 / log-linear-attention
View on GitHub
☆284Jun 6, 2025Updated last year
dangxingyu / rnn-icrag
View on GitHub
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Apr 17, 2024Updated 2 years ago
microsoft / Samba
View on GitHub
[ICLR 2025] Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling
☆966Nov 16, 2025Updated 8 months ago
whyNLP / LCKV
View on GitHub
Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…
☆157Apr 7, 2025Updated last year
Cranial-XIX / longhorn
View on GitHub
Official PyTorch Implementation of the Longhorn Deep State Space Model
☆57Dec 4, 2024Updated last year
Virtual machines for every use case on DigitalOcean • Ad
Get dependable uptime with 99.99% SLA, simple security tools, and predictable monthly pricing with DigitalOcean's virtual machines, called Droplets.
goombalab / hydra
View on GitHub
Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"
☆175Jan 30, 2025Updated last year
shreyansh26 / Attention-Mask-Patterns
View on GitHub
Using FlexAttention to compute attention with different masking patterns
☆47Sep 22, 2024Updated last year
abdelfattah-lab / attamba
View on GitHub
☆13Nov 29, 2024Updated last year
NVlabs / Minitron
View on GitHub
A family of compressed models obtained via pruning and knowledge distillation
☆384Nov 6, 2025Updated 8 months ago
00ffcc / chunkRWKV6
View on GitHub
continous batching and parallel acceleration for RWKV6
☆23Jun 28, 2024Updated 2 years ago
OpenMOSE / RWKV-Infer
View on GitHub
A large-scale RWKV v7(World, PRWKV, Hybrid-RWKV) inference. Capable of inference by combining multiple states(Pseudo MoE). Easy to deploy…
☆51Oct 21, 2025Updated 9 months ago
fla-org / flash-linear-attention
View on GitHub
🚀 Efficient implementations for emerging model architectures
☆5,414Updated this week