jzhang38 / LongMambaLinks

Some preliminary explorations of Mamba's context scaling.

☆217

Alternatives and similar repositories for LongMamba

Users that are interested in LongMamba are comparing it to the libraries listed below

Sorting:

jxiw / MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
☆231Updated last month
goombalab / phi-mamba
Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…
☆116Updated last year
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆243Updated 5 months ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆240Updated 2 months ago
lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
☆180Updated 5 months ago
epfml / dynamic-sparse-flash-attention
☆150Updated 2 years ago
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆194Updated last year
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130Updated last year
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆127Updated last year
llm-random / llm-random
☆205Updated this week
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆131Updated last month
zhixuan-lin / forgetting-transformer
[ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning
☆134Updated 3 weeks ago
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆249Updated 10 months ago
thu-ml / ReMoE
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆99Updated 11 months ago
shawntan / stickbreaking-attention
Stick-breaking attention
☆61Updated 5 months ago
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆162Updated 7 months ago
RobertCsordas / moeut
☆89Updated last year
athms / mad-lab
A MAD laboratory to improve AI architecture designs 🧪
☆135Updated 11 months ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆174Updated 5 months ago
sustcsonglin / linear-attention-and-beyond-slides
☆99Updated 9 months ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆212Updated 5 months ago
ScalingIntelligence / large_language_monkeys
☆109Updated last year
apple / ml-sigmoid-attention
☆303Updated 7 months ago
HanGuo97 / lq-lora
☆128Updated last year
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆311Updated 2 weeks ago
huyphan168 / PEER
Mixture of A Million Experts
☆50Updated last year
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆74Updated 9 months ago
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆379Updated 2 months ago
HanGuo97 / log-linear-attention
☆256Updated 5 months ago
berlino / gated_linear_attention
☆106Updated last year