kyegomez / JambaLinks

PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"

☆195

Alternatives and similar repositories for Jamba

Users that are interested in Jamba are comparing it to the libraries listed below

Sorting:

kyegomez / MoE-Mamba
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze…
☆114Updated last month
kyegomez / MambaTransformer
Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling
☆210Updated last month
jxiw / MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
☆231Updated last month
NVlabs / hymba
☆200Updated 11 months ago
kyegomez / MambaByte
Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in Pytorch and Zeta
☆124Updated 3 weeks ago
Zyphra / BlackMamba
Code repository for Black Mamba
☆259Updated last year
apple / ml-sigmoid-attention
☆301Updated 6 months ago
AmeenAli / HiddenMambaAttn
Official PyTorch Implementation of "The Hidden Attention of Mamba Models"
☆229Updated last month
kyegomez / Griffin
Implementation of Griffin from the paper: "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"
☆56Updated 3 weeks ago
fkodom / yet-another-retnet
A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (http…
☆106Updated last year
goombalab / hydra
Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"
☆165Updated 9 months ago
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆131Updated 2 weeks ago
kyegomez / Mixture-of-Depths
Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆109Updated last week
pengzhangzhi / Awesome-Mamba
Awesome list of papers that extend Mamba to various applications.
☆138Updated 5 months ago
hkproj / mamba-notes
Notes on the Mamba and the S4 model (Mamba: Linear-Time Sequence Modeling with Selective State Spaces)
☆173Updated last year
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆162Updated 7 months ago
AvivBick / awesome-ssm-ml
Reading list for research topics in state-space models
☆330Updated 5 months ago
llm-random / llm-random
☆204Updated last week
goombalab / phi-mamba
Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…
☆116Updated last year
lucidrains / infini-transformer-pytorch
Implementation of Infini-Transformer in Pytorch
☆113Updated 10 months ago
kyegomez / zeta
Build high-performance AI models with modular building blocks
☆564Updated last week
syncdoth / RetNet
Huggingface compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf) including parallel, recurrent,…
☆226Updated last year
test-time-training / ttt-lm-jax
Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States
☆426Updated 2 weeks ago
dingo-actual / infini-transformer
PyTorch implementation of Infini-Transformer from "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention…
☆292Updated last year
tensorgi / TPA
[NeurIPS 2025 Spotlight] TPA: Tensor ProducT ATTenTion Transformer (T6) (https://arxiv.org/abs/2501.06425)
☆425Updated 3 weeks ago
tanaymeh / mamba-train
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
☆60Updated last year
lucidrains / nGPT-pytorch
Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI
☆292Updated 5 months ago
CG80499 / KAN-GPT-2
Training small GPT-2 style models using Kolmogorov-Arnold networks.
☆121Updated last year
jzhang38 / LongMamba
Some preliminary explorations of Mamba's context scaling.
☆216Updated last year
Beomi / InfiniTransformer
Unofficial PyTorch/🤗Transformers(Gemma/Llama3) implementation of Leave No Context Behind: Efficient Infinite Context Transformers with I…
☆373Updated last year