Implementation of MambaByte, from the paper "MambaByte: Token-free Selective State Space Model", in Pytorch and Zeta
☆ 125 · Updated Mar 20, 2026
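MambaByte's central idea is "token-free" modeling: instead of a subword tokenizer, the model consumes raw UTF-8 bytes, so the vocabulary is just the 256 byte values. A minimal sketch of that input pipeline is below; the `ByteLM` class and its `mamba_block` stand-in are hypothetical illustrations (an LSTM substitutes for the selective SSM layer, which in the actual repository comes from Mamba/Zeta modules), not the repository's API.

```python
import torch
import torch.nn as nn

class ByteLM(nn.Module):
    """Hypothetical sketch of a byte-level ("token-free") language model."""

    def __init__(self, dim: int = 64):
        super().__init__()
        # One embedding per possible byte value -- no tokenizer needed.
        self.embed = nn.Embedding(256, dim)
        # Placeholder sequence mixer: an LSTM stands in for the selective
        # state space (Mamba) block used in the real implementation.
        self.mixer = nn.LSTM(dim, dim, batch_first=True)
        # Predict a distribution over the next byte at each position.
        self.head = nn.Linear(dim, 256)

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(byte_ids)          # (B, T) -> (B, T, dim)
        x, _ = self.mixer(x)              # (B, T, dim)
        return self.head(x)               # (B, T, 256) next-byte logits

# "Tokenization" is just UTF-8 encoding.
text = "token-free"
ids = torch.tensor(list(text.encode("utf-8"))).unsqueeze(0)  # (1, T)
logits = ByteLM()(ids)                    # (1, T, 256)
```

Because every string maps deterministically to bytes, this removes tokenizer vocabulary choices entirely, at the cost of sequences several times longer than subword tokenization would produce, which is exactly where the SSM's linear-time sequence mixing pays off.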
Alternatives and similar repositories for MambaByte
Users interested in MambaByte are comparing it to the libraries listed below.
- Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze… ☆ 125 · Updated Mar 22, 2026
- [CoLM 24] Official Repository of MambaByte: Token-free Selective State Space Model ☆ 25 · Updated Oct 12, 2024
- Implementation of MambaFormer in Pytorch + Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin… ☆ 20 · Updated Mar 28, 2026
- Implementation of the Mamba SSM with hf_integration. ☆ 55 · Updated Aug 31, 2024
- [ICLR'25] "Understanding Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing" by Peihao Wang, Ruisi Cai, Yue… ☆ 17 · Updated Mar 21, 2025
- Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex… ☆ 25 · Updated Oct 13, 2025
- Implementation of a Hierarchical Mamba as described in the paper: "Hierarchical State Space Models for Continuous Sequence-to-Sequence Mo… ☆ 15 · Updated Nov 11, 2024
- Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on … ☆ 16 · Updated Sep 18, 2025
- Efficient PScan implementation in PyTorch ☆ 17 · Updated Jan 2, 2024
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling ☆ 218 · Updated Mar 22, 2026
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" ☆ 18 · Updated Mar 15, 2024
- Code for the paper: https://arxiv.org/pdf/2309.06979.pdf ☆ 21 · Updated Jul 29, 2024
- A byte-level decoder architecture that matches the performance of tokenized Transformers. ☆ 67 · Updated Apr 24, 2024
- Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch ☆ 654 · Updated Dec 27, 2024
- Some preliminary explorations of Mamba's context scaling. ☆ 219 · Updated Feb 8, 2024
- PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model" ☆ 209 · Updated Mar 13, 2026
- Modified Mamba code to run on CPU ☆ 30 · Updated Jan 14, 2024
- Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens) ☆ 55 · Updated Mar 25, 2025
- A repository for DenseSSMs ☆ 90 · Updated Apr 11, 2024
- A sophisticated multi-agent system designed for real-time market analysis of HTX (formerly Huobi) exchange data. This swarm combines spec… ☆ 10 · Updated Mar 18, 2025
- A simpler Pytorch + Zeta implementation of the paper: "SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series… ☆ 29 · Updated Nov 11, 2024
- Implementation of a modular, high-performance, and simplistic mamba for high-speed applications ☆ 40 · Updated Nov 11, 2024
- Build high-performance AI models with modular building blocks ☆ 580 · Updated Feb 9, 2026
- Official PyTorch Implementation of "The Hidden Attention of Mamba Models" ☆ 232 · Updated Oct 16, 2025
- Awesome list of papers that extend Mamba to various applications. ☆ 139 · Updated Jun 10, 2025
- Official repository of the paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" ☆ 27 · Updated Apr 17, 2024
- ☆ 35 · Updated Nov 22, 2024
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without… ☆ 21 · Updated Mar 15, 2025
- A simple implementation of [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752) ☆ 22 · Updated Jan 22, 2024
- ☆ 12 · Updated Apr 17, 2024
- OmniByteFormer is a generalized Transformer model that can process any type of data by converting it into byte sequences, bypassing tradi… ☆ 15 · Updated this week
- Pytorch (Lightning) implementation of the Mamba model ☆ 37 · Updated Apr 18, 2025
- A simple and efficient Mamba implementation in pure PyTorch and MLX. ☆ 1,450 · Updated Jan 26, 2026
- Minimal Mamba-2 implementation in PyTorch ☆ 247 · Updated Jun 17, 2024
- My implementation of "Q-Sparse: All Large Language Models can be Fully Sparsely-Activated" ☆ 34 · Updated Aug 14, 2024
- Mamba-Chat: A chat LLM based on the state-space model architecture 🐍 ☆ 940 · Updated Mar 3, 2024
- Simple, minimal implementation of the Mamba SSM in one file of PyTorch. ☆ 2,940 · Updated Mar 8, 2024
- Here we will test various linear attention designs. ☆ 62 · Updated Apr 25, 2024
- ☆ 19 · Updated Dec 24, 2024
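Several of the repositories above (notably the "Efficient PScan implementation in PyTorch" entry) revolve around the parallel scan that makes SSMs fast: the linear recurrence h_t = a_t·h_{t-1} + b_t is associative, so all T outputs can be computed in O(log T) parallel steps instead of a sequential loop. The sketch below is a minimal Hillis-Steele-style doubling scan written for this page, not code taken from any listed repository.

```python
import torch

def pscan(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    """Inclusive scan of h_t = a_t * h_{t-1} + b_t (h_0 = 0) over the last dim.

    Uses Hillis-Steele doubling: after step d, position t holds the
    composition of recurrence steps t-2d+1 .. t, so log2(T) passes suffice.
    """
    A, X = a.clone(), b.clone()
    T = a.shape[-1]
    d = 1
    while d < T:
        # Compose element t with element t-d:
        #   (A, X)[t] <- (A[t] * A[t-d],  A[t] * X[t-d] + X[t])
        # RHS tensors are materialized before assignment, so the
        # overlapping slices do not alias mid-update.
        X[..., d:] = X[..., d:] + A[..., d:] * X[..., :-d]
        A[..., d:] = A[..., d:] * A[..., :-d]
        d *= 2
    return X
```

On a GPU each pass is a single elementwise kernel over the whole batch, which is why scan-based SSM implementations avoid the O(T) Python loop of a naive recurrence.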