Implementation of MambaByte from the paper "MambaByte: Token-free Selective State Space Model" in PyTorch and Zeta
☆127May 11, 2026Updated last month
Alternatives and similar repositories for MambaByte
Users that are interested in MambaByte are comparing it to the libraries listed below.
- Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze…☆130May 12, 2026Updated 3 weeks ago
- Implementation of MambaFormer in Pytorch ++ Zeta from the paper: "Can Mamba Learn How to Learn? A Comparative Study on In-Context Learnin…☆21May 12, 2026Updated 3 weeks ago
- Implementation of the Mamba SSM with hf_integration.☆55Aug 31, 2024Updated last year
- Mamba R1 represents a novel architecture that combines the efficiency of Mamba's state space models with the scalability of Mixture of Ex…☆24Oct 13, 2025Updated 7 months ago
- Implementation of a Hierarchical Mamba as described in the paper: "Hierarchical State Space Models for Continuous Sequence-to-Sequence Mo…☆15Nov 11, 2024Updated last year
- Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on …☆16Sep 18, 2025Updated 8 months ago
- [ICLR'25] "Understanding Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing" by Peihao Wang, Ruisi Cai, Yue…☆18Mar 21, 2025Updated last year
- Efficient PScan implementation in PyTorch☆17Jan 2, 2024Updated 2 years ago
- Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling☆224May 11, 2026Updated 3 weeks ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"☆18Mar 15, 2024Updated 2 years ago
- A byte-level decoder architecture that matches the performance of tokenized Transformers.☆68Apr 24, 2024Updated 2 years ago
- Implementation of MEGABYTE, Predicting Million-byte Sequences with Multiscale Transformers, in Pytorch☆654Dec 27, 2024Updated last year
- Some preliminary explorations of Mamba's context scaling.☆219Feb 8, 2024Updated 2 years ago
- PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"☆216May 11, 2026Updated last month
- Modified Mamba code to run on CPU☆32Jan 14, 2024Updated 2 years ago
- Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens)☆55Mar 25, 2025Updated last year
- A repository for DenseSSMs☆90Apr 11, 2024Updated 2 years ago
- A sophisticated multi-agent system designed for real-time market analysis of HTX (formerly Huobi) exchange data. This swarm combines spec…☆10Mar 18, 2025Updated last year
- A simpler Pytorch + Zeta Implementation of the paper: "SiMBA: Simplified Mamba-based Architecture for Vision and Multivariate Time series…☆29Nov 11, 2024Updated last year
- TUI application for viewing the status of GPU allocations on a Slurm cluster☆11Dec 11, 2023Updated 2 years ago
- Implementation of a modular, high-performance, and simplistic mamba for high-speed applications☆40Nov 11, 2024Updated last year
- Build high-performance AI models with modular building blocks☆594May 19, 2026Updated 3 weeks ago
- Official PyTorch Implementation of "The Hidden Attention of Mamba Models"☆232Oct 16, 2025Updated 7 months ago
- Awesome list of papers that extend Mamba to various applications.☆141Jun 4, 2026Updated last week
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"☆27Apr 17, 2024Updated 2 years ago
- ☆36Nov 22, 2024Updated last year
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without…☆22Mar 15, 2025Updated last year
- A simple implementation of [Mamba: Linear-Time Sequence Modeling with Selective State Spaces](https://arxiv.org/abs/2312.00752)☆22Jan 22, 2024Updated 2 years ago
- OmniByteFormer is a generalized Transformer model that can process any type of data by converting it into byte sequences, bypassing tradi…☆16May 25, 2026Updated 2 weeks ago
- NeurIPS 2026 paper: The Geometry of Consolidation — follow-up to HIDE and No-Escape.☆110May 5, 2026Updated last month
- A simple and efficient Mamba implementation in pure PyTorch and MLX.☆1,464May 3, 2026Updated last month
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated☆37Aug 14, 2024Updated last year
- Simple, minimal implementation of the Mamba SSM in one file of PyTorch.☆2,950Mar 8, 2024Updated 2 years ago
- Mamba-Chat: A chat LLM based on the state-space model architecture 🐍☆942Mar 3, 2024Updated 2 years ago
- Here we will test various linear attention designs.☆62Apr 25, 2024Updated 2 years ago
- ☆20Dec 24, 2024Updated last year
- Unofficial implementation of paper : Exploring the Space of Key-Value-Query Models with Intention☆12May 24, 2023Updated 3 years ago
- NewsAgent is an enterprise-grade news aggregation agent designed to fetch, query, and summarize news from multiple sources at scale.☆27Oct 13, 2025Updated 7 months ago
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto☆57May 17, 2024Updated 2 years ago