jxiw/MambaByte

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/jxiw/MambaByte)

jxiw / MambaByte

[CoLM 24] Official Repository of MambaByte: Token-free Selective State Space Model

☆27

Alternatives and similar repositories for MambaByte

Users that are interested in MambaByte are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

dangxingyu / rnn-icrag
View on GitHub
Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval"
☆27Apr 17, 2024Updated 2 years ago
bdusell / stack-attention
View on GitHub
Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns"
☆18Mar 15, 2024Updated 2 years ago
kyegomez / MambaByte
View on GitHub
Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in Pytorch and Zeta
☆128Updated this week
dtunai / Griffin-Jax
View on GitHub
Jax implementation of "Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models"
☆15May 10, 2024Updated 2 years ago
epfml / pam
View on GitHub
☆16Dec 9, 2023Updated 2 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
rkishony / data-to-paper-supplementary
View on GitHub
☆12Aug 21, 2024Updated last year
Zcchill / Value-Residual-Learning
View on GitHub
☆15Mar 20, 2025Updated last year
sjelassi / transformers_ssm_copy
View on GitHub
☆40Feb 26, 2024Updated 2 years ago
llmsresearch / scone
View on GitHub
Implementation and evaluation of Scaling Embedding Layers in Language Models research paper
☆15Feb 2, 2026Updated 5 months ago
johanwind / wind_rwkv
View on GitHub
☆27Feb 26, 2026Updated 4 months ago
maximzubkov / fft-scan
View on GitHub
Efficient PScan implementation in PyTorch
☆17Jan 2, 2024Updated 2 years ago
Lizn-zn / Nesy-Programming
View on GitHub
☆10Oct 28, 2024Updated last year
emalach / LinearLM
View on GitHub
Code for the paper: https://arxiv.org/pdf/2309.06979.pdf
☆21Jul 29, 2024Updated last year
Richar-Du / Virgo
View on GitHub
Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM*
☆20May 27, 2025Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
shawntan / stickbreaking-attention
View on GitHub
Stick-breaking attention
☆63Jul 1, 2025Updated last year
ellisk42 / LambdaBeam
View on GitHub
☆14Dec 31, 2023Updated 2 years ago
proger / nanokitchen
View on GitHub
Parallel Associative Scan for Language Models
☆18Jan 8, 2024Updated 2 years ago
Doraemonzzz / xmixers
View on GitHub
Xmixers: A collection of SOTA efficient token/channel mixers
☆28Sep 4, 2025Updated 10 months ago
goombalab / phi-mamba
View on GitHub
Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…
☆126Sep 13, 2024Updated last year
RobertCsordas / moe_layer
View on GitHub
sigma-MoE layer
☆21Jan 5, 2024Updated 2 years ago
pa-ba / reg-machine
View on GitHub
Coq & Haskell code for Calculating Correct Compilers II
☆12Feb 22, 2022Updated 4 years ago
zhangir-azerbayev / proof-pile
View on GitHub
Scripts for downloading and pre-processing the `proof-pile`, a high quality dataset of mathematical text and code.
☆22Nov 26, 2022Updated 3 years ago
gabegrand / self-steering
View on GitHub
[COLM 2025] Official repo for Self-Steering Language Models
☆26Aug 8, 2025Updated 11 months ago
Deploy open-source AI quickly and easily - Special Bonus Offer • Ad
Runpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
LUMIA-Group / PonderingLM
View on GitHub
Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space"
☆26Jul 21, 2025Updated last year
texcoffier / zmw
View on GitHub
Zero Memory Widget
☆10Dec 30, 2020Updated 5 years ago
kazuki-irie / kv-memory-brain
View on GitHub
Official Code Repository for the paper "Key-value memory in the brain"
☆32Feb 25, 2025Updated last year
Cranial-XIX / longhorn
View on GitHub
Official PyTorch Implementation of the Longhorn Deep State Space Model
☆57Dec 4, 2024Updated last year
hyintell / LLMSymbolic
View on GitHub
☆22Feb 29, 2024Updated 2 years ago
schwartz-lab-NLP / Tokens2Words
View on GitHub
☆16Apr 2, 2025Updated last year
google-deepmind / spectral_ssm
View on GitHub
☆35Apr 12, 2024Updated 2 years ago
THU-KEG / Skill-Neuron
View on GitHub
Source code for EMNLP2022 paper "Finding Skill Neurons in Pre-trained Transformers via Prompt Tuning".
☆18Mar 13, 2023Updated 3 years ago
leloykun / adaptive-muon
View on GitHub
A single-line modification to any (dualizer-based) optimizer that allows the optimizer to adapt to the scale of the gradients as they cha…
☆19Jan 11, 2025Updated last year
End-to-end encrypted cloud storage - Proton Drive • Ad
Special offer: 40% Off Yearly / 80% Off First Month. Protect your most important files, photos, and documents from prying eyes.
alkaidpku / DQ-ToolQA
View on GitHub
☆10Nov 15, 2023Updated 2 years ago
alessiodevoto / l2compress
View on GitHub
Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression."
☆19Dec 13, 2024Updated last year
Doraemonzzz / hgru-pytorch
View on GitHub
☆29Jul 9, 2024Updated 2 years ago
expz / annotated-hyena
View on GitHub
An annotated implementation of the Hyena Hierarchy paper
☆34May 28, 2023Updated 3 years ago
s-vco / s-vco
View on GitHub
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
☆19Jun 4, 2025Updated last year
taufeeque9 / codebook-features
View on GitHub
Sparse and discrete interpretability tool for neural networks
☆64Feb 12, 2024Updated 2 years ago
Princeton-SysML / kNNLM_privacy
View on GitHub
Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888
☆37Jun 10, 2024Updated 2 years ago