kyegomez / MambaByteLinks

Implementation of MambaByte in "MambaByte: Token-free Selective State Space Model" in Pytorch and Zeta

☆120

Alternatives and similar repositories for MambaByte

Users that are interested in MambaByte are comparing it to the libraries listed below

Sorting:

Zyphra / BlackMamba
Code repository for Black Mamba
☆252Updated last year
kyegomez / Jamba
PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model"
☆179Updated 4 months ago
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆127Updated 11 months ago
tanaymeh / mamba-train
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
☆56Updated last year
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆184Updated 6 months ago
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆98Updated 10 months ago
kyegomez / MoE-Mamba
Implementation of MoE Mamba from the paper: "MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts" in Pytorch and Ze…
☆109Updated last week
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆94Updated 8 months ago
huyphan168 / PEER
Mixture of A Million Experts
☆46Updated last year
jxiw / MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
☆225Updated 3 months ago
llm-random / llm-random
☆192Updated last week
LegallyCoder / mamba-hf
Implementation of the Mamba SSM with hf_integration.
☆56Updated 11 months ago
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆128Updated 9 months ago
epfml / DenseFormer
☆81Updated last year
astramind-ai / BitMat
An efficent implementation of the method proposed in "The Era of 1-bit LLMs"
☆154Updated 9 months ago
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆101Updated 7 months ago
kyegomez / MambaTransformer
Integrating Mamba/SSMs with Transformer for Enhanced Long Context and High-Quality Sequence Modeling
☆200Updated 2 weeks ago
hyperevolnet / Terminator
The official repository for HyperZ⋅Z⋅W Operator Connects Slow-Fast Networks for Full Context Interaction.
☆38Updated 4 months ago
TRI-ML / linear_open_lm
A repository for research on medium sized language models.
☆78Updated last year
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆127Updated 8 months ago
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆198Updated last year
minyoungg / LTE
☆68Updated last year
LucasPrietoAl / grokking-at-the-edge-of-numerical-stability
☆100Updated 2 weeks ago
kjslag / spacebyte
A byte-level decoder architecture that matches the performance of tokenized Transformers.
☆65Updated last year
BorealisAI / flora-opt
This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024.
☆104Updated last year
proger / hippogriff
Griffin MQA + Hawk Linear RNN Hybrid
☆88Updated last year
goombalab / phi-mamba
Official implementation of Phi-Mamba. A MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Mode…
☆113Updated 10 months ago
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆244Updated 6 months ago
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆238Updated 2 months ago
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆103Updated 3 months ago