jzhang38 / LongMamba
Some preliminary explorations of Mamba's context scaling.
☆206 · Updated 11 months ago
Alternatives and similar repositories for LongMamba:
Users interested in LongMamba are comparing it to the libraries listed below.
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆219 · Updated last month
- [NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models ☆188 · Updated 2 weeks ago
- ☆180 · Updated this week
- Official implementation of Phi-Mamba, a MOHAWK-distilled model (Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models) ☆92 · Updated 4 months ago
- Understand and test language model architectures on synthetic tasks. ☆175 · Updated this week
- Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind ☆115 · Updated 4 months ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆148 · Updated last month
- Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch ☆145 · Updated 2 weeks ago
- A MAD laboratory to improve AI architecture designs 🧪 ☆102 · Updated last month
- Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ☆52 · Updated 3 weeks ago
- ☆135 · Updated last year
- Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆73 · Updated 2 weeks ago
- Normalized Transformer (nGPT) ☆145 · Updated last month
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆203 · Updated 3 weeks ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆96 · Updated 3 months ago
- Implementation of ST-MoE, the latest incarnation of MoE after years of research at Brain, in Pytorch ☆297 · Updated 7 months ago
- The simplest, fastest repository for training/finetuning medium-sized GPTs. ☆90 · Updated last month
- ☆78 · Updated 3 months ago
- Token Omission Via Attention ☆122 · Updated 3 months ago
- Code repository for Black Mamba ☆234 · Updated 11 months ago
- Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch ☆256 · Updated 8 months ago
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆270 · Updated 2 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆192 · Updated last month
- Code accompanying the paper "Massive Activations in Large Language Models" ☆133 · Updated 10 months ago
- Language models scale reliably with over-training and on downstream tasks ☆96 · Updated 9 months ago
- ☆51 · Updated 7 months ago
- ☆168 · Updated last year
- ☆190 · Updated last month
- ☆119 · Updated 4 months ago
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆145 · Updated 6 months ago