tanaymeh / mamba-train
A single repo with all scripts and utilities to train or fine-tune the Mamba model, with or without FIM (fill-in-the-middle)
☆54 · Updated last year
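FIM (fill-in-the-middle) fine-tuning is typically done by rearranging each training document into prefix/suffix/middle order with sentinel tokens, so a left-to-right model learns to infill. A minimal sketch under assumptions: character-level split points and placeholder sentinel strings (`to_fim` and the `PRE`/`SUF`/`MID` names are illustrative, not this repo's actual implementation):

```python
import random

# Hypothetical sentinel strings for illustration; a real tokenizer defines its
# own special FIM tokens (these names are assumptions, not the repo's).
PRE, SUF, MID = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def to_fim(doc: str, fim_rate: float = 0.9, rng=None) -> str:
    """Rearrange a document into prefix/suffix/middle (PSM) order for FIM training."""
    rng = rng or random
    if rng.random() > fim_rate or len(doc) < 3:
        return doc  # keep a fraction of examples in plain left-to-right order
    # Carve the document into prefix / middle / suffix at two random points.
    i, j = sorted(rng.sample(range(1, len(doc)), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # The model sees prefix and suffix first, then learns to generate the middle.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"
```

The transform is order-preserving on the underlying text, so concatenating the recovered prefix, middle, and suffix reconstructs the original document.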
Alternatives and similar repositories for mamba-train:
Users interested in mamba-train are comparing it to the libraries listed below.
- Working implementation of DeepSeek MLA ☆40 · Updated 3 months ago
- A repository for research on medium-sized language models. ☆76 · Updated 11 months ago
- Collection of autoregressive model implementations ☆85 · Updated last week
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆27 · Updated this week
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆36 · Updated last year
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 · Updated 7 months ago
- RWKV-7: Surpassing GPT ☆83 · Updated 5 months ago
- ☆54 · Updated last month
- ☆78 · Updated 8 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆45 · Updated 2 weeks ago
- Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024) ☆153 · Updated 3 weeks ago
- ☆50 · Updated 6 months ago
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆42 · Updated 11 months ago
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆98 · Updated last month
- My implementation of Q-Sparse: All Large Language Models Can Be Fully Sparsely-Activated ☆32 · Updated 8 months ago
- Evaluating the Mamba architecture on the Othello game ☆47 · Updated last year
- Deep learning library implemented from scratch in NumPy. Mixtral, Mamba, LLaMA, GPT, ResNet, and other experiments. ☆51 · Updated last year
- My fork of Allen AI's OLMo for educational purposes. ☆30 · Updated 5 months ago
- RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best… ☆44 · Updated last month
- Fast and memory-efficient exact attention ☆68 · Updated 2 months ago
- ☆80 · Updated last year
- Griffin MQA + Hawk Linear RNN Hybrid ☆86 · Updated last year
- Implementation of Infini-Transformer in PyTorch ☆110 · Updated 4 months ago
- Set of scripts to fine-tune LLMs ☆37 · Updated last year
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆30 · Updated last month
- Code from our practical deep dive on using Mamba for information extraction ☆54 · Updated last year
- Work in progress. ☆58 · Updated 3 weeks ago
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆123 · Updated 8 months ago
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆159 · Updated last month
- ☆49 · Updated last year