joey00072 / Multi-Head-Latent-Attention-MLA-Links
Working implementation of DeepSeek MLA (Multi-Head Latent Attention)
☆42 · Updated 5 months ago
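For context on what the repo above implements: MLA compresses the attention KV cache by down-projecting hidden states into a small shared latent vector, caching only that latent, and up-projecting it back into per-head keys and values at attention time. The sketch below is a minimal, illustrative version in numpy; all dimensions and weight names are hypothetical and not taken from the repo (it also omits DeepSeek's decoupled RoPE path).

```python
import numpy as np

# Hypothetical sizes for illustration only.
d_model, d_latent, n_heads, d_head, seq = 64, 16, 4, 16, 8
rng = np.random.default_rng(0)

# Down-projection: compress hidden states into a small latent that is cached.
W_dkv = rng.normal(0, 0.02, (d_model, d_latent))
# Up-projections: reconstruct per-head keys and values from the latent.
W_uk = rng.normal(0, 0.02, (d_latent, n_heads * d_head))
W_uv = rng.normal(0, 0.02, (d_latent, n_heads * d_head))
W_q = rng.normal(0, 0.02, (d_model, n_heads * d_head))
W_o = rng.normal(0, 0.02, (n_heads * d_head, d_model))

x = rng.normal(size=(seq, d_model))

c_kv = x @ W_dkv  # (seq, d_latent): this latent is all the KV cache stores
k = (c_kv @ W_uk).reshape(seq, n_heads, d_head)
v = (c_kv @ W_uv).reshape(seq, n_heads, d_head)
q = (x @ W_q).reshape(seq, n_heads, d_head)

# Standard causal softmax attention per head.
scores = np.einsum("qhd,khd->hqk", q, k) / np.sqrt(d_head)
mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
scores = np.where(mask, -np.inf, scores)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = np.einsum("hqk,khd->qhd", weights, v).reshape(seq, n_heads * d_head)
y = out @ W_o  # (seq, d_model)

# Cache cost per token: d_latent floats, versus 2 * n_heads * d_head
# (keys plus values) for standard multi-head attention.
print(c_kv.shape, y.shape)
```

With these numbers the cache holds 16 floats per token instead of 128, which is the memory saving MLA is designed for.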
Alternatives and similar repositories for Multi-Head-Latent-Attention-MLA-
Users interested in Multi-Head-Latent-Attention-MLA- are comparing it to the libraries listed below.
- Collection of autoregressive model implementations ☆85 · Updated 2 months ago
- A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM ☆55 · Updated last year
- NanoGPT-speedrunning for the poor T4 enjoyers ☆66 · Updated 2 months ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆28 · Updated last month
- An open-source reproduction of NVIDIA's nGPT (Normalized Transformer with Representation Learning on the Hypersphere) ☆101 · Updated 3 months ago
- ☆46 · Updated 2 months ago
- Focused on fast experimentation and simplicity ☆74 · Updated 6 months ago
- Lego for GRPO ☆28 · Updated 3 weeks ago
- ☆56 · Updated 3 months ago
- rl from zero pretrain, can it be done? we'll see.