An extension of the nanoGPT repository for training small MoE (Mixture-of-Experts) models.
☆251 · Mar 9, 2025 · Updated last year
Alternatives and similar repositories for nanoMoE
Users who are interested in nanoMoE are comparing it to the libraries listed below.
- This is a simple torch implementation of the high performance Multi-Query Attention · ☆16 · Aug 23, 2023 · Updated 2 years ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. · ☆91 · Jul 17, 2025 · Updated 7 months ago
- Minimalistic large language model 3D-parallelism training · ☆2,588 · Feb 19, 2026 · Updated 2 weeks ago
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" · ☆10 · Jul 19, 2024 · Updated last year
- Minimalistic 4D-parallelism distributed training framework for education purpose · ☆2,104 · Aug 26, 2025 · Updated 6 months ago
- Vocabulary Parallelism · ☆25 · Mar 10, 2025 · Updated last year
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training · ☆23 · Aug 18, 2024 · Updated last year
- ☆25 · Aug 19, 2025 · Updated 6 months ago
- ☆16 · Sep 17, 2024 · Updated last year
- ☆12 · May 20, 2025 · Updated 9 months ago
- For ACL25 paper "WAFFLE: Multi-Modal Model for Automated Front-End Development" - by Shanchao Liang and Nan Jiang and Shangshu Qian and L… · ☆11 · May 28, 2025 · Updated 9 months ago
- Sound Separation, Omni modal · ☆28 · Sep 15, 2025 · Updated 5 months ago
- [ICML 2025] LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models · ☆17 · Nov 4, 2025 · Updated 4 months ago
- Node based programming tool · ☆12 · Jan 8, 2023 · Updated 3 years ago
- Code in support of the paper Continuous Mixtures of Tractable Probabilistic Models · ☆12 · Oct 12, 2024 · Updated last year
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning" · ☆13 · Jun 22, 2025 · Updated 8 months ago
- 🎵 muse: Music Separation · ☆11 · Feb 14, 2024 · Updated 2 years ago
- FlexiTokens · ☆18 · Dec 27, 2025 · Updated 2 months ago
- NanoGPT (124M) in 2 minutes · ☆4,734 · Feb 27, 2026 · Updated last week
- Language models scale reliably with over-training and on downstream tasks · ☆100 · Apr 2, 2024 · Updated last year
- The YouTube Text-To-Speech dataset is comprised of waveform audio extracted from YouTube videos alongside their English transcriptions · ☆52 · Apr 1, 2021 · Updated 4 years ago
- Simple and efficient pytorch-native transformer training and inference (batched) · ☆79 · Apr 2, 2024 · Updated last year
- [ICLR 26] The official code repository for the paper "Mirage or Method? How Model–Task Alignment Induces Divergent RL Conclusions". · ☆16 · Feb 9, 2026 · Updated last month
- torchlogic is a pytorch framework for developing Neuro-Symbolic AI systems and implements Neural Reasoning Networks. · ☆17 · Sep 18, 2025 · Updated 5 months ago
- ☆21 · Oct 22, 2025 · Updated 4 months ago
- ☆12 · Jun 12, 2024 · Updated last year
- List of papers on Self-Correction of LLMs. · ☆80 · Dec 28, 2024 · Updated last year
- Official repo for DisCoder: High-Fidelity Music Vocoder using Neural Audio Codecs presented at ICASSP 2025 · ☆38 · Feb 24, 2025 · Updated last year
- Interface Design for Self-Supervised Speech Models, Accepted to Interspeech2024 · ☆16 · Nov 19, 2024 · Updated last year
- KV Cache Steering for Inducing Reasoning in Small Language Models · ☆46 · Jul 24, 2025 · Updated 7 months ago
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 · ☆35 · Oct 3, 2024 · Updated last year
- The CodeInsight dataset is designed for code generation tasks, providing developers with expert-curated examples that bridge the gap betw… · ☆14 · Oct 22, 2024 · Updated last year
- ☆221 · Jan 23, 2025 · Updated last year
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ☆281 · Nov 24, 2025 · Updated 3 months ago
- ☆185 · Feb 8, 2025 · Updated last year
- Revisiting Mid-training in the Era of Reinforcement Learning Scaling · ☆185 · Jul 23, 2025 · Updated 7 months ago
- A videogame made with PyGame turned into an Open AI Gym Learning Environment for Reinforcement Learning agents. · ☆15 · Jan 3, 2023 · Updated 3 years ago
- nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO) · ☆142 · May 8, 2025 · Updated 10 months ago
- 🚀 Efficient implementations of state-of-the-art linear attention models · ☆4,474 · Updated this week