shreyansh26 / An-Empirical-Model-of-Large-Batch-Training
An approximate implementation of the OpenAI paper "An Empirical Model of Large-Batch Training", applied to MNIST
☆10 Updated 2 years ago
Alternatives and similar repositories for An-Empirical-Model-of-Large-Batch-Training
Users who are interested in An-Empirical-Model-of-Large-Batch-Training are comparing it to the repositories listed below.
- Griffin MQA + Hawk Linear RNN Hybrid ☆87 Updated last year
- The evaluation framework for training-free sparse attention in LLMs ☆83 Updated last month
- ☆53 Updated last year
- ☆48 Updated last year
- ☆37 Updated 3 months ago
- A fusion of a linear layer and a cross entropy loss, written for PyTorch in Triton. ☆69 Updated 11 months ago
- ☆82 Updated 11 months ago
- Tiled Flash Linear Attention library for fast and efficient mLSTM Kernels. ☆64 Updated 2 weeks ago
- ☆81 Updated last year
- ☆112 Updated last year
- ☆11 Updated 4 months ago
- ☆19 Updated 2 months ago
- Combining SOAP and MUON ☆16 Updated 5 months ago
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆47 Updated 2 months ago
- DPO, but faster 🚀 ☆43 Updated 7 months ago
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆77 Updated last year
- Custom Triton kernels for training Karpathy's nanoGPT. ☆19 Updated 8 months ago
- PyTorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at DeepMind ☆127 Updated 10 months ago
- ☆80 Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆142 Updated last month
- ☆59 Updated 3 months ago
- Minimal (400 LOC) implementation Maximum (multi-node, FSDP) GPT training ☆129 Updated last year
- Official code release for "SuperBPE: Space Travel for Language Models" ☆61 Updated this week
- ☆28 Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆128 Updated 7 months ago
- Efficient PScan implementation in PyTorch ☆16 Updated last year
- ☆119 Updated last month
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆66 Updated last year
- Tiny re-implementation of MDM in style of LLaDA and nano-gpt speedrun ☆55 Updated 4 months ago
- ☆56 Updated last year