nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆160 · Updated 2 months ago
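For orientation: Muon updates a 2D weight matrix by approximately orthogonalizing its momentum matrix with a Newton-Schulz iteration before taking the step, and flash-muon provides a fast kernel for that computation. The sketch below illustrates the general idea only, using the classic cubic Newton-Schulz iteration and hypothetical function names; it is not the flash-muon API.

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately map g to a (semi-)orthogonal matrix via the cubic
    Newton-Schulz iteration X <- 1.5*X - 0.5*(X X^T) X.
    Illustrative sketch only; flash-muon fuses this work into custom kernels."""
    x = g / (g.norm() + 1e-7)          # scale so all singular values are <= 1
    transposed = g.size(0) > g.size(1)
    if transposed:                     # iterate on the smaller Gram matrix
        x = x.T
    for _ in range(steps):
        a = x @ x.T
        x = 1.5 * x - 0.5 * a @ x
    return x.T if transposed else x

def muon_step(param, momentum, grad, lr=0.02, beta=0.95):
    """One hypothetical Muon-style update for a 2D parameter tensor."""
    momentum.mul_(beta).add_(grad)          # momentum accumulation
    update = newton_schulz_orthogonalize(momentum)
    param.data.add_(update, alpha=-lr)      # step along the orthogonalized direction
```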
Alternatives and similar repositories for flash-muon
Users interested in flash-muon are comparing it to the libraries listed below.
- ☆123 · Updated 2 months ago
- The evaluation framework for training-free sparse attention in LLMs ☆90 · Updated 2 months ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆113 · Updated 2 months ago
- Efficient Triton implementation of Native Sparse Attention. ☆208 · Updated 3 months ago
- ☆237 · Updated 2 months ago
- 🔥 A minimal training framework for scaling FLA models ☆233 · Updated last week
- Fast and memory-efficient exact attention ☆70 · Updated 5 months ago
- ☆139 · Updated 6 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆76 · Updated last year
- Triton implementation of FlashAttention2 that adds Custom Masks. ☆132 · Updated last year
- ☆80 · Updated 6 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆233 · Updated 8 months ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆43 · Updated last month
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆231 · Updated 2 weeks ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆84 · Updated last month
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆73 · Updated last year
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆141 · Updated last year
- Load compute kernels from the Hub ☆244 · Updated this week
- ring-attention experiments ☆149 · Updated 10 months ago
- ☆77 · Updated 3 months ago
- Normalized Transformer (nGPT) ☆186 · Updated 9 months ago
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆165 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆128 · Updated 8 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ☆219 · Updated last month
- Work in progress. ☆72 · Updated last month
- Kinetics: Rethinking Test-Time Scaling Laws ☆76 · Updated last month
- Code for studying the super weight in LLM ☆115 · Updated 8 months ago
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆199 · Updated 5 months ago
- [CoLM'25] The official implementation of the paper <MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression> ☆145 · Updated last month
- 16-fold memory access reduction with nearly no loss ☆104 · Updated 5 months ago