nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆115 · Updated this week
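For context on the repo's title: Muon updates a 2-D weight matrix with a momentum buffer that is approximately orthogonalized via a Newton-Schulz iteration before being applied. The sketch below is a minimal plain-PyTorch illustration of that idea only; it is not the flash-muon kernel, the helper names (`newton_schulz_orthogonalize`, `muon_step`) are hypothetical, and the coefficients are those commonly used in public Muon references.

```python
# Minimal sketch of a Muon-style update step (momentum orthogonalized via
# Newton-Schulz). Illustrative only -- not the flash-muon implementation.
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximate the nearest semi-orthogonal matrix to a 2-D tensor G."""
    a, b, c = 3.4445, -4.7750, 2.0315   # quintic coefficients used in public Muon code
    X = G / (G.norm() + 1e-7)           # normalize so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:                       # keep the Gram matrix on the smaller side
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One Muon-style update for a 2-D weight matrix (in-place)."""
    momentum.mul_(beta).add_(grad)                  # standard momentum accumulation
    update = newton_schulz_orthogonalize(momentum)  # orthogonalize the momentum
    weight.add_(update, alpha=-lr)
    return weight, momentum
```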
Alternatives and similar repositories for flash-muon
Users interested in flash-muon are comparing it to the libraries listed below.
- ☆53 · Updated this week
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. ☆127 · Updated this week
- ☆129 · Updated 3 months ago
- Official implementation of "The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs" ☆32 · Updated last month
- Odysseus: Playground of LLM Sequence Parallelism ☆69 · Updated 11 months ago
- 🔥 A minimal training framework for scaling FLA models ☆146 · Updated 3 weeks ago
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference. ☆62 · Updated 4 months ago
- Efficient Triton implementation of Native Sparse Attention. ☆155 · Updated last week
- Fast and memory-efficient exact attention ☆68 · Updated 3 months ago
- Accelerate LLM preference tuning via prefix sharing with a single line of code ☆41 · Updated last month
- Transformers components but in Triton ☆33 · Updated 3 weeks ago
- Using FlexAttention to compute attention with different masking patterns ☆43 · Updated 8 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆75 · Updated 8 months ago
- ☆108 · Updated last year
- ☆74 · Updated 3 months ago
- Load compute kernels from the Hub ☆139 · Updated this week
- ☆20 · Updated 3 weeks ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆93 · Updated this week
- Work in progress. ☆67 · Updated this week
- DPO, but faster 🚀 ☆42 · Updated 5 months ago
- ☆56 · Updated 2 months ago
- Experiment of using Tangent to autodiff triton ☆79 · Updated last year
- XAttention: Block Sparse Attention with Antidiagonal Scoring ☆158 · Updated 3 weeks ago
- Triton-based implementation of Sparse Mixture of Experts. ☆216 · Updated 6 months ago
- ☆70 · Updated 2 weeks ago
- ring-attention experiments ☆143 · Updated 7 months ago
- Simple and efficient pytorch-native transformer training and inference (batched) ☆75 · Updated last year
- A fusion of a linear layer and a cross entropy loss, written for pytorch in triton. ☆67 · Updated 10 months ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆203 · Updated 2 weeks ago
- CUDA and Triton implementations of Flash Attention with SoftmaxN. ☆70 · Updated last year