nil0x9 / flash-muon

Flash-Muon: An Efficient Implementation of Muon Optimizer

☆195

Alternatives and similar repositories for flash-muon

Users interested in flash-muon are comparing it to the libraries listed below.

Dao-AILab / grouped-latent-attention
☆130 · Updated 4 months ago
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆101 · Updated last week
xiayuqing0622 / flex_head_fa
Fast and memory-efficient exact attention
☆71 · Updated 7 months ago
HanGuo97 / log-linear-attention
☆251 · Updated 4 months ago
sustcsonglin / linear-attention-and-beyond-slides
☆93 · Updated 8 months ago
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆266 · Updated last month
tilde-research / nsa-impl
An efficient implementation of the NSA (Native Sparse Attention) kernel
☆119 · Updated 4 months ago
XunhaoLai / native-sparse-attention-triton
Efficient Triton implementation of Native Sparse Attention.
☆238 · Updated 5 months ago
alexzhang13 / flashattention2-custom-mask
Triton implementation of FlashAttention2 that adds Custom Masks.
☆141 · Updated last year
FasterDecoding / TEAL
☆145 · Updated 8 months ago
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆246 · Updated 3 weeks ago
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆192 · Updated 11 months ago
IST-DASLab / QuEST
Work in progress.
☆74 · Updated 3 months ago
li-plus / flash-preference
Accelerate LLM preference tuning via prefix sharing with a single line of code
☆46 · Updated 3 months ago
feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆78 · Updated last year
PiotrNawrot / nano-sparse-attention
The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.
☆92 · Updated 3 months ago
gpu-mode / ring-attention
ring-attention experiments
☆154 · Updated last year
mgmalek / efficient_cross_entropy
☆121 · Updated last year
NVlabs / GatedDeltaNet
[ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule
☆331 · Updated last month
NVlabs / COAT
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
☆242 · Updated 2 months ago
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130 · Updated 10 months ago
huggingface / kernels
Load compute kernels from the Hub
☆304 · Updated last week
mengxiayu / LLMSuperWeight
Code for studying the super weight in LLM
☆120 · Updated 10 months ago
kyleliang919 / Super_Muon
☆64 · Updated 7 months ago
samsja / muon_fsdp_2
Muon fsdp 2
☆44 · Updated 2 months ago
yaof20 / Flash-RL
Implementation of FP8/INT8 rollout for RL training without performance drop.
☆261 · Updated 3 weeks ago
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆248 · Updated 8 months ago
insuhan / hyper-attn
☆83 · Updated last year
ByteDance-Seed / FlexPrefill
Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference
☆147 · Updated last week
tilde-research / MoMoE-impl
Memory optimized Mixture of Experts
☆68 · Updated 3 months ago