☆249 · Nov 19, 2025 · Updated 6 months ago
Alternatives and similar repositories for flash-moba
Users that are interested in flash-moba are comparing it to the libraries listed below.
- ☆33 · Dec 31, 2025 · Updated 5 months ago
- NVIDIA cuTile learn ☆169 · Dec 9, 2025 · Updated 6 months ago
- [ASPLOS'26] Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter ☆173 · Feb 27, 2026 · Updated 3 months ago
- ☆36 · Mar 7, 2025 · Updated last year
- Vortex: Programmable Sparse Attention for Agents as Algorithm Designers ☆59 · Jun 8, 2026 · Updated last week
- CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs ☆204 · Updated this week
- Efficient Long-context Language Model Training by Core Attention Disaggregation ☆105 · Apr 7, 2026 · Updated 2 months ago
- [NeurIPS'25 Spotlight] Adaptive Attention Sparsity with Hierarchical Top-p Pruning ☆101 · Apr 20, 2026 · Updated last month
- Xmixers: A collection of SOTA efficient token/channel mixers ☆28 · Sep 4, 2025 · Updated 9 months ago
- ☆282 · Jun 6, 2025 · Updated last year
- ☆66 · Apr 26, 2025 · Updated last year
- ☆135 · Updated this week
- Quantized Attention on GPU ☆44 · Nov 22, 2024 · Updated last year
- ☆22 · May 5, 2025 · Updated last year
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ☆281 · Jul 6, 2025 · Updated 11 months ago
- A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention ☆298 · Dec 1, 2025 · Updated 6 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆251 · Jun 15, 2025 · Updated last year
- ☆139 · May 29, 2025 · Updated last year
- Distributed MoE in a Single Kernel [NeurIPS '25] ☆266 · May 5, 2026 · Updated last month
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆265 · Aug 9, 2025 · Updated 10 months ago
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" ☆1,006 · Feb 5, 2026 · Updated 4 months ago
- [NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive ☆73 · Dec 11, 2025 · Updated 6 months ago
- A sparse attention kernel supporting mix sparse patterns