fla-org / flame
🔥 A minimal training framework for scaling FLA models
⭐319 · Updated last month
Alternatives and similar repositories for flame
Users interested in flame are comparing it to the libraries listed below.
- Efficient Triton implementation of Native Sparse Attention. ⭐254 · Updated 6 months ago
- Triton implementation of FlashAttention2 that adds Custom Masks. ⭐151 · Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer ⭐222 · Updated 6 months ago
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ⭐388 · Updated 3 months ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ⭐251 · Updated 4 months ago
- ⭐132 · Updated 6 months ago
- Triton-based implementation of Sparse Mixture of Experts. ⭐253 · Updated 2 months ago
- Implementation of FP8/INT8 rollout for RL training without performance drop. ⭐280 · Updated last month
- ⭐100 · Updated 9 months ago
- [ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ⭐101 · Updated 11 months ago
- ⭐257 · Updated 6 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ⭐255 · Updated 5 months ago
- ⭐203 · Updated 3 weeks ago
- ⭐155 · Updated 10 months ago
- Physics of Language Models, Part 4 ⭐265 · Updated last week
- Efficient 2:4 sparse training algorithms and implementations ⭐58 · Updated last year
- Code for paper: [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ⭐156 · Updated 2 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ⭐271 · Updated 3 weeks ago
- ⭐112 · Updated 2 months ago
- ⭐122 · Updated 6 months ago
- Some preliminary explorations of Mamba's context scaling. ⭐218 · Updated last year
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ⭐219 · Updated last year
- PyTorch bindings for CUTLASS grouped GEMM. ⭐132 · Updated 6 months ago
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ⭐176 · Updated 2 months ago
- A sparse attention kernel supporting mixed sparse patterns ⭐406 · Updated 10 months ago
- Fast and memory-efficient exact attention ⭐74 · Updated 9 months ago
- Low-bit optimizers for PyTorch ⭐133 · Updated 2 years ago
- TransMLA: Multi-Head Latent Attention Is All You Need (NeurIPS 2025 Spotlight) ⭐415 · Updated 2 months ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel ⭐126 · Updated 5 months ago
- Discrete Diffusion Forcing (D2F): dLLMs Can Do Faster-Than-AR Inference ⭐210 · Updated 2 months ago