fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆341 · Updated 2 months ago
Alternatives and similar repositories for flame
Users interested in flame are comparing it to the libraries listed below.
- Efficient Triton implementation of Native Sparse Attention. ☆261 · Updated 8 months ago
- Triton implementation of FlashAttention2 that adds Custom Masks. ☆163 · Updated last year
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆433 · Updated 4 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆229 · Updated 7 months ago
- Implementation of FP8/INT8 rollout for RL training without performance drop. ☆288 · Updated 2 months ago
- ☆104 · Updated 11 months ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆258 · Updated 5 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆263 · Updated 3 months ago
- ☆269 · Updated 7 months ago
- Code for paper: [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ☆161 · Updated 3 months ago
- ☆132 · Updated 8 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ☆266 · Updated 6 months ago
- Training library for Megatron-based models with bidirectional Hugging Face conversion capability ☆400 · Updated this week
- ☆23 · Updated last month
- [ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ☆105 · Updated last year
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆187 · Updated 4 months ago
- ☆220 · Updated 2 months ago
- ☆129 · Updated 7 months ago
- ☆158 · Updated 11 months ago
- Physics of Language Models: Part 4.2, Canon Layers at Scale where Synthetic Pretraining Resonates in Reality ☆314 · Updated 3 weeks ago
- ☆118 · Updated 4 months ago
- Accelerating MoE with IO and Tile-aware Optimizations ☆563 · Updated last week
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆368 · Updated 6 months ago
- Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆280 · Updated 2 months ago
- Efficient 2:4 sparse training algorithms and implementations ☆58 · Updated last year
- An efficient implementation of the NSA (Native Sparse Attention) kernel ☆128 · Updated 7 months ago
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆523 · Updated 11 months ago
- Some preliminary explorations of Mamba's context scaling. ☆218 · Updated last year
- [CoLM'25] The official implementation of the paper <MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression> ☆154 · Updated 2 weeks ago
- The evaluation framework for training-free sparse attention in LLMs ☆114 · Updated this week