fla-org / flame
π₯ A minimal training framework for scaling FLA models
⭐327 · Updated last month
Alternatives and similar repositories for flame
Users interested in flame are comparing it to the libraries listed below.
- Efficient Triton implementation of Native Sparse Attention. ⭐257 · Updated 7 months ago
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ⭐415 · Updated 3 months ago
- Triton implementation of FlashAttention2 that adds Custom Masks. ⭐159 · Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer ⭐225 · Updated 6 months ago
- Implementation of FP8/INT8 rollout for RL training without performance drop. ⭐282 · Updated 2 months ago
- ⭐133 · Updated 7 months ago
- ⭐102 · Updated 10 months ago
- [ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ⭐104 · Updated last year
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ⭐263 · Updated 6 months ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ⭐256 · Updated 4 months ago
- ⭐20 · Updated last week
- Code for paper: [ICLR2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ⭐160 · Updated 2 months ago
- ⭐157 · Updated 10 months ago
- Triton-based implementation of Sparse Mixture of Experts. ⭐259 · Updated 3 months ago
- ⭐213 · Updated last month
- ⭐262 · Updated 7 months ago
- Physics of Language Models, Part 4 ⭐291 · Updated this week
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ⭐182 · Updated 3 months ago
- Accelerating MoE with IO and Tile-aware Optimizations ⭐500 · Updated last week
- HuggingFace conversion and training library for Megatron-based models ⭐324 · Updated this week
- Fast and memory-efficient exact attention ⭐75 · Updated 10 months ago
- ⭐126 · Updated 7 months ago
- Efficient 2:4 sparse training algorithms and implementations ⭐58 · Updated last year
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ⭐764 · Updated last month
- An efficient implementation of the NSA (Native Sparse Attention) kernel ⭐128 · Updated 6 months ago
- ⭐114 · Updated 3 months ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ⭐362 · Updated 5 months ago
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ⭐277 · Updated last month
- Some preliminary explorations of Mamba's context scaling. ⭐218 · Updated last year
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ⭐515 · Updated 10 months ago