fla-org / flame
🔥 A minimal training framework for scaling FLA models
⭐ 220 · Updated last month
Alternatives and similar repositories for flame
Users interested in flame are comparing it to the libraries listed below.
- Efficient Triton implementation of Native Sparse Attention. ⭐ 186 · Updated 2 months ago
- ⭐ 123 · Updated 2 months ago
- [ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ⭐ 85 · Updated 7 months ago
- Triton-based implementation of Sparse Mixture of Experts. ⭐ 230 · Updated 8 months ago
- Triton implementation of FlashAttention2 that adds Custom Masks. ⭐ 128 · Updated 11 months ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ⭐ 221 · Updated last month
- Flash-Muon: An Efficient Implementation of Muon Optimizer ⭐ 149 · Updated last month
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ⭐ 320 · Updated this week
- ⭐ 79 · Updated 5 months ago
- ⭐ 137 · Updated 5 months ago
- [ICML 2025] XAttention: Block Sparse Attention with Antidiagonal Scoring ⭐ 213 · Updated last month
- Some preliminary explorations of Mamba's context scaling. ⭐ 216 · Updated last year
- Physics of Language Models, Part 4 ⭐ 204 · Updated last week
- ⭐ 95 · Updated 3 months ago
- ⭐ 113 · Updated 2 months ago
- [ICLR 2025 Oral] Code for the paper "FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference" ⭐ 124 · Updated 2 months ago
- ⭐ 232 · Updated 2 months ago
- VeOmni: Scaling any-modality model training to any accelerator with a PyTorch-native training framework ⭐ 399 · Updated this week
- Low-bit optimizers for PyTorch ⭐ 130 · Updated last year
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ⭐ 193 · Updated 4 months ago
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ⭐ 212 · Updated 11 months ago
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ⭐ 119 · Updated last year
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ⭐ 258 · Updated last week
- A sparse attention kernel supporting mixed sparse patterns ⭐ 262 · Updated 5 months ago
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ⭐ 141 · Updated last week
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ⭐ 238 · Updated 2 months ago
- An efficient implementation of the NSA (Native Sparse Attention) kernel ⭐ 108 · Updated last month
- ⭐ 147 · Updated 2 years ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ⭐ 311 · Updated 3 weeks ago
- ⭐ 268 · Updated 3 weeks ago