fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆170 · Updated last week
Alternatives and similar repositories for flame
Users interested in flame are comparing it to the libraries listed below.
- Efficient Triton implementation of Native Sparse Attention. ☆168 · Updated last month
- ☆114 · Updated 3 weeks ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆209 · Updated last week
- [ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ☆81 · Updated 6 months ago
- XAttention: Block Sparse Attention with Antidiagonal Scoring ☆166 · Updated this week
- Triton-based implementation of Sparse Mixture of Experts. ☆219 · Updated 6 months ago
- ☆130 · Updated 4 months ago
- Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding" ☆233 · Updated 2 weeks ago
- ☆76 · Updated 3 months ago
- ☆104 · Updated 2 weeks ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer ☆131 · Updated last week
- Code for paper: [ICLR 2025 Oral] FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference ☆113 · Updated last month
- A sparse attention kernel supporting mixed sparse patterns ☆238 · Updated 4 months ago
- ☆85 · Updated 2 months ago
- Triton implementation of FlashAttention2 that adds Custom Masks. ☆120 · Updated 10 months ago
- SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs ☆126 · Updated last week
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ☆210 · Updated 10 months ago
- PyTorch bindings for CUTLASS grouped GEMM. ☆99 · Updated 3 weeks ago
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" ☆235 · Updated 2 weeks ago
- [ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference ☆295 · Updated 7 months ago
- [ICML 2025] Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization ☆71 · Updated 3 weeks ago
- ☆256 · Updated last year
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆173 · Updated 3 months ago
- [ICLR 2025] Palu: Compressing KV-Cache with Low-Rank Projection ☆122 · Updated 4 months ago
- ☆208 · Updated 2 weeks ago
- ☆105 · Updated last year
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆163 · Updated 11 months ago
- Homepage for ProLong (Princeton long-context language models) and paper "How to Train Long-Context Language Models (Effectively)" ☆189 · Updated 3 months ago
- VeOmni: Scaling any Modality Model Training to any Accelerators with PyTorch Native Training Framework ☆353 · Updated last month
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models ☆23 · Updated this week