🔥 A minimal training framework for scaling FLA models
☆391 · Apr 22, 2026 · Updated last month
Alternatives and similar repositories for flame
Users who are interested in flame are comparing it to the libraries listed below.
- ☆33 · Dec 31, 2025 · Updated 5 months ago
- 🚀 Efficient implementations for emerging model architectures · ☆5,182 · Updated this week
- ☆137 · Jun 6, 2025 · Updated last year
- Here we will test various linear attention designs. · ☆62 · Apr 25, 2024 · Updated 2 years ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer · ☆251 · Jun 15, 2025 · Updated 11 months ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… · ☆68 · Apr 24, 2024 · Updated 2 years ago
- 🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" · ☆1,004 · Feb 5, 2026 · Updated 4 months ago
- Triton implementation of bi-directional (non-causal) linear attention · ☆76 · Mar 1, 2026 · Updated 3 months ago
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning · ☆150 · Feb 25, 2026 · Updated 3 months ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights · ☆19 · Oct 9, 2022 · Updated 3 years ago
- ☆70 · Jul 8, 2025 · Updated 11 months ago
- Awesome Triton Resources · ☆42 · Apr 27, 2025 · Updated last year
- Linear Attention Sequence Parallelism (LASP) · ☆88 · Jun 4, 2024 · Updated 2 years ago
- Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on … · ☆16 · Sep 18, 2025 · Updated 8 months ago
- Experiments on the impact of depth in transformers and SSMs. · ☆41 · Oct 23, 2025 · Updated 7 months ago
- ☆12 · Jan 29, 2021 · Updated 5 years ago
- Helpful tools and examples for working with flex-attention · ☆1,193 · May 28, 2026 · Updated last week
- ☆61 · Jul 9, 2024 · Updated last year
- HGRN2: Gated Linear RNNs with State Expansion · ☆57 · Aug 20, 2024 · Updated last year
- Official PyTorch Implementation of the Longhorn Deep State Space Model · ☆57 · Dec 4, 2024 · Updated last year
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule · ☆595 · Mar 13, 2026 · Updated 2 months ago
- Expanding linear RNN state-transition matrix eigenvalues to include negatives improves state-tracking tasks and language modeling without… · ☆22 · Mar 15, 2025 · Updated last year
- A PyTorch native platform for training generative AI models · ☆5,416 · Updated this week
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l… · ☆57 · Mar 31, 2026 · Updated 2 months ago
- ☆52 · May 19, 2025 · Updated last year
- ☆132 · Feb 4, 2026 · Updated 4 months ago
- Flash-Linear-Attention models beyond language · ☆21 · Aug 28, 2025 · Updated 9 months ago
- Ring attention implementation with flash attention · ☆1,024 · Sep 10, 2025 · Updated 8 months ago
- [EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion in our EMNLP 2023 paper - Accelerating Toeplitz… · ☆14 · Oct 17, 2023 · Updated 2 years ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" · ☆259 · Jan 31, 2025 · Updated last year
- ☆45 · Nov 1, 2025 · Updated 7 months ago
- Stick-breaking attention · ☆63 · Jul 1, 2025 · Updated 11 months ago
- Official implementation of "MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map" (NeurIPS2024 Oral) · ☆36 · Jan 18, 2025 · Updated last year
- FlexAttention w/ FlashAttention3 Support · ☆27 · Oct 5, 2024 · Updated last year
- Fork of Flame repo for training of some new stuff in development · ☆19 · Jun 1, 2026 · Updated last week
- [EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer · ☆64 · Jul 30, 2023 · Updated 2 years ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling · ☆40 · Dec 2, 2023 · Updated 2 years ago
- Official Implementation of ACL2023: Don't Parse, Choose Spans! Continuous and Discontinuous Constituency Parsing via Autoregressive Span … · ☆14 · Aug 25, 2023 · Updated 2 years ago
- Muon fsdp 2 · ☆59 · Aug 8, 2025 · Updated 10 months ago