Triton implement of bi-directional (non-causal) linear attention
☆70Feb 22, 2026Updated last week
Alternatives and similar repositories for flash-bidirectional-linear-attention
Users that are interested in flash-bidirectional-linear-attention are comparing it to the libraries listed below
Sorting:
- ☆17Dec 19, 2024Updated last year
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to…☆30Jan 28, 2026Updated last month
- FlexAttention w/ FlashAttention3 Support☆27Oct 5, 2024Updated last year
- [ICLR'25] "Understanding Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing" by Peihao Wang, Ruisi Cai, Yue…☆17Mar 21, 2025Updated 11 months ago
- Flash-Linear-Attention models beyond language☆21Aug 28, 2025Updated 6 months ago
- Open-sourcing code associated with the AAAI-25 paper "On the Expressiveness and Length Generalization of Selective State-Space Models on …☆14Sep 18, 2025Updated 5 months ago
- [EMNLP 2023] Official implementation of the algorithm ETSC: Exact Toeplitz-to-SSM Conversion our EMNLP 2023 paper - Accelerating Toeplitz…☆14Oct 17, 2023Updated 2 years ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)☆24Jun 6, 2024Updated last year
- 🔥 A minimal training framework for scaling FLA models☆352Nov 15, 2025Updated 3 months ago
- Benchmark tests supporting the TiledCUDA library.☆18Nov 19, 2024Updated last year
- ☆66Jul 8, 2025Updated 7 months ago
- ☆17Jun 11, 2025Updated 8 months ago
- ☆17Jan 26, 2024Updated 2 years ago
- ☆105Nov 7, 2024Updated last year
- Official repository of "Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions", ICLR 2024 Sp…☆21Mar 7, 2024Updated last year
- Official PyTorch Implementation of the Longhorn Deep State Space Model☆56Dec 4, 2024Updated last year
- ☆19Dec 4, 2025Updated 3 months ago
- ☆19May 2, 2024Updated last year
- AGaLiTe: Approximate Gated Linear Transformers for Online Reinforcement Learning (Published in TMLR)☆23Oct 15, 2024Updated last year
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"☆248Jun 6, 2025Updated 8 months ago
- RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's l…☆54Jan 12, 2026Updated last month
- Linear Attention Sequence Parallelism (LASP)☆88Jun 4, 2024Updated last year
- ☆22Dec 15, 2023Updated 2 years ago
- Beyond KV Caching: Shared Attention for Efficient LLMs☆20Jul 19, 2024Updated last year
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…☆23Oct 1, 2025Updated 5 months ago
- WavBench: Benchmarking Reasoning, Colloquialism, and Paralinguistics for End-to-End Spoken Dialogue Models☆27Feb 13, 2026Updated 2 weeks ago
- ☆24Sep 25, 2024Updated last year
- The simplest implementation of recent Sparse Attention patterns for efficient LLM inference.☆91Jul 17, 2025Updated 7 months ago
- [EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer☆64Jul 30, 2023Updated 2 years ago
- 🚀 Efficient implementations of state-of-the-art linear attention models☆4,474Updated this week
- This is not remotely close to a finished product, and does not intend to nor does this claim to be working fine-tuning code for MaskGCT. …☆13Dec 4, 2024Updated last year
- Implementation of Hyena Hierarchy in JAX☆10Apr 30, 2023Updated 2 years ago
- Onset-and-Offset-Aware Sound Event Detection☆21Feb 10, 2025Updated last year
- ☆11Nov 7, 2024Updated last year
- ☆11Oct 11, 2023Updated 2 years ago
- 🎉My Collections of CUDA Kernels~☆11Jun 25, 2024Updated last year
- Phonemes and durations labeling based on whisper small☆11Jul 7, 2024Updated last year
- Compression for Foundation Models☆35Jul 21, 2025Updated 7 months ago
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers"☆170Jan 30, 2025Updated last year