fla-org / flash-bidirectional-linear-attention
Triton implementation of bi-directional (non-causal) linear attention
☆50Updated 4 months ago
Alternatives and similar repositories for flash-bidirectional-linear-attention
Users who are interested in flash-bidirectional-linear-attention are comparing it to the libraries listed below
- ☆82Updated last month
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"☆108Updated last month
- [ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization☆71Updated 3 weeks ago
- ☆51Updated 3 months ago
- Here we will test various linear attention designs.☆59Updated last year
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di…☆60Updated last year
- XAttention: Block Sparse Attention with Antidiagonal Scoring☆166Updated this week
- Code for Draft Attention☆72Updated last month
- ☆208Updated 2 weeks ago
- Flash-Linear-Attention models beyond language☆16Updated this week
- This is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆38Updated 8 months ago
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba (ICLR 2025)☆28Updated 2 months ago
- Official PyTorch implementation of "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models"☆35Updated 3 weeks ago
- ☆76Updated 3 months ago
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching☆105Updated 11 months ago
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…☆65Updated last year
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders☆24Updated 4 months ago
- ☆105Updated last year
- Implementation of the proposed MaskBit from Bytedance AI☆82Updated 7 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer☆131Updated last week
- ☆17Updated 5 months ago
- ☆31Updated last year
- [ICLR 2025] Official PyTorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…☆24Updated 6 months ago
- [ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs☆83Updated 6 months ago
- The official code implementation for paper "R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing"☆35Updated this week
- Official implementation of "MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map" (NeurIPS 2024 Oral)☆25Updated 5 months ago
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".☆23Updated 7 months ago
- ☆23Updated 2 months ago
- [CVPR 2025] Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers☆51Updated 9 months ago
- PyTorch implementation of "Oscillation-Reduced MXFP4 Training for Vision Transformers" on DeiT Model Pre-training☆14Updated last month