fla-org / flash-bidirectional-linear-attention
A Triton implementation of bi-directional (non-causal) linear attention
★41 · Updated 2 weeks ago
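In the non-causal setting, linear attention can precompute a single shared `Kᵀ V` summary over all positions, reducing the cost from O(n²·d) to O(n·d²). The sketch below illustrates that identity in plain NumPy; the `elu(x)+1` feature map is an assumption for illustration, and the repository's actual Triton kernels may use a different feature map and blocking strategy.

```python
import numpy as np

def linear_attention_noncausal(q, k, v, eps=1e-6):
    """Bidirectional (non-causal) linear attention, illustrative only."""
    # Feature map phi(x) = elu(x) + 1 keeps features positive
    # (a common choice; assumed here, not taken from the repo).
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))
    q, k = phi(q), phi(k)
    # Because every query attends to every key, the K^T V product and the
    # normalizer can be computed once and reused for all queries.
    kv = k.T @ v                      # (d, d_v), shared across queries
    z = k.sum(axis=0)                 # (d,), shared normalizer terms
    return (q @ kv) / (q @ z + eps)[:, None]

rng = np.random.default_rng(0)
n, d = 8, 4
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out = linear_attention_noncausal(q, k, v)   # (8, 4)
```

This produces the same result as materializing the full `phi(Q) phi(K)ᵀ` attention matrix and row-normalizing it, which is what makes the linear-time formulation attractive for non-causal (e.g. vision) workloads.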
Alternatives and similar repositories for flash-bidirectional-linear-attention:
- [NeurIPS 2024] Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching ★92 · Updated 7 months ago
- 🔥 A minimal training framework for scaling FLA models ★55 · Updated this week
- The official repo of continuous speculative decoding ★24 · Updated 2 months ago
- A WebUI for Side-by-Side Comparison of Media (Images/Videos) Across Multiple Folders ★19 · Updated 3 weeks ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ★28 · Updated 8 months ago
- The official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation" ★35 · Updated 4 months ago
- ★30 · Updated 8 months ago
- ★17 · Updated last month
- This is the official PyTorch implementation of "ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality" ★45 · Updated last month
- An auxiliary project analysis of the characteristics of KV in DiT Attention ★25 · Updated 2 months ago
- Accelerating Vision Diffusion Transformers with Skip Branches ★60 · Updated 2 months ago
- Official PyTorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxiang Li, Lu Yi… ★16 · Updated last month
- HGRN2: Gated Linear RNNs with State Expansion ★52 · Updated 5 months ago
- Here we will test various linear attention designs ★58 · Updated 9 months ago
- IntLLaMA: A fast and light quantization solution for LLaMA ★18 · Updated last year
- [EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer ★59 · Updated last year
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di… ★58 · Updated 8 months ago
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ★60 · Updated 10 months ago
- GIFT: Generative Interpretable Fine-Tuning ★20 · Updated 4 months ago
- Inference-only implementation of "One-Step Diffusion Distillation through Score Implicit Matching" [NeurIPS 2024] ★78 · Updated 3 months ago
- FORA introduces a simple yet effective caching mechanism into the Diffusion Transformer architecture for faster inference sampling ★35 · Updated 7 months ago
- TinyFusion: Diffusion Transformers Learned Shallow ★74 · Updated 2 months ago
- PyTorch code for Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers ★37 · Updated 5 months ago
- A big_vision-inspired repo that implements a generic Auto-Encoder class capable of representation learning and generative modeling ★34 · Updated 7 months ago
- A repository for DenseSSMs ★86 · Updated 10 months ago
- ★99 · Updated 11 months ago
- Codebase for the paper "Elucidating the Design Space of Language Models for Image Generation" ★45 · Updated 3 months ago
- ★68 · Updated this week
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ★24 · Updated 8 months ago