zzd1992 / FlashWindowAttention
Speeds up the attention computation of Swin Transformer
☆31 · Updated 7 months ago
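For context, window attention (the pattern used in Swin Transformer) splits the token sequence into non-overlapping windows and computes attention only within each window. The sketch below is a minimal NumPy reference of that baseline computation, not the repository's actual kernel; the function name and shapes are illustrative.

```python
import numpy as np

def window_attention(x, window_size):
    """Naive window self-attention: partition the sequence into
    non-overlapping windows and attend only within each window.
    Reference sketch of the computation a fused kernel would speed up."""
    seq_len, dim = x.shape
    assert seq_len % window_size == 0, "sequence must split evenly into windows"
    windows = x.reshape(seq_len // window_size, window_size, dim)
    out = np.empty_like(windows)
    scale = dim ** -0.5
    for i, w in enumerate(windows):                    # one window at a time
        scores = (w @ w.T) * scale                     # (window, window) logits
        scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
        probs = np.exp(scores)
        probs /= probs.sum(axis=-1, keepdims=True)
        out[i] = probs @ w                             # weighted sum of values
    return out.reshape(seq_len, dim)

tokens = np.random.default_rng(0).standard_normal((16, 8))
y = window_attention(tokens, window_size=4)
print(y.shape)  # (16, 8)
```

A fused implementation avoids materializing the per-window score matrices and the Python-level loop, which is where most of the speedup over a naive version like this comes from.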
Alternatives and similar repositories for FlashWindowAttention
Users interested in FlashWindowAttention are comparing it to the libraries listed below.
- Triton implementation of bi-directional (non-causal) linear attention — ☆65 · Updated last week
- A library for calculating the FLOPs in the forward() process based on torch.fx — ☆137 · Updated last month
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores — ☆341 · Updated last year
- When it comes to optimizers, it's always better to be safe than sorry — ☆402 · Updated 4 months ago
- Flash-Muon: An Efficient Implementation of Muon Optimizer — ☆233 · Updated 7 months ago
- An efficient PyTorch implementation of selective scan in one file, works with both CPU and GPU, with corresponding mathematical derivation — ☆101 · Updated 3 months ago
- Implementation of fused cosine similarity attention in the same style as Flash Attention — ☆220 · Updated 2 years ago
- ☆191 · Updated last year
- A simple minimal implementation of Reversible Vision Transformers — ☆127 · Updated last year
- Batch computation of the linear assignment problem on GPU — ☆105 · Updated 4 months ago
- Official PyTorch implementation of "LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging" (ICML 2024) — ☆31 · Updated last year
- Patch convolution to avoid large GPU memory usage of Conv2D — ☆95 · Updated last year
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch — ☆135 · Updated 3 months ago
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models — ☆340 · Updated 11 months ago
- flex-block-attn: an efficient block sparse attention computation library — ☆108 · Updated last month
- Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public — ☆168 · Updated 3 weeks ago
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models" — ☆68 · Updated last year
- Get down and dirty with FlashAttention2.0 in PyTorch, plug in and play, no complex CUDA kernels — ☆112 · Updated 2 years ago
- A block-oriented training approach for inference-time optimization — ☆34 · Updated last year
- ☆48 · Updated last month
- PyTorch implementation of the sparse attention from the paper "Generating Long Sequences with Sparse Transformers" — ☆94 · Updated last week
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule — ☆452 · Updated 4 months ago
- An official code release of the paper "RGB no more: Minimally Decoded JPEG Vision Transformers" — ☆57 · Updated 2 years ago
- Implementation of Linformer for PyTorch — ☆305 · Updated 2 years ago
- ☆160 · Updated 2 years ago
- This repository contains the experimental PyTorch-native float8 training UX — ☆227 · Updated last year
- Fast and memory-efficient exact attention — ☆20 · Updated last year
- ☆292 · Updated last year
- ☆201 · Updated 2 years ago
- ☆307 · Updated 9 months ago