zzd1992 / FlashWindowAttention
Speeds up the attention computation of Swin Transformer
☆20 · Updated last month
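FlashWindowAttention targets the window-based self-attention used in Swin Transformer. For context, below is a minimal plain-PyTorch sketch of that computation; the `window_attention` helper, its shapes, and the shared-QKV shortcut are illustrative assumptions, not the repository's actual API.

```python
# Sketch of windowed self-attention as used in Swin Transformer, in plain
# PyTorch. Illustrative only; NOT the FlashWindowAttention interface.
import torch
import torch.nn.functional as F

def window_attention(x, window_size=7, num_heads=4):
    # x: (B, H, W, C); H and W are assumed divisible by window_size.
    B, H, W, C = x.shape
    head_dim = C // num_heads

    # Partition into non-overlapping (window_size x window_size) windows and
    # flatten each window into a short token sequence.
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)

    # Multi-head scaled dot-product attention inside each window. A real layer
    # adds learned QKV projections and a relative position bias; both are
    # omitted here, so q = k = v purely for illustration.
    qkv = windows.view(-1, window_size * window_size, num_heads, head_dim).transpose(1, 2)
    out = F.scaled_dot_product_attention(qkv, qkv, qkv)
    out = out.transpose(1, 2).reshape(-1, window_size * window_size, C)

    # Merge the windows back into the original (B, H, W, C) layout.
    out = out.view(B, H // window_size, W // window_size, window_size, window_size, C)
    return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

# Example: a 1x56x56x128 feature map with 7x7 windows.
y = window_attention(torch.randn(1, 56, 56, 128))
print(y.shape)  # torch.Size([1, 56, 56, 128])
```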
Alternatives and similar repositories for FlashWindowAttention
Users interested in FlashWindowAttention are comparing it to the libraries listed below.
- A library for calculating the FLOPs in the forward() process based on torch.fx ☆124 · Updated 4 months ago
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆326 · Updated 7 months ago
- When it comes to optimizers, it's always better to be safe than sorry ☆352 · Updated this week
- Fast Multi-dimensional Sparse Attention ☆586 · Updated 3 weeks ago
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models ☆323 · Updated 5 months ago
- ☆293 · Updated 3 months ago
- [ECCV 2024] Official PyTorch implementation of RoPE-ViT "Rotary Position Embedding for Vision Transformer" ☆373 · Updated 7 months ago
- An efficient pytorch implementation of selective scan in one file, works with both cpu and gpu, with corresponding mathematical derivatio… ☆92 · Updated last year
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆215 · Updated 2 years ago
- Implementation of Soft MoE, proposed by Brain's Vision team, in Pytorch ☆311 · Updated 4 months ago
- ☆182 · Updated 10 months ago
- A simple minimal implementation of Reversible Vision Transformers ☆125 · Updated last year
- ☆51 · Updated last year
- Causal depthwise conv1d in CUDA, with a PyTorch interface ☆541 · Updated 3 weeks ago
- [ECCV 2024] Isomorphic Pruning for Vision Models ☆73 · Updated last year
- [ICLR 2025] Official PyTorch Implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆194 · Updated 4 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆224 · Updated last year
- Patch convolution to avoid large GPU memory usage of Conv2D ☆92 · Updated 6 months ago
- [ICLR 2023] "More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity"; [ICML 2023] "Are Large Kernels Better Teachers… ☆275 · Updated 2 years ago
- 1.5−3.0× lossless training or pre-training speedup. An off-the-shelf, easy-to-implement algorithm for the efficient training of foundatio… ☆222 · Updated 11 months ago
- ☆292 · Updated 7 months ago
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di… ☆64 · Updated last year
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆215 · Updated last year
- [CVPR'23] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer ☆73 · Updated last year
- Implementation of a memory efficient multi-head attention as proposed in the paper, "Self-attention Does Not Need O(n²) Memory" ☆379 · Updated 2 years ago
- A library for unit scaling in PyTorch ☆128 · Updated last month
- Recent Advances on Efficient Vision Transformers ☆52 · Updated 2 years ago
- A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton. ☆567 · Updated this week
- The official implementation of TPA: Tensor ProducT ATTenTion Transformer (T6) (https://arxiv.org/abs/2501.06425) ☆380 · Updated this week
- (Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from … ☆175 · Updated last year
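The last item above implements grouped-query attention. A minimal plain-PyTorch sketch of the idea follows; the `grouped_query_attention` helper and its shapes are hypothetical, not the linked repository's interface.

```python
# Sketch of grouped-query attention (GQA): query heads are split into groups
# that share a single key/value head. Illustrative assumption, not the repo's API.
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    # q: (B, num_q_heads, L, D); k, v: (B, num_kv_heads, L, D),
    # with num_q_heads an integer multiple of num_kv_heads.
    num_q_heads, num_kv_heads = q.shape[1], k.shape[1]
    group = num_q_heads // num_kv_heads
    # Broadcast each K/V head to its group of query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    return F.scaled_dot_product_attention(q, k, v)

# Example: 8 query heads sharing 2 key/value heads.
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 2, 128, 64)
v = torch.randn(1, 2, 128, 64)
print(grouped_query_attention(q, k, v).shape)  # torch.Size([1, 8, 128, 64])
```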