zzd1992 / FlashWindowAttention
Speed up the attention computation of Swin Transformer
☆15 · Updated 4 months ago
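For context on what this repository accelerates: Swin Transformer restricts self-attention to non-overlapping local windows. Below is a minimal sketch of that standard window-attention pattern, not the FlashWindowAttention API itself, assuming PyTorch ≥ 2.0 for `F.scaled_dot_product_attention` (which dispatches to a FlashAttention-style fused kernel where available). All function and variable names here are illustrative.

```python
# Minimal sketch of Swin-style window attention (illustrative only; not the
# FlashWindowAttention API). Uses PyTorch's fused scaled_dot_product_attention.
import torch
import torch.nn.functional as F

def window_attention(x, num_heads, window_size):
    """x: (B, H, W, C) feature map; H and W must be divisible by window_size."""
    B, H, W, C = x.shape
    ws = window_size
    head_dim = C // num_heads

    # Partition into non-overlapping ws x ws windows -> (B*num_windows, ws*ws, C).
    x = x.view(B, H // ws, ws, W // ws, ws, C)
    windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, C)

    # Self-attention within each window (q = k = v in this sketch; a real Swin
    # block would apply learned QKV projections and a relative position bias).
    q = windows.view(-1, ws * ws, num_heads, head_dim).transpose(1, 2)
    out = F.scaled_dot_product_attention(q, q, q)
    out = out.transpose(1, 2).reshape(-1, ws * ws, C)

    # Merge windows back to the (B, H, W, C) layout.
    out = out.view(B, H // ws, W // ws, ws, ws, C)
    return out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)

# Example: batch of 2, 56x56 feature map, 96 channels, 7x7 windows, 3 heads.
y = window_attention(torch.randn(2, 56, 56, 96), num_heads=3, window_size=7)
print(y.shape)  # torch.Size([2, 56, 56, 96])
```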
Alternatives and similar repositories for FlashWindowAttention
Users who are interested in FlashWindowAttention are comparing it to the libraries listed below.
- Implementation of Soft MoE, proposed by Brain's Vision team, in PyTorch ☆294 · Updated 2 months ago
- An efficient PyTorch implementation of selective scan in one file, works with both CPU and GPU, with corresponding mathematical derivatio… ☆89 · Updated last year
- ☆34 · Updated last year
- ☆181 · Updated 8 months ago
- Official PyTorch Implementation of "The Hidden Attention of Mamba Models" ☆222 · Updated last year
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆57 · Updated last year
- ☆286 · Updated last month
- ☆104 · Updated last year
- Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆94 · Updated this week
- A repository for DenseSSMs ☆87 · Updated last year
- (Unofficial) PyTorch implementation of grouped-query attention (GQA) from "GQA: Training Generalized Multi-Query Transformer Models from … ☆168 · Updated last year (a minimal sketch of the GQA pattern appears after this list)
- ☆50 · Updated last year
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models ☆296 · Updated 3 months ago
- 1.5−3.0× lossless training or pre-training speedup. An off-the-shelf, easy-to-implement algorithm for the efficient training of foundatio… ☆220 · Updated 9 months ago
- [CVPR'23] SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer ☆69 · Updated last year
- Transformers w/o Attention, based fully on MLPs ☆93 · Updated last year
- A simple minimal implementation of Reversible Vision Transformers ☆125 · Updated last year
- A Triton Kernel for incorporating Bi-Directionality in Mamba2 ☆68 · Updated 5 months ago
- [ECCV 2024] Official PyTorch implementation of RoPE-ViT "Rotary Position Embedding for Vision Transformer" ☆327 · Updated 5 months ago
- ☆28 · Updated 7 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 · Updated 8 months ago
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆137 · Updated 4 months ago
- PyTorch code and checkpoints release for VanillaKD: https://arxiv.org/abs/2305.15781 ☆75 · Updated last year
- This repo contains the code and configuration files for reproducing object detection results of FocalNets with DINO ☆67 · Updated 2 years ago
- [NeurIPS 2022 Spotlight] This is the official PyTorch implementation of "EcoFormer: Energy-Saving Attention with Linear Complexity" ☆72 · Updated 2 years ago
- (CVPR 2022) Automated Progressive Learning for Efficient Training of Vision Transformers ☆25 · Updated 3 months ago
- Official implementation of AAAI 2023 paper "Parameter-efficient Model Adaptation for Vision Transformers" ☆105 · Updated last year
- ☆11 · Updated last year
- A library for calculating the FLOPs in the forward() process based on torch.fx ☆113 · Updated 2 months ago
- [CVPR-22] This is the official implementation of the paper "AdaViT: Adaptive Vision Transformers for Efficient Image Recognition" ☆54 · Updated 2 years ago
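As referenced in the grouped-query attention entry above, the GQA pattern itself is easy to sketch generically: several query heads share a single key/value head, shrinking the KV cache while keeping query capacity. The sketch below is a generic illustration of that idea, not code from the listed repository; all names are illustrative.

```python
# Minimal sketch of grouped-query attention (GQA): n_q query heads share
# n_kv key/value heads (n_q must be a multiple of n_kv). Generic
# illustration, not the listed repository's code.
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    """q: (B, n_q, T, d); k, v: (B, n_kv, T, d) with n_q a multiple of n_kv."""
    n_q, n_kv = q.shape[1], k.shape[1]
    # Repeat each KV head so it is shared across its group of query heads.
    k = k.repeat_interleave(n_q // n_kv, dim=1)
    v = v.repeat_interleave(n_q // n_kv, dim=1)
    return F.scaled_dot_product_attention(q, k, v)

# Example: 8 query heads grouped over 2 KV heads.
B, T, d = 2, 16, 64
out = grouped_query_attention(
    torch.randn(B, 8, T, d), torch.randn(B, 2, T, d), torch.randn(B, 2, T, d)
)
print(out.shape)  # torch.Size([2, 8, 16, 64])
```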