zzd1992 / FlashWindowAttention
Speed up the attention computation of Swin Transformer
☆25 · Updated 5 months ago
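For context on what the repository accelerates: Swin-style window attention restricts self-attention to non-overlapping local windows, dropping the cost from quadratic in sequence length to linear. The sketch below is a minimal, illustrative NumPy version of that idea (not code from this repository); the function name and signature are hypothetical, and it omits Swin details such as window shifting, relative position bias, and batching.

```python
import numpy as np

def window_attention(x, window_size):
    """Naive non-overlapping window attention (illustrative sketch only).

    x: (seq_len, dim) array; seq_len is assumed divisible by window_size.
    Each token attends only to tokens in its own window, so the cost is
    O(seq_len * window_size * dim) rather than O(seq_len^2 * dim).
    """
    n, d = x.shape
    out = np.empty_like(x)
    for start in range(0, n, window_size):
        w = x[start:start + window_size]              # tokens in this window
        scores = (w @ w.T) / np.sqrt(d)               # scaled dot-product scores
        scores -= scores.max(axis=-1, keepdims=True)  # subtract max for stability
        probs = np.exp(scores)
        probs /= probs.sum(axis=-1, keepdims=True)    # softmax within the window
        out[start:start + window_size] = probs @ w
    return out
```

As a sanity check, setting `window_size` equal to the sequence length reduces this to ordinary full self-attention over the whole sequence.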
Alternatives and similar repositories for FlashWindowAttention
Users interested in FlashWindowAttention are comparing it to the libraries listed below.
- Triton implementation of bi-directional (non-causal) linear attention ☆56 · Updated 9 months ago
- Implementation of the proposed Adam-atan2 from Google DeepMind in PyTorch ☆134 · Updated last month
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆216 · Updated 2 years ago
- ☆186 · Updated last year
- [ICLR 2025] Official PyTorch implementation of Gated Delta Networks: Improving Mamba2 with Delta Rule ☆367 · Updated 2 months ago
- When it comes to optimizers, it's always better to be safe than sorry ☆378 · Updated last month
- A library for calculating the FLOPs in the forward() process based on torch.fx ☆130 · Updated 7 months ago
- Patch convolution to avoid large GPU memory usage of Conv2D ☆93 · Updated 9 months ago
- An efficient PyTorch implementation of selective scan in one file; works with both CPU and GPU, with corresponding mathematical derivatio… ☆97 · Updated last month
- Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models ☆333 · Updated 8 months ago
- FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores ☆331 · Updated 10 months ago
- A block-oriented training approach for inference-time optimization ☆33 · Updated last year
- A practical implementation of GradNorm, Gradient Normalization for Adaptive Loss Balancing, in PyTorch ☆114 · Updated 2 months ago
- ☆302 · Updated 6 months ago
- ☆158 · Updated 2 years ago
- Tiled Flash Linear Attention library for fast and efficient mLSTM kernels ☆74 · Updated last week
- A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (http… ☆106 · Updated last year
- Flash-Muon: An Efficient Implementation of the Muon Optimizer ☆206 · Updated 5 months ago
- Implementation of the proposed DeepCrossAttention by Heddes et al. at Google Research, in PyTorch ☆94 · Updated 8 months ago
- Get down and dirty with FlashAttention 2.0 in PyTorch; plug-and-play, no complex CUDA kernels ☆109 · Updated 2 years ago
- [ICLR 2024 Spotlight] Official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di… ☆66 · Updated last year
- Root Mean Square Layer Normalization ☆256 · Updated 2 years ago
- Batch computation of the linear assignment problem on GPU ☆96 · Updated 2 months ago
- 🔥 A minimal training framework for scaling FLA models ☆291 · Updated 2 months ago
- ☆199 · Updated last year
- This repository contains the experimental PyTorch-native float8 training UX ☆223 · Updated last year
- OTOv1-v3 (NeurIPS, ICLR, TMLR): DNN training, compression, structured pruning, erasing operators; CNN, diffusion, LLM ☆48 · Updated last year
- Official implementation of "Hydra: Bidirectional State Space Models Through Generalized Matrix Mixers" ☆165 · Updated 9 months ago
- Fast Multi-dimensional Sparse Attention ☆654 · Updated 3 weeks ago
- Implementation of the proposed MaskBit from ByteDance AI ☆82 · Updated last year