epfml / dynamic-sparse-flash-attention
☆150 · Updated 2 years ago
Alternatives and similar repositories for dynamic-sparse-flash-attention
Users interested in dynamic-sparse-flash-attention are comparing it to the libraries listed below.
- ☆121 · Updated last year
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff". ☆243 · Updated 5 months ago
- Triton-based implementation of Sparse Mixture of Experts. ☆253 · Updated 2 months ago
- ☆83 · Updated 2 years ago
- Fast and memory-efficient exact attention. ☆74 · Updated 9 months ago
- The simplest implementation of recent sparse attention patterns for efficient LLM inference. ☆91 · Updated 4 months ago
- Triton implementation of FlashAttention2 that adds custom masks. ☆151 · Updated last year
- A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton (see the sketch after this list). ☆73 · Updated last year
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding". ☆121 · Updated last year
- Some preliminary explorations of Mamba's context scaling. ☆217 · Updated last year
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
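
Of the techniques above, the fused linear + cross-entropy item is simple enough to illustrate. Below is a minimal eager-mode PyTorch sketch of the underlying memory-saving idea, not that repository's Triton kernel or API; the function name, shapes, and chunk size are illustrative assumptions. The real implementation fuses the projection, softmax, and backward pass into one kernel so the full logits tensor never hits global memory.

```python
import torch
import torch.nn.functional as F

def chunked_linear_cross_entropy(hidden, weight, targets, chunk_size=1024):
    # hidden:  (N, d) final hidden states (e.g. N = batch * seq_len)
    # weight:  (V, d) output-projection matrix over a vocabulary of size V
    # targets: (N,)   target token ids
    # Instead of materializing the full (N, V) logits tensor, process N in
    # chunks so only a (chunk_size, V) slice of logits exists at any time.
    total_loss = hidden.new_zeros(())
    n = hidden.shape[0]
    for start in range(0, n, chunk_size):
        h = hidden[start:start + chunk_size]   # (c, d)
        logits = h @ weight.t()                # (c, V), freed after each step
        total_loss = total_loss + F.cross_entropy(
            logits, targets[start:start + chunk_size], reduction="sum"
        )
    return total_loss / n
```

Note that this eager version only reduces peak memory under `torch.no_grad()`, since autograd still retains each chunk's logits for the backward pass; a fused Triton kernel avoids that by computing the gradient in place.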