xiayuqing0622 / customized-flash-attention

Fast and memory-efficient exact attention

☆26

Related projects

Alternatives and complementary repositories for customized-flash-attention

HazyResearch / prefix-linear-attention
☆44 Updated 4 months ago
epfml / dynamic-sparse-flash-attention
☆132 Updated last year
shawntan / stickbreaking-attention
Stick-breaking attention
☆33 Updated this week
shreyansh26 / Attention-Mask-Patterns
Using FlexAttention to compute attention with different masking patterns
☆40 Updated last month
microsoft / SparseMixer
Sparse Backpropagation for Mixture-of-Expert Training
☆22 Updated 4 months ago
catie-aq / flashT5
A fast implementation of T5/UL2 in PyTorch using Flash Attention
☆67 Updated 3 weeks ago
sjelassi / transformers_ssm_copy
☆24 Updated 8 months ago
IST-DASLab / RoSA
☆34 Updated 9 months ago
amirzandieh / HyperAttention
Triton Implementation of HyperAttention Algorithm
☆46 Updated 11 months ago
kyegomez / Blockwise-Parallel-Transformer
32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.
☆42 Updated last year
google-deepmind / randomized_positional_encodings
Randomized Positional Encodings Boost Length Generalization of Transformers
☆79 Updated 7 months ago
VITA-Group / Random-MoE-as-Dropout
[ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…
☆44 Updated last year
sanagno / adaptively_sparse_attention
☆17 Updated last year
kyo-takano / chinchilla
A toolkit for scaling law research ⚖
☆42 Updated 7 months ago
OpenNLPLab / HGRN
[NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se…
☆61 Updated 6 months ago
Leooyii / LCEG
Long Context Extension and Generalization in LLMs
☆39 Updated last month
insuhan / hyper-attn
☆73 Updated 11 months ago
ldery / Bonsai
Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"
☆28 Updated 7 months ago
mgmalek / efficient_cross_entropy
☆76 Updated 5 months ago
glassroom / heinsen_attention
Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)
☆24 Updated 5 months ago
berlino / gated_linear_attention
☆97 Updated 8 months ago
BlinkDL / LinearAttentionArena
Here we will test various linear attention designs.
☆56 Updated 6 months ago
sustcsonglin / mamba-triton
☆45 Updated 9 months ago
OpenNLPLab / Transnormer
[EMNLP 2022] Official implementation of Transnormer in our EMNLP 2022 paper - The Devil in Linear Transformer
☆54 Updated last year
sail-sg / scaling-with-vocab
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
☆67 Updated last month
lucidrains / product-key-memory
Standalone Product Key Memory module in PyTorch - for augmenting Transformer models
☆72 Updated 3 months ago
OpenNLPLab / LASP
Linear Attention Sequence Parallelism (LASP)
☆64 Updated 5 months ago
OpenNLPLab / Tnn
[ICLR 2023] Official implementation of TNN in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling
☆70 Updated 6 months ago
ducdauge / sft-llm
Scaling Sparse Fine-Tuning to Large Language Models
☆17 Updated 9 months ago
RobertCsordas / moe
Official repository for the paper "Approximating Two-Layer Feedforward Networks for Efficient Transformers"
☆36 Updated 11 months ago