pkuyym / EvolvingAttention
☆14 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for EvolvingAttention
- PyTorch implementation of Pay Attention to MLPs (see the Spatial Gating Unit sketch after this list) ☆39 · Updated 3 years ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆61 · Updated 6 months ago
- Unofficial PyTorch implementation of the paper "cosFormer: Rethinking Softmax in Attention" ☆43 · Updated 3 years ago
- This repository contains the source code for SoftCTC. The original paper can be found at https://arxiv.org/abs/2212.02135 ☆15 · Updated last year
- HGRN2: Gated Linear RNNs with State Expansion ☆49 · Updated 2 months ago
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation) ☆33 · Updated last year
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021) ☆59 · Updated 2 years ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆52 · Updated last month
- TriNet: stabilizing self-supervised learning against complete or slow collapse in ASR ☆26 · Updated last year
- A variant of Transformer-XL where the memory is updated not with a queue, but with attention ☆46 · Updated 4 years ago
- Implementation of Insertion-deletion Denoising Diffusion Probabilistic Models ☆30 · Updated 2 years ago
- Mixture of Attention Heads ☆39 · Updated 2 years ago
- Relative Positional Encoding for Transformers with Linear Complexity ☆61 · Updated 2 years ago
- Sparse Attention with Linear Units ☆17 · Updated 3 years ago
- [ICLR 2022] "Audio Lottery: Speech Recognition Made Ultra-Lightweight, Noise-Robust, and Transferable", by Shaojin Ding, Tianlong Chen, Z… ☆30 · Updated 2 years ago
- A repository for DenseSSMs ☆88 · Updated 7 months ago
- PyTorch implementation of the paper "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" ☆23 · Updated last week
- Reference implementation of DecDTW in PyTorch (ICLR 2023) ☆20 · Updated last year
- [NeurIPS 2022] "Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Spee…☆15Updated last year
- Implementation of Hydra Attention: Efficient Attention with Many Heads (https://arxiv.org/abs/2209.07484; see the sketch after this list) ☆11 · Updated last year
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal…☆44Updated last year
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 (see the sketch after this list) ☆49 · Updated 2 years ago
- Several types of attention modules written in PyTorch ☆40 · Updated last month
- A replication of the paper "Adaptive Mixtures of Local Experts" applied to the CIFAR-10 image classification dataset ☆9 · Updated 3 years ago
- ResiDual: Transformer with Dual Residual Connections (https://arxiv.org/abs/2304.14802) ☆87 · Updated last year
- PyTorch implementation of FNet: Mixing Tokens with Fourier Transforms (see the sketch after this list) ☆25 · Updated 3 years ago
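For the Pay Attention to MLPs entry above, the core novelty is the Spatial Gating Unit. A minimal PyTorch sketch of that unit follows; the class name and tensor shapes are my own choices, and the near-identity initialization follows the paper's description:

```python
import torch
import torch.nn as nn

class SpatialGatingUnit(nn.Module):
    """Spatial Gating Unit from "Pay Attention to MLPs" (gMLP).

    Splits the channels in half, mixes one half along the sequence
    axis with a learned linear map, and uses it to gate the other.
    """
    def __init__(self, dim: int, seq_len: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim // 2)
        # Linear projection over the *sequence* axis (token mixing).
        self.spatial_proj = nn.Linear(seq_len, seq_len)
        # Per the paper: weights near zero, bias at one, so the unit
        # starts out close to identity gating.
        nn.init.zeros_(self.spatial_proj.weight)
        nn.init.ones_(self.spatial_proj.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u, v = x.chunk(2, dim=-1)                      # (B, N, d/2) each
        v = self.norm(v)
        v = self.spatial_proj(v.transpose(1, 2)).transpose(1, 2)
        return u * v                                   # element-wise gate
```

For `x = torch.randn(2, 64, 256)`, `SpatialGatingUnit(256, 64)(x)` has shape `(2, 64, 128)`: the gate halves the channel count, which the surrounding gMLP block restores with its output projection.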
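The Hydra Attention entry reduces to a surprisingly small computation: with as many heads as feature dimensions and a cosine-similarity kernel, multi-head attention collapses to a global gating that is linear in the number of tokens. A sketch under those assumptions (the function name is mine):

```python
import torch
import torch.nn.functional as F

def hydra_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Hydra attention (arXiv:2209.07484) with heads == feature dims.

    out_t = phi(q_t) * sum_s phi(k_s) * v_s, with phi = L2 normalization,
    costing O(N*d) instead of softmax attention's O(N^2*d).
    q, k, v: (batch, tokens, dim)
    """
    q = F.normalize(q, dim=-1)                  # phi(q)
    k = F.normalize(k, dim=-1)                  # phi(k)
    kv = (k * v).sum(dim=1, keepdim=True)       # sum_s phi(k_s) * v_s -> (B, 1, d)
    return q * kv                               # broadcast gate over all tokens
```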
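The ReLA repository swaps softmax for ReLU on the scaled attention scores, which leaves the attention weights unnormalized and naturally sparse; the paper stabilizes training by normalizing the attention output. A minimal single-head sketch, with the paper's gating and multi-head plumbing omitted (the class name is mine):

```python
import torch
import torch.nn as nn

class RectifiedLinearAttention(nn.Module):
    """Core of ReLA from "Sparse Attention with Linear Units"."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.scale = dim ** -0.5
        self.eps = eps
        self.gain = nn.Parameter(torch.ones(dim))  # learned RMSNorm gain

    def forward(self, q, k, v):
        # ReLU instead of softmax: sparse, unnormalized attention weights.
        scores = torch.relu(q @ k.transpose(-2, -1) * self.scale)
        out = scores @ v
        # RMS-style normalization over features keeps magnitudes bounded.
        rms = out.pow(2).mean(dim=-1, keepdim=True).add(self.eps).rsqrt()
        return out * rms * self.gain
```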
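FNet's mixing sublayer is parameter-free and fits in one line: a DFT along the hidden dimension, a DFT along the sequence dimension, keep the real part. A sketch of that sublayer alone (the function name is mine; the full model wraps it in residual and feed-forward blocks):

```python
import torch

def fnet_mixing(x: torch.Tensor) -> torch.Tensor:
    """FNet token mixing: a parameter-free stand-in for self-attention.

    x: (batch, seq_len, hidden)
    """
    # 1D FFT over the hidden dims, then over the sequence; real part only.
    return torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real
```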