Doraemonzzz / tnn-pytorch
☆18 · Updated last year
Related projects
Alternatives and complementary repositories for tnn-pytorch
- Curse-of-memory phenomenon of RNNs in sequence modelling ☆19 · Updated this week
- ☆31 · Updated 10 months ago
- [NeurIPS 2022] Your Transformer May Not be as Powerful as You Expect (official implementation) ☆33 · Updated last year
- ☆25 · Updated 4 months ago
- ☆45 · Updated 4 months ago
- [NeurIPS 2023 spotlight] Official implementation of HGRN in our NeurIPS 2023 paper - Hierarchically Gated Recurrent Neural Network for Se… ☆61 · Updated 6 months ago
- [ICLR 2023] Official implementation of Transnormer in our ICLR 2023 paper - Toeplitz Neural Network for Sequence Modeling ☆74 · Updated 6 months ago
- Code for the PAPA paper ☆27 · Updated 2 years ago
- ☆49 · Updated last year
- Blog post ☆16 · Updated 9 months ago
- ☆21 · Updated last month
- [EMNLP 2023] Official implementation of the algorithm ETSC (Exact Toeplitz-to-SSM Conversion) from our EMNLP 2023 paper - Accelerating Toeplitz… ☆14 · Updated last year
- ☆45 · Updated 9 months ago
- Efficient PScan implementation in PyTorch ☆15 · Updated 10 months ago
- Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization ☆16 · Updated 6 years ago
- ☆32 · Updated 3 years ago
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights ☆19 · Updated 2 years ago
- HGRN2: Gated Linear RNNs with State Expansion ☆49 · Updated 3 months ago
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆120 · Updated last year
- Implementation of Gated State Spaces, from the paper "Long Range Language Modeling via Gated State Spaces", in PyTorch ☆95 · Updated last year
- Leveraging Recursive Gumbel-Max Trick for Approximate Inference in Combinatorial Spaces, NeurIPS 2021 ☆11 · Updated 2 years ago
- Fast and memory-efficient exact attention ☆27 · Updated last week
- sigma-MoE layer ☆18 · Updated 10 months ago
- Code for the paper "Stack Attention: Improving the Ability of Transformers to Model Hierarchical Patterns" ☆16 · Updated 8 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] ☆52 · Updated last month
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling ☆35 · Updated 11 months ago
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 · Updated 5 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆44 · Updated last year
- This repository contains the data and code for the paper "Diverse Text Generation via Variational Encoder-Decoder Models with Gaussian Pr… ☆24 · Updated 2 years ago
- Official code for "Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving", ICML 2021 ☆26 · Updated 3 years ago