lucidrains / minGRU-pytorch
Implementation of the proposed minGRU in Pytorch
☆296 · Updated 2 months ago
Alternatives and similar repositories for minGRU-pytorch
Users who are interested in minGRU-pytorch are comparing it to the libraries listed below.
- Simple, minimal implementation of the Mamba SSM in one pytorch file. Using logcumsumexp (Heisen sequence). ☆117 · Updated 7 months ago
- Attempt to make multiple residual streams from Bytedance's Hyper-Connections paper accessible to the public ☆83 · Updated 3 months ago
- ☆290 · Updated 4 months ago
- A practical implementation of GradNorm, Gradient Normalization for Adaptive Loss Balancing, in Pytorch ☆94 · Updated last year
- When it comes to optimizers, it's always better to be safe than sorry ☆233 · Updated 2 months ago
- Pytorch implementation of Simplified Structured State-Spaces for Sequence Modeling (S5) ☆74 · Updated last year
- Quick implementation of nGPT, learning entirely on the hypersphere, from NvidiaAI ☆282 · Updated 2 months ago
- Pytorch implementation of the xLSTM model by Beck et al. (2024) ☆165 · Updated 9 months ago
- A State-Space Model with Rational Transfer Function Representation. ☆78 · Updated last year
- An implementation of local windowed attention for language modeling ☆450 · Updated 4 months ago
- Implementation of Rotary Embeddings, from the Roformer paper, in Pytorch ☆681 · Updated 6 months ago
- Cuda implementation of Extended Long Short Term Memory (xLSTM) with C++ and PyTorch ports ☆87 · Updated 11 months ago
- Kolmogorov–Arnold Networks with modified activation (using MLP to represent the activation) ☆105 · Updated 7 months ago
- Implementation of https://srush.github.io/annotated-s4 ☆495 · Updated 2 years ago
- This is the official code release for Bayesian Flow Networks. ☆275 · Updated 10 months ago
- PyTorch Implementation of Jamba: "Jamba: A Hybrid Transformer-Mamba Language Model" ☆169 · Updated last month
- PyTorch implementation of Structured State Space for Sequence Modeling (S4), based on Annotated S4. ☆81 · Updated last year
- Official JAX implementation of xLSTM including fast and efficient training and inference code. 7B model available at https://huggingface.… ☆91 · Updated 4 months ago
- A simple but robust PyTorch implementation of RetNet from "Retentive Network: A Successor to Transformer for Large Language Models" (http… ☆105 · Updated last year
- Implementation of the proposed Adam-atan2 from Google Deepmind in Pytorch ☆104 · Updated 6 months ago
- Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients" ☆100 · Updated 5 months ago
- ☆185 · Updated 6 months ago
- Implementation of the sparse attention pattern proposed by the Deepseek team in their "Native Sparse Attention" paper ☆637 · Updated 2 weeks ago
- Training small GPT-2 style models using Kolmogorov-Arnold networks. ☆117 · Updated last year
- Official Implementation of "ADOPT: Modified Adam Can Converge with Any β2 with the Optimal Rate" ☆423 · Updated 5 months ago
- [ICLR2025 Spotlight🔥] Official Implementation of TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters ☆559 · Updated 3 months ago
- ☆163 · Updated 2 years ago
- Efficient Python library for Extended LSTM with exponential gating, memory mixing, and matrix memory for superior sequence modeling. ☆291 · Updated 11 months ago
- Official implementation of the paper: "ZClip: Adaptive Spike Mitigation for LLM Pre-Training". ☆124 · Updated last week
- Sequence Modeling with Multiresolution Convolutional Memory (ICML 2023) ☆124 · Updated last year