lucidrains/gradnorm-pytorch

A practical implementation of GradNorm, Gradient Normalization for Adaptive Loss Balancing, in Pytorch

☆133

Alternatives and similar repositories for gradnorm-pytorch

Users who are interested in gradnorm-pytorch are comparing it to the libraries listed below.

lucidrains / kalman-filtering-attention
Implementation of the Kalman Filtering Attention proposed in "Kalman Filtering Attention for User Behavior Modeling in CTR Prediction"
☆61 · Oct 22, 2023 · Updated 2 years ago
lucidrains / x-unet
Implementation of a U-net complete with efficient attention as well as the latest research findings
☆294 · May 3, 2024 · Updated 2 years ago
lucidrains / llama-qrlhf
Implementation of the Llama architecture with RLHF + Q-learning
☆170 · Feb 1, 2025 · Updated last year
lucidrains / metaformer-gpt
Implementation of Metaformer, but in an autoregressive manner
☆26 · Jun 21, 2022 · Updated 4 years ago
lucidrains / gateloop-transformer
Implementation of GateLoop Transformer in Pytorch and Jax
☆93 · Jun 18, 2024 · Updated 2 years ago
lucidrains / frame-averaging-pytorch
Pytorch implementation of a simple way to enable (Stochastic) Frame Averaging for any network
☆52 · Jul 26, 2024 · Updated 2 years ago
lucidrains / compositional-attention-pytorch
Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi…
☆51 · May 10, 2022 · Updated 4 years ago
lucidrains / mixture-of-attention
Some personal experiments around routing tokens to different autoregressive attention, akin to mixture-of-experts
☆122 · Oct 17, 2024 · Updated last year
lucidrains / pytorch-custom-utils
Just some miscellaneous utility functions / decorators / modules related to Pytorch and Accelerate to help speed up implementation of new…
☆126 · Jul 26, 2024 · Updated 2 years ago
lucidrains / grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
☆104 · Dec 22, 2024 · Updated last year
lucidrains / Adan-pytorch
Implementation of the Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch
☆251 · Sep 1, 2022 · Updated 3 years ago
lucidrains / agent-attention-pytorch
Implementation of Agent Attention in Pytorch
☆93 · Jul 10, 2024 · Updated 2 years ago
lucidrains / mirasol-pytorch
Implementation of 🌻 Mirasol, SOTA Multimodal Autoregressive model out of Google Deepmind, in Pytorch
☆92 · Dec 22, 2023 · Updated 2 years ago
lucidrains / pause-transformer
Yet another random morning idea to be quickly tried and architecture shared if it works; to allow the transformer to pause for any amount…
☆53 · Oct 22, 2023 · Updated 2 years ago
exercise-book-yq / Supercodec
☆51 · Mar 5, 2026 · Updated 4 months ago
lucidrains / mogrifier
Usable implementation of Mogrifier, a circuit for enhancing LSTMs and potentially other networks, from Deepmind
☆21 · Jun 9, 2024 · Updated 2 years ago
lucidrains / product-key-memory
Standalone Product Key Memory module in Pytorch - for augmenting Transformer models
☆87 · Nov 1, 2025 · Updated 8 months ago
lucidrains / holodeck-pytorch
Implementation of a holodeck, written in Pytorch
☆19 · Nov 1, 2023 · Updated 2 years ago
lucidrains / bidirectional-cross-attention
A simple cross attention that updates both the source and target in one step
☆197 · Jul 29, 2025 · Updated 11 months ago
lucidrains / local-attention-flax
Local Attention - Flax module for Jax
☆22 · May 26, 2021 · Updated 5 years ago
lucidrains / st-moe-pytorch
Implementation of ST-Moe, the latest incarnation of MoE after years of research at Brain, in Pytorch
☆386 · Jun 17, 2024 · Updated 2 years ago
lucidrains / esbn-transformer
An attempt to merge ESBN with Transformers, to endow Transformers with the ability to emergently bind symbols
☆16 · Aug 3, 2021 · Updated 4 years ago
lucidrains / memory-compressed-attention
Implementation of Memory-Compressed Attention, from the paper "Generating Wikipedia By Summarizing Long Sequences"
☆71 · Apr 10, 2023 · Updated 3 years ago
lucidrains / CALM-pytorch
Implementation of CALM from the paper "LLM Augmented LLMs: Expanding Capabilities through Composition", out of Google Deepmind
☆177 · Sep 12, 2024 · Updated last year
lucidrains / logavgexp-torch
Implementation of LogAvgExp for Pytorch
☆37 · Apr 10, 2025 · Updated last year
lucidrains / isab-pytorch
An implementation of (Induced) Set Attention Block, from the Set Transformers paper
☆70 · Jun 8, 2026 · Updated last month
lucidrains / bit-diffusion
Implementation of Bit Diffusion, Hinton's group's attempt at discrete denoising diffusion, in Pytorch
☆357 · Oct 14, 2023 · Updated 2 years ago
naver-ai / lut
[ECCV 2024] Official PyTorch implementation of LUT "Learning with Unmasked Tokens Drives Stronger Vision Learners"
☆14 · Dec 1, 2024 · Updated last year
lucidrains / scaling-vin-pytorch
Exploration into the Scaling Value Iteration Networks paper, from Schmidhuber's group
☆37 · Sep 23, 2024 · Updated last year
lucidrains / complex-valued-transformer
Implementation of the transformer proposed in "Building Blocks for a Complex-Valued Transformer Architecture"
☆92 · Oct 13, 2023 · Updated 2 years ago
lucidrains / insertion-deletion-ddpm
Implementation of Insertion-deletion Denoising Diffusion Probabilistic Models
☆30 · May 31, 2022 · Updated 4 years ago
lucidrains / tranception-pytorch
Implementation of Tranception, an attention network, paired with retrieval, that is SOTA for protein fitness prediction
☆32 · Jun 19, 2022 · Updated 4 years ago
lucidrains / coordinate-descent-attention
Implementation of an Attention layer where each head can attend to more than just one token, using coordinate descent to pick topk
☆47 · Jul 16, 2023 · Updated 3 years ago
lucidrains / evolutionary-design-molecules
Implementation of the algorithm detailed in paper "Evolutionary design of molecules based on deep learning and a genetic algorithm"
☆24 · Dec 15, 2023 · Updated 2 years ago
lucidrains / Mega-pytorch
Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
☆207 · Aug 26, 2023 · Updated 2 years ago
lucidrains / firefly-torch
Exploration into the Firefly algorithm in Pytorch
☆41 · Feb 14, 2025 · Updated last year
lucidrains / transformer-lm-gan
Explorations into adversarial losses on top of autoregressive loss for language modeling
☆41 · Dec 21, 2025 · Updated 7 months ago
hananshafi / MedContext
[MICCAI 2024] Official code for the paper "MedContext: Learning Contextual Cues for Efficient Volumetric Medical Segmentation"
☆14 · Nov 1, 2024 · Updated last year
deepvk / muse
🎵 muse: Music Separation
☆11 · Feb 14, 2024 · Updated 2 years ago