pseeth / autoclip
Adaptive Gradient Clipping
☆126 Updated 2 years ago
Alternatives and similar repositories for autoclip:
Users interested in autoclip are comparing it to the libraries listed below.
- ☆163 Updated 2 years ago
- TF/Keras code for DiffStride, a pooling layer with learnable strides. ☆124 Updated 3 years ago
- Code repository for the ICLR 2022 paper "FlexConv: Continuous Kernel Convolutions With Differentiable Kernel Sizes" https://openreview.ne… ☆115 Updated 2 years ago
- Drop-in replacement for any ResNet with a significantly reduced memory footprint and better representation capabilities ☆209 Updated 9 months ago
- Implementation of the Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch ☆251 Updated 2 years ago
- Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning ☆160 Updated last year
- Implementation of Nyström Self-attention, from the paper Nyströmformer ☆127 Updated last year
- Code repository of the paper "CKConv: Continuous Kernel Convolution For Sequential Data" published at ICLR 2022. https://arxiv.org/abs/21… ☆119 Updated 2 years ago
- PyTorch dataset extended with map, cache etc. (tensorflow.data like) ☆328 Updated 2 years ago
- Collection of the latest, greatest, deep learning optimizers (for Pytorch) - CNN, NLP suitable ☆211 Updated 3 years ago
- Implementation of a U-net complete with efficient attention as well as the latest research findings ☆273 Updated 9 months ago
- Ranger deep learning optimizer rewrite to use newest components ☆328 Updated last year
- Implementation of fused cosine similarity attention in the same style as Flash Attention ☆210 Updated 2 years ago
- Simple and efficient RevNet-Library for PyTorch with XLA and DeepSpeed support and parameter offload ☆126 Updated 2 years ago
- An alternative to convolution in neural networks ☆254 Updated 10 months ago
- ☆47 Updated 4 years ago
- Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021). ☆225 Updated 2 years ago
- Is the attention layer even necessary? (https://arxiv.org/abs/2105.02723) ☆483 Updated 3 years ago
- A practical implementation of GradNorm, Gradient Normalization for Adaptive Loss Balancing, in Pytorch ☆85 Updated last year
- Implementation of Feedback Transformer in Pytorch ☆105 Updated 3 years ago
- Sequence Modeling with Structured State Spaces ☆62 Updated 2 years ago
- Easy-to-use AdaHessian optimizer (PyTorch) ☆77 Updated 4 years ago
- Collection of PyTorch Lightning implementations of Generative Adversarial Network varieties presented in research papers. ☆166 Updated 2 years ago
- Configuration classes enabling type-safe PyTorch configuration for Hydra apps ☆209 Updated 2 years ago
- Relative Positional Encoding for Transformers with Linear Complexity ☆62 Updated 2 years ago
- Traditional Machine Learning Models for Large-Scale Datasets in PyTorch. ☆124 Updated last week
- Sinkhorn Transformer - Practical implementation of Sparse Sinkhorn Attention ☆257 Updated 3 years ago
- Implementation of Linformer for Pytorch ☆266 Updated last year
- Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms ☆258 Updated 3 years ago
- Implementation of Flow++ in PyTorch ☆41 Updated 5 years ago