pseeth / autoclip

Adaptive Gradient Clipping

☆135

Alternatives and similar repositories for autoclip

Users who are interested in autoclip are comparing it to the libraries listed below.

ctlllll / SGConv
☆163 Updated 2 years ago
google-research / diffstride
TF/Keras code for DiffStride, a pooling layer with learnable strides.
☆124 Updated 3 years ago
aliutkus / spe
Relative Positional Encoding for Transformers with Linear Complexity
☆64 Updated 3 years ago
aced125 / sparsemax
A PyTorch Implementation of the Sparsemax operator (https://arxiv.org/pdf/1803.09820.pdf)
☆34 Updated 2 years ago
rishikksh20 / FNet-pytorch
Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms
☆259 Updated 4 years ago
lucidrains / linformer
Implementation of Linformer for Pytorch
☆290 Updated last year
lucidrains / nystrom-attention
Implementation of Nyström Self-attention, from the paper Nyströmformer
☆135 Updated 3 months ago
dwromero / ckconv
Code repository of the paper "CKConv: Continuous Kernel Convolution For Sequential Data" published at ICLR 2022. https://arxiv.org/abs/21…
☆121 Updated 2 years ago
lucidrains / Adan-pytorch
Implementation of the Adan (ADAptive Nesterov momentum algorithm) Optimizer in Pytorch
☆252 Updated 2 years ago
rjbruin / flexconv
Code repository for the ICLR 2022 paper "FlexConv: Continuous Kernel Convolutions With Differentiable Kernel Sizes" https://openreview.ne…
☆116 Updated 2 years ago
NVIDIA / transformer-ls
Official PyTorch Implementation of Long-Short Transformer (NeurIPS 2021).
☆225 Updated 3 years ago
lucidrains / axial-positional-embedding
Axial Positional Embedding for Pytorch
☆83 Updated 4 months ago
Cerebras / online-normalization
Online Normalization for Training Neural Networks (Companion Repository)
☆82 Updated 4 years ago
rish-16 / aft-pytorch
Unofficial PyTorch implementation of Attention Free Transformer (AFT) layers by Apple Inc.
☆238 Updated 3 years ago
lucidrains / h-transformer-1d
Implementation of H-Transformer-1D, Hierarchical Attention for Sequence Learning
☆161 Updated last year
lucidrains / Mega-pytorch
Implementation of Mega, the Single-head Attention with Multi-headed EMA architecture that currently holds SOTA on Long Range Arena
☆204 Updated last year
guillaumeBellec / multitask
☆24 Updated 8 months ago
facebookresearch / diffq
DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight …
☆235 Updated 2 years ago
gcambara / cape
Continuous Augmented Positional Embeddings (CAPE) implementation for PyTorch
☆41 Updated 2 years ago
lucidrains / flash-cosine-sim-attention
Implementation of fused cosine similarity attention in the same style as Flash Attention
☆214 Updated 2 years ago
michaelsdr / momentumnet
Drop-in replacement for any ResNet with a significantly reduced memory footprint and better representation capabilities
☆209 Updated last year
lessw2020 / Best-Deep-Learning-Optimizers
Collection of the latest, greatest, deep learning optimizers (for Pytorch) - CNN, NLP suitable
☆215 Updated 4 years ago
nocotan / pytorch-lightning-gans
Collection of PyTorch Lightning implementations of Generative Adversarial Network varieties presented in research papers.
☆169 Updated 3 months ago
lmnt-com / graftr
graftr: an interactive shell to view and edit PyTorch checkpoints.
☆113 Updated 4 years ago
lucidrains / feedback-transformer-pytorch
Implementation of Feedback Transformer in Pytorch
☆107 Updated 4 years ago
chrischute / flowplusplus
Implementation of Flow++ in PyTorch
☆40 Updated 5 years ago
david-knigge / ccnn
Code repository of the paper "Modelling Long Range Dependencies in ND: From Task-Specific to a General Purpose CNN" https://arxiv.org/abs…
☆184 Updated 2 months ago
borchero / pycave
Traditional Machine Learning Models for Large-Scale Datasets in PyTorch.
☆126 Updated 3 weeks ago
lucidrains / local-attention
An implementation of local windowed attention for language modeling
☆460 Updated 6 months ago
lucidrains / fast-transformer-pytorch
Implementation of Fast Transformer in Pytorch
☆175 Updated 3 years ago