VictorZuanazzi / AdaptBatch
Basic code for adaptive batch in PyTorch
☆18 · Updated 5 years ago
Alternatives and similar repositories for AdaptBatch:
Users interested in AdaptBatch are comparing it to the libraries listed below:
- High performance PyTorch modules ☆18 · Updated 2 years ago
- ☆14 · Updated 5 years ago
- Custom CUDA kernel for {2, 3}d relative attention with PyTorch wrapper ☆43 · Updated 4 years ago
- Official PyTorch Implementation of Length-Adaptive Transformer (ACL 2021) ☆101 · Updated 4 years ago
- Implements the SM3-II adaptive optimization algorithm for PyTorch. ☆33 · Updated 4 months ago
- A collection of Models, Datasets, DataModules, Callbacks, Metrics, Losses and Loggers to better integrate pytorch-lightning with transfor… ☆47 · Updated last year
- Code publication to the paper "Normalized Attention Without Probability Cage" ☆16 · Updated 3 years ago
- The official repository for our paper "The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers". We s… ☆66 · Updated 2 years ago
- PyTorch implementation of HashedNets ☆36 · Updated last year
- Implementation of N-Grammer, augmenting Transformers with latent n-grams, in PyTorch ☆72 · Updated 2 years ago
- ☆47 · Updated 4 years ago
- Models and code from Learning to Predict Denotational Probabilities For Modeling Entailment ☆14 · Updated 6 years ago
- ☆24 · Updated 8 months ago
- Humans understand novel sentences by composing meanings and roles of core language components. In contrast, neural network models for nat… ☆27 · Updated 4 years ago
- AdamW optimizer for bfloat16 models in PyTorch 🔥. ☆31 · Updated 7 months ago
- MTAdam: Automatic Balancing of Multiple Training Loss Terms ☆36 · Updated 4 years ago
- Code for the ICML'20 paper "Improving Transformer Optimization Through Better Initialization" ☆89 · Updated 3 years ago
- Code for SegTree Transformer (ICLR-RLGM 2019). ☆27 · Updated 5 years ago
- Code for paper 'Minimizing FLOPs to Learn Efficient Sparse Representations' published at ICLR 2020 ☆20 · Updated 4 years ago
- [ICLR 2019] Learning Representations of Sets through Optimized Permutations ☆36 · Updated 5 years ago
- Pretrained TorchVision models on CIFAR10 dataset (with weights) ☆24 · Updated 4 years ago
- Code for the paper PermuteFormer ☆42 · Updated 3 years ago
- C++ mosestokenizer ☆16 · Updated 10 months ago
- Code for reversible recurrent neural networks ☆38 · Updated 5 years ago
- SparseMax activation function implementation (ICML 2016) (PyTorch) ☆27 · Updated 7 years ago
- Code repo for "Transformer on a Diet" paper ☆31 · Updated 4 years ago
- This repository contains example code to build models on TPUs ☆30 · Updated last year
- PyTorch implementation of the NIPS'17 paper Training Deep Networks without Learning Rates Through Coin Betting. ☆38 · Updated 6 years ago