skolai / fewbit
Compression schema for gradients of activations in backward pass
☆43 Updated last year
Related projects:
- FLOPs and other statistics COunter for PyTorch neural networks ☆23 Updated 3 years ago
- Learning to Initialize Neural Networks for Stable and Efficient Training ☆134 Updated 2 years ago
- Code for the paper "PALBERT: Teaching ALBERT to Ponder", NeurIPS 2022 Spotlight ☆37 Updated last year
- Deep Generative Models course, 2021 ☆20 Updated 2 years ago
- MUSCO: MUlti-Stage COmpression of neural networks ☆73 Updated 3 years ago
- A neural network training framework within a task-based parallel programming paradigm ☆43 Updated this week
- Code for MSID, a Multi-Scale Intrinsic Distance for comparing generative models, studying neural networks, and more! ☆49 Updated 5 years ago
- PyTorch implementation of L2L execution algorithm ☆107 Updated last year
- The official implementation of the ChordMixer architecture. ☆57 Updated last year
- NLA 2018 Skoltech course ☆51 Updated 5 years ago
- Easy-to-use AdaHessian optimizer (PyTorch) ☆77 Updated 3 years ago
- AdaShift optimizer implementation in PyTorch ☆16 Updated 5 years ago
- Code for the paper "Secure Distributed Training at Scale" (ICML 2022) ☆14 Updated 2 years ago
- Very simple and short implementation of gradient boosting in 18 lines of code ☆9 Updated 4 years ago
- Code accompanying the NeurIPS 2020 paper: WoodFisher (Singh & Alistarh, 2020) ☆45 Updated 3 years ago
- Experiment of using Tangent to autodiff triton