skolai / fewbit
Compression schema for gradients of activations in backward pass
☆44 Updated last year
Related projects
Alternatives and complementary repositories for fewbit
- FLOPs and other statistics COunter for Pytorch neural networks ☆23 Updated 3 years ago
- Learning to Initialize Neural Networks for Stable and Efficient Training ☆136 Updated 2 years ago
- ☆61 Updated 4 years ago
- ☆20 Updated 4 months ago
- PyTorch implementation of L2L execution algorithm ☆106 Updated last year
- MUSCO: MUlti-Stage COmpression of neural networks ☆71 Updated 3 years ago
- ☆77 Updated 5 months ago
- A library for unit scaling in PyTorch ☆105 Updated 2 weeks ago
- Code for MSID, a Multi-Scale Intrinsic Distance for comparing generative models, studying neural networks, and more! ☆50 Updated 5 years ago
- Code for the paper "PALBERT: Teaching ALBERT to Ponder", NeurIPS 2022 Spotlight ☆37 Updated last year
- Customized matrix multiplication kernels ☆53 Updated 2 years ago
- Deep Generative Models course, 2021 ☆21 Updated 2 years ago
- Lightweight knowledge distillation pipeline ☆28 Updated 2 years ago
- Code for the paper "Secure Distributed Training at Scale" (ICML 2022) ☆14 Updated 2 years ago
- Official code for "Distributed Deep Learning in Open Collaborations" (NeurIPS 2021) ☆116 Updated 2 years ago
- ☆11 Updated 3 years ago
- Very simple and short implementation of gradient boosting in 18 lines of code ☆9 Updated 4 years ago
- Experiment of using Tangent to autodiff triton ☆72 Updated 9 months ago
- The official implementation of the ChordMixer architecture. ☆59 Updated last year
- ☆15 Updated last year
- "Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts" (NeurIPS 2020), original PyTorch implemen… ☆54 Updated 4 years ago
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆111 Updated 5 months ago
- The simplest but fast implementation of matrix multiplication in CUDA. ☆33 Updated 3 months ago
- Memory-efficient transformer. Work in progress. ☆19 Updated 2 years ago
- AdaShift optimizer implementation in PyTorch ☆17 Updated 5 years ago
- ☆51 Updated 5 months ago
- ☆33 Updated last year
- ☆11 Updated 2 years ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆43 Updated last year