utsaslab / MONeT
MONeT framework for reducing memory consumption of DNN training
☆174 Updated 3 years ago
Alternatives and similar repositories for MONeT:
Users interested in MONeT compare it to the libraries listed below.
- PyTorch implementation of L2L execution algorithm ☆107 Updated 2 years ago
- [Prototype] Tools for the concurrent manipulation of variably sized Tensors. ☆253 Updated 2 years ago
- ☆109 Updated 3 years ago
- Distributed, mixed-precision training with PyTorch ☆89 Updated 4 years ago
- Lightweight and Parallel Deep Learning Framework ☆264 Updated 2 years ago
- On Network Design Spaces for Visual Recognition ☆94 Updated 4 years ago
- End-to-end training of sparse deep neural networks with little-to-no performance loss. ☆319 Updated 2 years ago
- Training neural networks in TensorFlow 2.0 with 5x less memory ☆130 Updated 3 years ago
- Example code showing how to use Nvidia DALI in pytorch, with fallback to torchvision. Contains a few differences to the official Nvidia … ☆197 Updated 5 years ago
- PyProf2: PyTorch Profiling tool ☆82 Updated 4 years ago
- Programmable Neural Network Compression ☆148 Updated 2 years ago
- [ICLR 2020] Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks ☆137 Updated 4 years ago
- PyTorch layer-by-layer model profiler ☆606 Updated 3 years ago
- Using ideas from product quantization for state-of-the-art neural network compression. ☆146 Updated 3 years ago
- "Layer-wise Adaptive Rate Scaling" in PyTorch ☆86 Updated 4 years ago
- Train ImageNet in 18 minutes on AWS ☆129 Updated last year
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆64 Updated 3 years ago
- Block-sparse primitives for PyTorch ☆154 Updated 3 years ago
- Slicing a PyTorch Tensor Into Parallel Shards ☆298 Updated 3 years ago
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training ☆201 Updated 2 years ago
- Dynamic Tensor Rematerialization prototype (modified PyTorch) and simulator. Paper: https://arxiv.org/abs/2006.09616 ☆133 Updated last year
- Implementation of a Transformer, but completely in Triton ☆260 Updated 2 years ago
- A GPU performance profiling tool for PyTorch models ☆505 Updated 3 years ago
- ☆57 Updated 2 years ago
- Customized matrix multiplication kernels ☆53 Updated 3 years ago
- ☆40 Updated 3 years ago
- ☆182 Updated 2 years ago
- Estimate/count FLOPS for a given neural network using pytorch ☆303 Updated 2 years ago
- A tensor-aware point-to-point communication primitive for machine learning ☆256 Updated 2 years ago
- Experimental ground for optimizing memory of pytorch models ☆366 Updated 6 years ago