utsaslab / MONeT
MONeT framework for reducing memory consumption of DNN training
☆172 · Updated 3 years ago
Related projects:
- PyTorch implementation of L2L execution algorithm ☆107 · Updated last year
- Slicing a PyTorch Tensor Into Parallel Shards ☆295 · Updated 3 years ago
- Block-sparse primitives for PyTorch ☆147 · Updated 3 years ago
- PyProf2: PyTorch Profiling tool ☆83 · Updated 4 years ago
- Distributed, mixed-precision training with PyTorch ☆90 · Updated 4 years ago
- Train ImageNet in 18 minutes on AWS ☆126 · Updated 6 months ago
- 🏙 Interactive in-editor performance profiling, visualization, and debugging for PyTorch neural networks. ☆30 · Updated last year
- Training neural networks in TensorFlow 2.0 with 5x less memory ☆127 · Updated 2 years ago
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training ☆196 · Updated last year
- End-to-end training of sparse deep neural networks with little-to-no performance loss. ☆315 · Updated last year
- Programmable Neural Network Compression ☆146 · Updated 2 years ago
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆61 · Updated 2 years ago
- PyTorch layer-by-layer model profiler ☆608 · Updated 3 years ago
- "Layer-wise Adaptive Rate Scaling" in PyTorch ☆85 · Updated 3 years ago
- Labels and other data for the paper "Are we done with ImageNet?" ☆181 · Updated 2 years ago
- Example code showing how to use Nvidia DALI in pytorch, with fallback to torchvision. Contains a few differences to the official Nvidia … ☆195 · Updated 4 years ago
- Using ideas from product quantization for state-of-the-art neural network compression. ☆145 · Updated 3 years ago
- Torch Distributed Experimental ☆115 · Updated last month
- [Prototype] Tools for the concurrent manipulation of variably sized Tensors. ☆252 · Updated last year
- Estimate/count FLOPS for a given neural network using pytorch ☆303 · Updated 2 years ago
- A GPU performance profiling tool for PyTorch models ☆493 · Updated 3 years ago
- Implementation of a Transformer, but completely in Triton ☆242 · Updated 2 years ago
- Seamless analysis of your PyTorch models (RAM usage, FLOPs, MACs, receptive field, etc.) ☆207 · Updated this week
- Demystify RAM Usage in Multi-Process Data Loaders ☆171 · Updated last year
- Unofficial PyTorch Implementation of EvoNorm ☆121 · Updated 3 years ago
- [ICLR 2020] Drawing Early-Bird Tickets: Toward More Efficient Training of Deep Networks ☆137 · Updated 3 years ago