kaiyuyue / torchshard
Slicing a PyTorch Tensor Into Parallel Shards
☆296 · Updated 3 years ago
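For context, here is a minimal sketch of the idea named in the tagline, slicing a tensor into shards that parallel ranks could each own, written in plain PyTorch with `torch.chunk`. It illustrates the concept only and is not torchshard's own API; the tensor size and shard count are made-up values.

```python
# Minimal sketch: slice a parameter tensor into parallel shards.
# Generic illustration with plain PyTorch, not torchshard's actual API.
import torch

world_size = 4                    # hypothetical number of parallel ranks
weight = torch.randn(1024, 1024)  # full, unsharded parameter

# Column-parallel style split: each rank keeps one slice of the last dim.
shards = torch.chunk(weight, world_size, dim=-1)

for rank, shard in enumerate(shards):
    print(f"rank {rank}: shard shape {tuple(shard.shape)}")  # (1024, 256)
```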
Related projects
Alternatives and complementary repositories for torchshard
- Accelerate training by storing parameters in one contiguous chunk of memory. ☆291 · Updated 4 years ago
- A GPU performance profiling tool for PyTorch models ☆495 · Updated 3 years ago
- Demystify RAM Usage in Multi-Process Data Loaders ☆180 · Updated last year
- Implementation of a Transformer, but completely in Triton ☆249 · Updated 2 years ago
- Implementation of the LAMB optimizer (https://arxiv.org/abs/1904.00962) ☆369 · Updated 3 years ago
- Library for 8-bit optimizers and quantization routines. ☆714 · Updated 2 years ago
- ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training ☆201 · Updated last year
- A LARS implementation in PyTorch ☆335 · Updated 4 years ago
- PyTorch layer-by-layer model profiler ☆608 · Updated 3 years ago
- A general and accurate MACs / FLOPs profiler for PyTorch models ☆571 · Updated 6 months ago
- [Prototype] Tools for the concurrent manipulation of variably sized Tensors. ☆253 · Updated 2 years ago
- Running BERT without Padding ☆460 · Updated 2 years ago
- Official code for "Writing Distributed Applications with PyTorch", PyTorch Tutorial ☆255 · Updated last year
- ☆107 · Updated 3 years ago
- Accelerate PyTorch models with ONNX Runtime ☆356 · Updated 2 months ago
- Block-sparse primitives for PyTorch ☆148 · Updated 3 years ago
- MONeT framework for reducing memory consumption of DNN training ☆173 · Updated 3 years ago
- TVM integration into PyTorch ☆452 · Updated 4 years ago
- A GPipe implementation in PyTorch ☆818 · Updated 3 months ago
- Torch Distributed Experimental ☆116 · Updated 3 months ago
- Fast Block Sparse Matrices for PyTorch ☆545 · Updated 3 years ago
- Efficient, checkpointed data loading for deep learning with massive data sets. ☆205 · Updated last year
- Memory Efficient Attention (O(sqrt(n))) for JAX and PyTorch ☆179 · Updated last year
- A library to analyze PyTorch traces. ☆307 · Updated this week
- A tensor-aware point-to-point communication primitive for machine learning ☆249 · Updated last year
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆175 · Updated last week
- Tutel MoE: An Optimized Mixture-of-Experts Implementation ☆735 · Updated this week
- This is a Tensor Train based compression library to compress sparse embedding tables used in large-scale machine learning models such as … ☆193 · Updated 2 years ago
- TorchX is a universal job launcher for PyTorch applications. TorchX is designed to have fast iteration time for training/research and sup… ☆332 · Updated this week
- Pipeline Parallelism for PyTorch ☆726 · Updated 2 months ago