kaiyuyue / torchshard

Slicing a PyTorch Tensor Into Parallel Shards

☆301

Alternatives and similar repositories for torchshard

Users who are interested in torchshard are comparing it to the libraries listed below.

PhilJd / contiguous_pytorch_params
Accelerate training by storing parameters in one contiguous chunk of memory.
☆293 Updated 5 years ago
NVIDIA / PyProf
A GPU performance profiling tool for PyTorch models
☆509 Updated 4 years ago
ppwwyyxx / RAM-multiprocess-dataloader
Demystify RAM Usage in Multi-Process Data Loaders
☆204 Updated 2 years ago
ucbrise / actnn
ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training
☆199 Updated 2 years ago
pytorch / nestedtensor
[Prototype] Tools for the concurrent manipulation of variably sized Tensors.
☆251 Updated 3 years ago
utsaslab / MONeT
MONeT framework for reducing memory consumption of DNN training
☆174 Updated 4 years ago
seba-1511 / dist_tuto.pth
Official code for "Writing Distributed Applications with PyTorch", PyTorch Tutorial
☆264 Updated 2 years ago
awwong1 / torchprof
PyTorch layer-by-layer model profiler
☆608 Updated 4 years ago
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆277 Updated 3 years ago
cybertronai / pytorch-lamb
Implementation of https://arxiv.org/abs/1904.00962
☆377 Updated 4 years ago
TezRomacH / layer-to-layer-pytorch
PyTorch implementation of L2L execution algorithm
☆109 Updated 2 years ago
pytorch / torchdistx
Torch Distributed Experimental
☆117 Updated last year
ptillet / torch-blocksparse
Block-sparse primitives for PyTorch
☆160 Updated 4 years ago
prigoyal / pytorch_memonger
Experimental ground for optimizing memory of PyTorch models
☆366 Updated 7 years ago
microsoft / infinibatch
Efficient, check-pointed data loading for deep learning with massive data sets.
☆210 Updated 2 years ago
kakaobrain / torchgpipe
A GPipe implementation in PyTorch
☆858 Updated last year
NVlabs / tensorcom
☆108 Updated 4 years ago
mit-han-lab / hardware-aware-transformers
[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
☆336 Updated last year
zhijian-liu / torchprofile
A general and accurate MACs / FLOPs profiler for PyTorch models
☆631 Updated 4 months ago
facebookresearch / bitsandbytes
Library for 8-bit optimizers and quantization routines.
☆779 Updated 3 years ago
DeMoriarty / TorchPQ
Approximate nearest neighbor search with product quantization on GPU in PyTorch and CUDA
☆229 Updated last year
pytorch / tensorpipe
A tensor-aware point-to-point communication primitive for machine learning
☆275 Updated 3 weeks ago
meta-pytorch / multipy
torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i…
☆182 Updated 3 months ago
mlcommons / training_results_v0.7
This repository contains the results and code for the MLPerf™ Training v0.7 benchmark.
☆57 Updated 2 years ago
huggingface / pytorch_block_sparse
Fast Block Sparse Matrices for PyTorch
☆550 Updated 4 years ago
justheuristic / prefetch_generator
Simple package that makes your generator work in a background thread
☆282 Updated 3 years ago
jianweif / OptimalGradCheckpointing
☆41 Updated 4 years ago
pytorch / ort
Accelerate PyTorch models with ONNX Runtime
☆367 Updated 9 months ago
snuspl / nimble
Lightweight and Parallel Deep Learning Framework
☆263 Updated 3 years ago
meta-pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆161 Updated 2 months ago