tczhangzhi / pytorch-parallel
Optimize an example model with Python, CPP, and CUDA extensions and Ring-Allreduce.
☆109 · Updated 6 years ago
Alternatives and similar repositories for pytorch-parallel
Users that are interested in pytorch-parallel are comparing it to the libraries listed below
- A memory-balanced, communication-efficient model-parallel implementation of a FullyConnected layer with CrossEntropyLoss in PyTorch ☆84 · Updated 5 years ago
- How and why you want to build your PyTorch CUDA/C++ extension with a Makefile ☆172 · Updated 5 years ago
- PyTorch implementation of the paper "Auto-DeepLab: Hierarchical Neural Architecture Search for Semantic Image Segmentation" ☆411 · Updated 3 years ago
- PyTorch source code reading, version 0.2.0 ☆90 · Updated 5 years ago
- Dynamic Tensor Rematerialization prototype (modified PyTorch) and simulator. Paper: https://arxiv.org/abs/2006.09616 ☆132 · Updated last year
- A bite of the C++ API in PyTorch 1.0 ☆164 · Updated 5 years ago
- A sample of onnxparser working with TRT user-defined plugins for TRT 7.0 ☆168 · Updated 4 years ago
- A super-lightweight deep learning library based on NumPy, in PyTorch fashion. ☆94 · Updated 3 years ago
- Example repository for custom C++/CUDA operators for TorchScript ☆114 · Updated 2 years ago
- A set of examples around PyTorch in vision, text, reinforcement learning, etc. ☆122 · Updated 6 years ago
- CUDA implementation of NMS for PyTorch ☆87 · Updated 5 years ago
- Fast CUDA kernels for ResNet inference. ☆174 · Updated 6 years ago
- A hyperparameter manager for deep learning experiments. ☆96 · Updated 2 years ago
- deformable_conv2d layer implemented in PyTorch ☆63 · Updated 6 years ago
- Official code for "Writing Distributed Applications with PyTorch", PyTorch Tutorial ☆262 · Updated 2 years ago
- Easy panoptic segmentation implementation in mmdet ☆11 · Updated 6 years ago
- PyTorch implementation of "Learning Efficient Convolutional Networks through Network Slimming" ☆77 · Updated 6 years ago
- Example code showing how to use NVIDIA DALI in PyTorch, with fallback to torchvision. Contains a few differences from the official NVIDIA … ☆197 · Updated 5 years ago
- A simple and flexible cross-framework operators toolkit ☆164 · Updated 4 years ago
- Synchronized Multi-GPU Batch Normalization ☆223 · Updated 6 years ago
- ☆129 · Updated 4 years ago
- Partial Order Pruning: for Best Speed/Accuracy Trade-off in Neural Architecture Search ☆149 · Updated 4 years ago
- PyTorch Dataset Rank Dataset ☆43 · Updated 4 years ago
- ☆169 · Updated 4 years ago
- Benchmark of TVM quantized model on CUDA ☆111 · Updated 5 years ago
- A small deep learning framework with C++/Python/CUDA ☆54 · Updated 7 years ago
- PyTorch implementation of the network design paradigm described in the paper "Designing Network Design Spaces" ☆185 · Updated 11 months ago
- [ICCV 2019] Harmonious Bottleneck on Two Orthogonal Dimensions, surpassing MobileNetV2 ☆102 · Updated 5 years ago
- [MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration ☆200 · Updated 3 years ago
- Sharing daily arXiv papers in computer vision ☆711 · Updated 5 years ago