feifeibear / SWCaffe
A Deep Learning Framework customized for Sunway TaihuLight
☆40 Updated 6 years ago
Alternatives and similar repositories for SWCaffe:
Users who are interested in SWCaffe are comparing it to the libraries listed below:
- gossip: Efficient Communication Primitives for Multi-GPU Systems ☆58 Updated 2 years ago
- A highly efficient library for GEMM operations on Sunway TaihuLight ☆17 Updated 4 years ago
- this is the release repository of superneurons ☆52 Updated 4 years ago
- Automated machine learning as an AI-HPC benchmark ☆65 Updated 2 years ago
- Kernel Fusion and Runtime Compilation Based on NNVM ☆70 Updated 8 years ago
- CUDA Tensor Transpose (cuTT) library ☆51 Updated 7 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth ☆104 Updated 7 years ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆80 Updated 5 years ago
- RDMA and SHARP plugins for nccl library ☆176 Updated last month
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA ☆32 Updated 4 years ago
- High-performance, GPU-aware communication library ☆84 Updated last month
- CUPTI GPU Profiler ☆37 Updated 5 years ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆79 Updated last year
- GVProf: A Value Profiler for GPU-based Clusters ☆49 Updated 10 months ago
- High performance NCCL plugin for Bagua. ☆15 Updated 3 years ago
- Subpart source code of deepcore v0.7 ☆27 Updated 4 years ago
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections ☆117 Updated 2 years ago
- ☆36 Updated 2 months ago
- Dissecting NVIDIA GPU Architecture ☆88 Updated 2 years ago
- Prototype of OpenSHMEM for NVIDIA GPUs, developed as part of DoE Design Forward ☆21 Updated 6 years ago
- Pytorch process group third-party plugin for UCC ☆20 Updated 10 months ago
- Documentation for StreamExecutor open source proposal ☆83 Updated 8 years ago
- A tool for examining GPU scheduling behavior.