aramadia / udacity-cs344
Parallel Programming
☆28 Updated 11 years ago
Alternatives and similar repositories for udacity-cs344:
Users who are interested in udacity-cs344 are comparing it to the repositories listed below
- Online CUDA Occupancy Calculator ☆75 Updated 3 years ago
- Codebase associated with the PyTorch compiler tutorial ☆46 Updated 5 years ago
- this is the release repository of superneurons ☆52 Updated 4 years ago
- A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory ☆297 Updated 6 years ago
- kmeans clustering with multi-GPU capabilities ☆120 Updated last year
- Kernel Fusion and Runtime Compilation Based on NNVM ☆70 Updated 8 years ago
- An analytical performance modeling tool for deep neural networks. ☆88 Updated 4 years ago
- sparse matrix pre-processing library ☆82 Updated 10 months ago
- Python bindings for NVTX ☆66 Updated last year
- Some CUDA design patterns and a bit of template magic for CUDA ☆150 Updated last year
- This is a tuned sparse matrix dense vector multiplication (SpMV) library ☆21 Updated 9 years ago
- GPU-specialized parameter server for GPU machine learning. ☆101 Updated 6 years ago
- Medusa: Building GPU-based Parallel Sparse Graph Applications with Sequential C/C++ Code ☆61 Updated 4 years ago
- Repository for SysML19 Artifacts Evaluation ☆53 Updated 6 years ago
- Convert nvprof profiles into about:tracing compatible JSON files ☆69 Updated 3 years ago
- CUDA Tensor Transpose (cuTT) library ☆51 Updated 7 years ago
- A CUDNN minimal deep learning training code sample using LeNet. ☆264 Updated last year
- High-performance, GPU-aware communication library ☆85 Updated 2 months ago
- ☆8 Updated last year
- ☆91 Updated 8 years ago
- matrix multiplication in CUDA ☆122 Updated last year
- Code for paper "Design Principles for Sparse Matrix Multiplication on the GPU" accepted to Euro-Par 2018 ☆72 Updated 4 years ago
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable. ☆36 Updated 7 years ago
- Sparse matrix computation library for GPU ☆54 Updated 4 years ago
- Third party assembler and GEMM library for NVIDIA Kepler GPU ☆81 Updated 5 years ago
- CUDA Data Parallel Primitives Library ☆427 Updated 6 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆84 Updated last year
- Test winograd convolution written in TVM for CUDA and AMDGPU ☆41 Updated 6 years ago
- TVM stack: exploring the incredible explosion of deep-learning frameworks and how to bring them together ☆64 Updated 6 years ago
- a heterogeneous multiGPU level-3 BLAS library ☆45 Updated 5 years ago