kashif / cuda-workshop
Code examples for the CUDA workshop
☆36 Updated 2 years ago
Alternatives and similar repositories for cuda-workshop:
Users who are interested in cuda-workshop are comparing it to the libraries listed below:
- Generating Families of Practical Fast Matrix Multiplication Algorithms ☆12 Updated 7 years ago
- a heterogeneous multiGPU level-3 BLAS library ☆45 Updated 5 years ago
- PLEASE SEE THE OFFICIAL REPOSITORY. THIS IS NOT MAINTAINED ANYMORE. ☆93 Updated 5 years ago
- Simple example of implementing a new TensorFlow operation and its gradient in C++. ☆56 Updated 5 years ago
- ☆23 Updated 5 years ago
- FluidNet re-written with ATen tensor lib ☆51 Updated 5 years ago
- FastHOG library that has been fixed to work with CUDA 5.x on Ubuntu 12.04 ☆20 Updated 11 years ago
- kmeans clustering with multi-GPU capabilities ☆118 Updated last year
- CNNs in Halide ☆23 Updated 9 years ago
- Python Framework for sparse neural networks ☆19 Updated 7 years ago
- Fork of MAGMA to include more BLAS ☆28 Updated 8 years ago
- RSVDPACK: Implementations of fast algorithms for computing the low rank SVD, interpolative and CUR decompositions of a matrix, using ran… ☆88 Updated 2 years ago
- TTC: A high-performance Compiler for Tensor Transpositions ☆20 Updated 7 years ago
- Optimized half precision gemm assembly kernels (deprecated due to ROCm) ☆47 Updated 7 years ago
- This example builds on the parallel-forall repo separate compilation example by adding CMake to it. ☆17 Updated 7 years ago
- A CUDA implementation of the PageRank Pipeline Benchmark ☆32 Updated 8 years ago
- sparse matrix pre-processing library ☆81 Updated 9 months ago
- GPU implementation of classical molecular dynamics proxy application. ☆31 Updated 8 years ago
- Parallel network flows using OpenMP and CUDA. ☆27 Updated 6 years ago
- This is a cross-platform, CUDA-based C++ library for general-purpose, unconstrained nonlinear optimization on the GPU. It implements the … ☆134 Updated 4 years ago
- Efficient LDA solution on GPUs. ☆24 Updated 6 years ago
- ☆42 Updated 7 years ago
- ☆13 Updated 7 years ago
- Example code to create and train a PyTorch model using the new C++ frontend. ☆17 Updated 5 years ago
- Test Winograd convolution written in TVM for CUDA and AMDGPU ☆40 Updated 6 years ago
- Corrected source for the OpenCL in Action book (work in progress) ☆62 Updated 11 years ago
- A GPU / CPU implementation of a feed forward neural network ☆32 Updated 9 years ago
- kmeans ☆54 Updated 8 years ago
- Deep neural network framework (C/C++/CUDA). ☆31 Updated 9 years ago
- BLAS OpenCL implementation. ☆15 Updated 9 years ago