eknight7 / ParallelRNN
Final Project for Parallel Computing at CMU (15-618/15-418)
☆10 · Updated 9 years ago
Alternatives and similar repositories for ParallelRNN
Users who are interested in ParallelRNN are comparing it to the libraries listed below:
- Greentea LibDNN - a universal convolution implementation supporting CUDA and OpenCL ☆137 · Updated 8 years ago
- Benchmarking matrix multiplication implementations ☆103 · Updated 9 years ago
- CNNs in Halide ☆23 · Updated 10 years ago
- ☆74 · Updated 2 years ago
- CL Offline Compiler: compile OpenCL kernels to HSAIL ☆50 · Updated 8 years ago
- GPU-based large-scale approximate nearest neighbor search, accepted at CVPR 2016 ☆92 · Updated 7 years ago
- Compiler toolkit for neuFlow ☆26 · Updated 12 years ago
- Convolutional neural network C++ framework with CPU and GPU (CUDA) backends ☆182 · Updated 7 years ago
- A domain-specific language and compiler for image processing ☆77 · Updated 4 years ago
- TTC: a high-performance compiler for tensor transpositions ☆21 · Updated 8 years ago
- A fast and highly scalable GPU dynamic memory allocator ☆112 · Updated 10 years ago
- Boda: a C++ framework for efficient experiments in computer vision ☆64 · Updated 6 years ago
- Open single- and half-precision GEMM implementations ☆398 · Updated 2 years ago
- Proof-of-concept CNN in Halide ☆22 · Updated 9 years ago
- ☆101 · Updated 6 years ago
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU… ☆38 · Updated 10 years ago
- A CUDA implementation of the k-means clustering algorithm ☆255 · Updated 13 years ago
- Easy-to-run kernels using OpenCL ☆187 · Updated 9 months ago
- Generic system-wide modern C++ for heterogeneous platforms with SYCL from the Khronos Group ☆78 · Updated 5 years ago
- Rigel is a language for describing image processing hardware embedded in Lua. Rigel can compile to Verilog hardware designs for Xilinx FP… ☆57 · Updated 5 years ago
- Introduction to Parallel Programming class code ☆30 · Updated 10 years ago
- Fast matrix multiplication ☆31 · Updated 4 years ago
- A heterogeneous multi-GPU level-3 BLAS library ☆46 · Updated 6 years ago
- HSAIL LLVM Tree - Development has stopped on this branch. This was a development branch ☆16 · Updated 9 years ago
- Communication-Minimizing 2D Convolution in GPU Registers ☆30 · Updated 12 years ago
- Facebook's CUDA extensions ☆284 · Updated 6 years ago
- Parallel Algorithm Scheduling Library ☆105 · Updated 8 years ago
- Optimized half-precision GEMM assembly kernels (deprecated due to ROCm) ☆47 · Updated 8 years ago
- Mirror JPEG compression and decompression accelerated on GPU ☆82 · Updated 11 years ago
- Full-speed Array of Structures access ☆176 · Updated 2 years ago