eknight7 / ParallelRNN
Final Project for Parallel Computing at CMU (15-618/15-418)
☆10 Updated 8 years ago
Alternatives and similar repositories for ParallelRNN:
Users who are interested in ParallelRNN are comparing it to the repositories listed below:
- a heterogeneous multiGPU level-3 BLAS library ☆45 Updated 5 years ago
- Benchmarking matrix multiplication implementations ☆98 Updated 8 years ago
- ☆75 Updated last year
- Python wrappers for the NVIDIA cuDNN libraries ☆140 Updated 7 years ago
- Proof-of-Concept CNN in Halide ☆22 Updated 8 years ago
- GPU implementation of classical molecular dynamics proxy application. ☆31 Updated 8 years ago
- Caffe deep learning framework - optimized for Xeon Phi ☆14 Updated 9 years ago
- GPU Optimization and Memory Abstraction Framework ☆32 Updated 5 years ago
- A GPU-based LZSS compression algorithm, highly tuned for NVIDIA GPGPUs and for streaming data, leveraging the respective strengths of CPU… ☆35 Updated 9 years ago
- CNNs in Halide ☆23 Updated 9 years ago
- LonestarGPU: Irregular algorithms parallelized for GPUs ☆33 Updated 5 years ago
- SRS - Fast Approximate Nearest Neighbor Search in High Dimensional Euclidean Space With a Tiny Index ☆55 Updated 9 years ago
- Deep neural network framework (C/C++/CUDA). ☆31 Updated 9 years ago
- Greentea LibDNN - a universal convolution implementation supporting CUDA and OpenCL ☆135 Updated 7 years ago
- A fast and highly scalable GPU dynamic memory allocator ☆104 Updated 9 years ago
- A CUDA implementation of the PageRank Pipeline Benchmark ☆32 Updated 8 years ago
- Kernel Fusion and Runtime Compilation Based on NNVM ☆70 Updated 8 years ago
- Fork of magma to include more BLAS ☆28 Updated 8 years ago
- Communication-Minimizing 2D Convolution in GPU Registers ☆30 Updated 11 years ago
- Boda: A C++ Framework for Efficient Experiments in Computer Vision ☆63 Updated 5 years ago
- Frog is Asynchronous Graph Processing on GPU with Hybrid Coloring Model. The fundamental idea is based on Pareto principle (or 80-20 rule… ☆36 Updated 3 years ago
- TTC: A high-performance Compiler for Tensor Transpositions ☆20 Updated 7 years ago
- ☆10 Updated 2 years ago
- Convolutional neural networks C++ framework with CPU and GPU (CUDA) backends ☆178 Updated 6 years ago
- ☆40 Updated 7 years ago
- A GPU cache model for research purposes ☆28 Updated 11 years ago
- Full-speed Array of Structures access ☆164 Updated last year
- Intel Heterogeneous Research Compiler (iHRC) ☆25 Updated 2 years ago
- A portable high-level API with CUDA or OpenCL back-end ☆54 Updated 7 years ago
- Code experiments to exercise ideas while reading "Engineering a Compiler". ☆27 Updated 5 years ago