WenqiJiang / Convolution-Neural-Network-by-pyCUDA
pyCUDA implementation of forward propagation for Convolutional Neural Networks
☆18Updated 6 years ago
Alternatives and similar repositories for Convolution-Neural-Network-by-pyCUDA
Users who are interested in Convolution-Neural-Network-by-pyCUDA are comparing it to the libraries listed below
- ☆14Updated 5 years ago
- Implementing CNN code in CUDA and OpenCL to evaluate its performance on NVIDIA GPUs, AMD GPUs, and an FPGA platform.☆54Updated 8 years ago
- Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.)☆60Updated last month
- This is a tuned sparse matrix-dense vector multiplication (SpMV) library☆21Updated 9 years ago
- A collection of awesome algorithms, implemented in CUDA.☆25Updated 7 years ago
- ☆14Updated last month
- Modified version of PyTorch able to work with changes to GPGPU-Sim☆51Updated 2 years ago
- Algorithms implemented in CUDA + resources about GPGPU☆56Updated 3 years ago
- GEMM and Winograd based convolutions using CUTLASS☆26Updated 4 years ago
- MAFIA: Multiple Application Framework for GPU architectures☆27Updated 3 years ago
- Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs☆16Updated 6 years ago
- This is a C++ implementation of an LSTM neural network parallelized for a GPU using CUDA☆25Updated 7 years ago
- Implementation of breadth first search on GPU with CUDA Driver API.☆50Updated 4 years ago
- Graph Transforms to Quantize and Retrain Deep Neural Nets in TensorFlow☆167Updated 5 years ago
- ICML2017 MEC: Memory-efficient Convolution for Deep Neural Network, C++ implementation (unofficial)☆17Updated 6 years ago
- Windows Visual Studio Solutions for class "Introduction to Parallel Programming"☆19Updated 6 years ago
- Sparse-dense matrix-matrix multiplication on GPUs☆14Updated 6 years ago
- cuDNN sample codes provided by Nvidia☆45Updated 6 years ago
- ☆17Updated 4 years ago
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable.☆37Updated 7 years ago
- Implementation of the Winograd minimal convolution algorithm on Intel Architecture☆39Updated 7 years ago
- Subpart source code of deepcore v0.7☆27Updated 4 years ago
- Code for testing the native float16 matrix multiplication performance on Tesla P100 and V100 GPUs based on cublasHgemm☆34Updated 5 years ago
- A Winograd Minimal Filter Implementation in CUDA☆24Updated 3 years ago
- TensorFlow quantization (float32-->int8) inference test☆74Updated 6 years ago
- HW/SW co-design of sentence-level energy optimizations for latency-aware multi-task NLP inference☆48Updated last year
- TVM learning and research☆13Updated 4 years ago
- ☆11Updated 4 years ago
- PyTorch implementation of Near-Lossless Post-Training Quantization of Deep Neural Networks via a Piecewise Linear Approximation☆21Updated 5 years ago
- PyTorch implementation of DiracDeltaNet from paper Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs☆31Updated 5 years ago