SnailWalkerYC / LeNet-5_Speed_Up
Uses OpenMP and CUDA to speed up the LeNet-5 digit-recognition CNN. With OpenMP, training and testing are each about 11x faster. With CUDA, training is about 3x faster and testing is about 57x faster.
☆7 · Updated 7 years ago
Related projects
Alternatives and complementary repositories for LeNet-5_Speed_Up
- Course Webpage for CS 217 Hardware Accelerators for Machine Learning, Stanford University ☆98 · Updated last year
- TQT's pytorch implementation. ☆20 · Updated 2 years ago
- Implementing CNN code in CUDA and OpenCL to evaluate its performance on NVIDIA GPUs, AMD GPUs, and an FPGA platform. ☆53 · Updated 7 years ago
- OpenCL Labs for PAPAA Summer School 2016 Edition ☆46 · Updated 7 years ago
- ☆36 · Updated 5 years ago
- PyTorch implementation of DiracDeltaNet from paper Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs ☆31 · Updated 5 years ago
- TVM learning and research ☆12 · Updated 3 years ago
- BlockCIrculantRNN (LSTM and GRU) using TensorFlow ☆14 · Updated 6 years ago
- My name is Fang Biao. I'm currently pursuing my Master's degree at the College of Computer Science and Engineering, Si Chuan University, … ☆41 · Updated last year
- FPGA-based neural network inference project for 2020 DAC System Design Contest ☆110 · Updated 3 years ago
- An out-of-the-box PyTorch scaffold for neural network quantization-aware training (QAT) research. Website: https://github.com/zhutmost/neuralz… ☆26 · Updated last year
- Lightweight neural network inference for object detection on small-scale FPGA board ☆91 · Updated 5 years ago
- Neural Network Quantization & Low-Bit Fixed Point Training For Hardware-Friendly Algorithm Design ☆157 · Updated 3 years ago
- ☆30 · Updated last year
- ☆23 · Updated 3 years ago
- The 1st place winner's source code for DAC 2018 System Design Contest, FPGA Track ☆88 · Updated 5 years ago
- Official implementation of "Searching for Winograd-aware Quantized Networks" (MLSys'20) ☆27 · Updated last year
- Eyeriss chip simulator ☆32 · Updated 4 years ago
- ☆35 · Updated 5 years ago
- Simulator for BitFusion ☆90 · Updated 4 years ago
- ☆31 · Updated 5 years ago
- Reproduction of WAGE in PyTorch. ☆41 · Updated 5 years ago
- ☆53 · Updated 5 years ago
- This is an open CNN accelerator for everyone to use ☆14 · Updated 5 years ago
- Accelerating CNN's convolution operation on GPUs by using memory-efficient data access patterns. ☆14 · Updated 6 years ago
- ☆14 · Updated 3 years ago
- ☆69 · Updated 4 years ago
- pytorch fixed point training tool/framework ☆34 · Updated 4 years ago
- ☆45 · Updated 8 months ago
- ☆59 · Updated 2 months ago