brandontrabucco / lstm-cuda
This is a C++ implementation of an LSTM neural network, parallelized for a GPU using CUDA.
☆24Updated 7 years ago
Alternatives and similar repositories for lstm-cuda
Users who are interested in lstm-cuda are comparing it to the libraries listed below.
- cuDNN sample codes provided by Nvidia☆46Updated 6 years ago
- Efficient Top-K implementation on the GPU☆186Updated 6 years ago
- ☆22Updated 5 years ago
- How to design a CPU GEMM on x86 with AVX256 that can beat OpenBLAS.☆72Updated 6 years ago
- Subpart source code of deepcore v0.7☆27Updated 5 years ago
- Kernel Fusion and Runtime Compilation Based on NNVM☆71Updated 8 years ago
- ☆93Updated 8 years ago
- Fast CUDA Kernels for ResNet Inference.☆180Updated 6 years ago
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)☆143Updated 5 years ago
- CUDA templates for tile-sparse matrix multiplication based on CUTLASS.☆50Updated 7 years ago
- flexible-gemm conv of deepcore☆17Updated 5 years ago
- CUDA by practice☆130Updated 5 years ago
- Benchmark code for the "Online normalizer calculation for softmax" paper☆101Updated 7 years ago
- ☆115Updated last year
- Optimized half precision gemm assembly kernels (deprecated due to ROCm)☆47Updated 8 years ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.☆84Updated last year
- notes on reading tensorflow source code☆13Updated 7 years ago
- Highly optimized FFT library based on CUDA (as fast as cuFFT, and sometimes faster)☆19Updated 8 years ago
- CUDA official sample codes☆372Updated 9 years ago
- play gemm with tvm☆91Updated 2 years ago
- Implementation of breadth first search on GPU with CUDA Driver API.☆52Updated 4 years ago
- CNN accelerated by CUDA. Tested on MNIST and finally gets 99.76%☆185Updated 7 years ago
- implementation of winograd minimal convolution algorithm on Intel Architecture☆39Updated 7 years ago
- A GPU benchmark suite for assessing on-chip GPU memory bandwidth☆106Updated 8 years ago
- GPU implementation of Winograd convolution☆10Updated 7 years ago
- An extension library of WMMA API (Tensor Core API)☆106Updated last year
- kmeans clustering with multi-GPU capabilities☆119Updated 2 years ago
- ☆68Updated 11 years ago
- A CUDNN minimal deep learning training code sample using LeNet.☆268Updated 2 years ago
- Study of Ampere's Sparse Matmul☆18Updated 4 years ago