brandontrabucco / lstm-cuda

This is a C++ implementation of an LSTM neural network parallelized for a GPU using CUDA.

☆25

Alternatives and similar repositories for lstm-cuda

Users who are interested in lstm-cuda are comparing it to the repositories listed below.

Hardware-Alchemy / cuDNN-sample
cuDNN sample codes provided by Nvidia
☆46 · Updated 6 years ago
carlushuang / cpu_gemm_opt
how to design cpu gemm on x86 with avx256, that can beat openblas.
☆70 · Updated 6 years ago
dmlc / nnvm-fusion
Kernel Fusion and Runtime Compilation Based on NNVM
☆70 · Updated 8 years ago
anilshanbhag / gpu-topk
Efficient Top-K implementation on the GPU
☆182 · Updated 6 years ago
NVIDIA / online-softmax
Benchmark code for the "Online normalizer calculation for softmax" paper
☆95 · Updated 6 years ago
xuqiantong / CUDA-Winograd
Fast CUDA Kernels for ResNet Inference.
☆177 · Updated 6 years ago
YulhwaKim / cutlass_tilesparse
CUDA templates for tile-sparse matrix multiplication based on CUTLASS.
☆51 · Updated 7 years ago
yuxianzhi / Top-K
A way to use cuda to accelerate top k algorithm
☆29 · Updated 8 years ago
NVlabs / cub
THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.
☆84 · Updated last year
zhxfl / CUDA-CNN
CNN accelerated by CUDA. Tested on MNIST, finally reaching 99.76%.
☆186 · Updated 7 years ago
XiuYuLi / flexible-gemm
flexible-gemm conv of deepcore
☆17 · Updated 5 years ago
dumerrill / merge-spmv
☆93 · Updated 8 years ago
wzsh / wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
☆138 · Updated 4 years ago
XiuYuLi / deepcore_source_code
Subpart source code of deepcore v0.7
☆27 · Updated 5 years ago
masahi / tvm-winograd
Test winograd convolution written in TVM for CUDA and AMDGPU
☆41 · Updated 6 years ago
alibaba / heterogeneity-aware-lowering-and-optimization
heterogeneity-aware-lowering-and-optimization
☆255 · Updated last year
ekondis / gpumembench
A GPU benchmark suite for assessing on-chip GPU memory bandwidth
☆106 · Updated 7 years ago
NeuHub / TVMDeepDive
☆22 · Updated 5 years ago
maltanar / gemmbitserial
Fast matrix multiplication for few-bit integer matrices on CPUs.
☆28 · Updated 6 years ago
mlcommons / inference_results_v0.7
This repository contains the results and code for the MLPerf™ Inference v0.7 benchmark.
☆17 · Updated 3 weeks ago
xieyu / read-tf
notes on reading tensorflow source code
☆13 · Updated 6 years ago
chasingegg / Winconv
implementation of winograd minimal convolution algorithm on Intel Architecture
☆39 · Updated 7 years ago
tlc-pack / tophub
tophub autotvm log collections
☆70 · Updated 2 years ago
hclhkbu / gcoospdm
Sparse-dense matrix-matrix multiplication on GPUs
☆14 · Updated 6 years ago
mrcat2018 / AutodiffEngine
AutodiffEngine
☆13 · Updated 6 years ago
LeiWang1999 / tvm_gpu_gemm
play gemm with tvm
☆91 · Updated 2 years ago
marsupialtail / sparsednn
Fast sparse deep learning on CPUs
☆54 · Updated 2 years ago
OpenHero / im2col
image to column
☆30 · Updated 11 years ago
daadaada / turingas
Assembler for NVIDIA Volta and Turing GPUs
☆224 · Updated 3 years ago
linnanwang / superneurons-release
this is the release repository of superneurons
☆52 · Updated 4 years ago