ravi-teja-mullapudi / Halide-NN

CNNs in Halide

☆23

Alternatives and similar repositories for Halide-NN

Users who are interested in Halide-NN are comparing it to the libraries listed below.

ppwwyyxx / haDNN
Proof-of-Concept CNN in Halide
☆22 Updated 9 years ago
hyln9 / GCNGEMM
Optimized half precision gemm assembly kernels (deprecated due to ROCm)
☆47 Updated 8 years ago
jrk / gradient-halide
☆102 Updated 5 years ago
ColfaxResearch / FALCON
Library for fast image convolution in neural networks on Intel Architecture
☆31 Updated 8 years ago
5antelope / H-Pipe
☆9 Updated 8 years ago
dmlc / HalideIR
Symbolic Expression and Statement Module for new DSLs
☆205 Updated 4 years ago
masahi / tvm-winograd
Test winograd convolution written in TVM for CUDA and AMDGPU
☆41 Updated 6 years ago
linnanwang / BLASX
a heterogeneous multiGPU level-3 BLAS library
☆45 Updated 5 years ago
naibaf7 / libdnn
Greentea LibDNN - a universal convolution implementation supporting CUDA and OpenCL
☆136 Updated 8 years ago
XiuYuLi / flexible-gemm
flexible-gemm conv of deepcore
☆17 Updated 5 years ago
dmlc / nnvm-fusion
Kernel Fusion and Runtime Compilation Based on NNVM
☆70 Updated 8 years ago
csehydrogen / Winograd-OpenCL
Winograd-based convolution implementation in OpenCL
☆28 Updated 8 years ago
strin / gemm-android
tutorial to optimize GEMM performance on android
☆51 Updated 9 years ago
bwasti / pytorch_compiler_tutorial
Codebase associated with the PyTorch compiler tutorial
☆46 Updated 5 years ago
hipacc / hipacc
A domain-specific language and compiler for image processing
☆76 Updated 4 years ago
andersy005 / tvm-in-action
TVM stack: exploring the incredible explosion of deep-learning frameworks and how to bring them together
☆64 Updated 7 years ago
chasingegg / Winconv
implementation of winograd minimal convolution algorithm on Intel Architecture
☆39 Updated 7 years ago
gplhegde / caffepresso
CaffePresso: An Optimized Library for Deep Learning on Embedded Accelerator-based platforms
☆87 Updated 9 months ago
xingyul / sparse-winograd-cnn
Efficient Sparse-Winograd Convolutional Neural Networks (ICLR 2018)
☆191 Updated 6 years ago
arbenson / fast-matmul
Fast matrix multiplication
☆29 Updated 4 years ago
CSshengxy / MEC
ICML 2017 MEC: Memory-efficient Convolution for Deep Neural Network, C++ implementation (unofficial)
☆17 Updated 6 years ago
spcl / ucudnn
Accelerating DNN Convolutional Layers with Micro-batches
☆63 Updated 5 years ago
zhaoweicai / hwgq
Caffe implementation of accurate low-precision neural networks
☆117 Updated 6 years ago
CNugteren / CLTune
CLTune: An automatic OpenCL & CUDA kernel tuner
☆180 Updated 2 years ago
codeplaysoftware / portDNN
portDNN is a library implementing neural network algorithms written using SYCL
☆113 Updated last year
vinx13 / tvm-cuda-int8-benchmark
Benchmark of TVM quantized model on CUDA
☆111 Updated 5 years ago
tbennun / cudnn-training
A CUDNN minimal deep learning training code sample using LeNet.
☆268 Updated 2 years ago
CNugteren / CLCudaAPI
A portable high-level API with CUDA or OpenCL back-end
☆54 Updated 7 years ago
bondhugula / polymage-benchmarks
Base code and optimized code for the benchmarks used in the PolyMage paper published at ASPLOS 2015
☆19 Updated 9 years ago
halide / CVPR2015
Example code used in the CVPR 2015 tutorial
☆41 Updated 9 years ago