strin / gemm-android

Tutorial on optimizing GEMM performance on Android

☆51

Alternatives and similar repositories for gemm-android

Users who are interested in gemm-android are comparing it to the libraries listed below.

naibaf7 / libdnn
Greentea LibDNN - a universal convolution implementation supporting CUDA and OpenCL
☆136 Updated 8 years ago
merrymercy / tvm-mali
Optimizing Mobile Deep Learning on ARM GPU with TVM
☆181 Updated 6 years ago
OAID / MXNet-HRT
Heterogeneous Run Time version of MXNet. Added heterogeneous capabilities to the MXNet, uses heterogeneous computing infrastructure frame…
☆72 Updated 7 years ago
ajtulloch / caffe
Caffe: a fast open framework for deep learning.
☆14 Updated 9 years ago
Maratyszcza / caffe-nnpack
Caffe with NNPACK integration
☆58 Updated 9 years ago
hyln9 / GCNGEMM
Optimized half precision gemm assembly kernels (deprecated due to ROCm)
☆47 Updated 8 years ago
PerfXLab / embedded_ai
☆209 Updated 7 years ago
ColfaxResearch / FALCON
Library for fast image convolution in neural networks on Intel Architecture
☆31 Updated 8 years ago
IntelLabs / SkimCaffe
Caffe for Sparse Convolutional Neural Network
☆238 Updated 2 years ago
strin / mocha-gemm-profile
Profiling GEMM on Android
☆10 Updated 9 years ago
XiaoMi / nnlib
Fork of https://source.codeaurora.org/quic/hexagon_nn/nnlib
☆58 Updated 2 years ago
ctuning / ck-tensorrt
Collective Knowledge repository for NVIDIA's TensorRT
☆37 Updated 4 years ago
Orion34-lanbo / tvm-batch-matmul-example
☆24 Updated 7 years ago
MichalBusta / caffe
Ristretto: Caffe-based approximation of convolutional neural networks.
☆30 Updated 6 years ago
MatthieuCourbariaux / deep-learning-multipliers
Training deep neural networks with low precision multiplications
☆63 Updated 10 years ago
OAID / Caffe-HRT
Heterogeneous Run Time version of Caffe. Added heterogeneous capabilities to the Caffe, uses heterogeneous computing infrastructure frame…
☆269 Updated 6 years ago
eBay / maxDNN
High Efficiency Convolution Kernel for Maxwell GPU Architecture
☆134 Updated 8 years ago
XiuYuLi / flexible-gemm
flexible-gemm conv of deepcore
☆17 Updated 5 years ago
masahi / nnvm-vision-demo
Demos interesting image-in, image-out networks running on both NVIDIA and AMD GPUs, with NNVM
☆49 Updated 7 years ago
ppwwyyxx / haDNN
Proof-of-Concept CNN in Halide
☆22 Updated 9 years ago
CAS-CLab / CNN-Inference-Engine-Quick-View
A quick view of high-performance convolution neural networks (CNNs) inference engines on mobile devices.
☆150 Updated 3 years ago
pmgysel / caffe
Ristretto: Caffe-based approximation of convolutional neural networks.
☆291 Updated 6 years ago
zhaoweicai / hwgq
Caffe implementation of accurate low-precision neural networks
☆117 Updated 6 years ago
mtmd / Mobile_ConvNet
RenderScript based implementation of Convolutional Neural Networks for Android phones
☆52 Updated 7 years ago
dmlc / MXNet.cpp
C++ interface for MXNet
☆115 Updated 8 years ago
NVIDIA / cnmem
A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory
☆298 Updated 6 years ago
CAS-CLab / quantized-cnn
An efficient framework for convolutional neural networks
☆277 Updated last year
zhxfl / CUDA-CNN
CNN accelerated by CUDA. Tested on MNIST, finally reaching 99.76%
☆186 Updated 7 years ago
dmlc / web-data
The repo to host all the web data including images for documents in dmlc projects.
☆84 Updated 3 years ago
vinx13 / tvm-cuda-int8-benchmark
Benchmark of TVM quantized model on CUDA
☆111 Updated 5 years ago