intel / light-model-transformerLinks

☆72

Alternatives and similar repositories for light-model-transformer

Users that are interested in light-model-transformer are comparing it to the libraries listed below

Sorting:

spcl / ucudnn
Accelerating DNN Convolutional Layers with Micro-batches
☆63Updated 5 years ago
intel / ideep
Intel® Optimization for Chainer*, a Chainer module providing numpy like API and DNN acceleration using MKL-DNN.
☆171Updated last month
naibaf7 / libdnn
Greentea LibDNN - a universal convolution implementation supporting CUDA and OpenCL
☆137Updated 8 years ago
hma02 / cublasHgemm-P100
Code for testing the native float16 matrix multiplication performance on Tesla P100 and V100 GPU based on cublasHgemm
☆35Updated 6 years ago
ColfaxResearch / FALCON
Library for fast image convolution in neural networks on Intel Architecture
☆31Updated 8 years ago
eBay / maxDNN
High Efficiency Convolution Kernel for Maxwell GPU Architecture
☆137Updated 8 years ago
strin / gemm-android
tutorial to optimize GEMM performance on android
☆51Updated 9 years ago
NVIDIA / cnmem
A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory
☆299Updated 6 years ago
Maratyszcza / FP16
Conversion to/from half-precision floating point formats
☆374Updated 3 months ago
NVIDIA / kmeans
kmeans clustering with multi-GPU capabilities
☆119Updated 2 years ago
hyln9 / GCNGEMM
Optimized half precision gemm assembly kernels (deprecated due to ROCm)
☆47Updated 8 years ago
intel / clGPU
☆68Updated 3 years ago
DensoITLab / Demitasse
Demitasse: SPMD Programing Implementation of Deep Neural Network Library for Mobile Devices(NeurIPS2016WS)
☆23Updated 8 years ago
merrymercy / tvm-mali
Optimizing Mobile Deep Learning on ARM GPU with TVM
☆181Updated 7 years ago
ravi-teja-mullapudi / Halide-NN
CNNs in Halide
☆23Updated 10 years ago
tbennun / cudnn-training
A CUDNN minimal deep learning training code sample using LeNet.
☆269Updated 2 years ago
XiaoMi / nnlib
Fork of https://source.codeaurora.org/quic/hexagon_nn/nnlib
☆58Updated 2 years ago
dmlc / HalideIR
Symbolic Expression and Statement Module for new DSLs
☆205Updated 5 years ago
NVlabs / cub
THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE.
☆85Updated last year
okdshin / instant
DNN Inference with CPU, C++, ONNX support: Instant
☆56Updated 7 years ago
anilshanbhag / gpu-topk
Efficient Top-K implementation on the GPU
☆188Updated 6 years ago
lcskrishna / onnx-parser
ONNX Parser is a tool that automatically generates openvx inference code (CNN) from onnx binary model files.
☆18Updated 6 years ago
XiuYuLi / flexible-gemm
flexible-gemm conv of deepcore
☆17Updated 5 years ago
lukeyeager / cmake-cuda-example
Example of how to use CUDA with CMake >= 3.8
☆70Updated 5 months ago
carlushuang / cpu_gemm_opt
how to design cpu gemm on x86 with avx256, that can beat openblas.
☆73Updated 6 years ago
dmlc / nnvm-fusion
Kernel Fusion and Runtime Compilation Based on NNVM
☆72Updated 9 years ago
codeplaysoftware / portDNN
portDNN is a library implementing neural network algorithms written using SYCL
☆113Updated last year
henline / streamexecutordoc
Documentation for StreamExecutor open source proposal
☆83Updated 9 years ago
CNugteren / myGEMM
Code appendix to an OpenCL matrix-multiplication tutorial
☆178Updated 8 years ago
pfnet-research / allreduce-proto
A prototype implementation of AllReduce collective communication routine.
☆19Updated 7 years ago