google / gemmlowp

Low-precision matrix multiplication

☆1,819

Alternatives and similar repositories for gemmlowp

Users who are interested in gemmlowp compare it to the libraries listed below.

Maratyszcza / NNPACK
Acceleration package for neural networks on multi-core CPUs
☆1,702 Updated last year
pytorch / QNNPACK
Quantized Neural Network PACKage - mobile-optimized implementation of quantized neural network operators
☆1,547 Updated 6 years ago
andravin / wincnn
Winograd minimal convolution algorithm generator for convolutional neural networks.
☆624 Updated 5 years ago
pytorch / FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
☆1,493 Updated this week
baidu-research / DeepBench
Benchmarking Deep Learning operations on different hardware
☆1,102 Updated 4 years ago
NervanaSystems / ngraph
nGraph has moved to OpenVINO
☆1,345 Updated 5 years ago
flame / how-to-optimize-gemm
☆1,960 Updated 2 years ago
dmlc / nnvm
☆1,655 Updated 7 years ago
ARM-software / ComputeLibrary
The Compute Library is a set of computer vision and machine learning functions optimised for both Arm CPUs and GPUs using SIMD technologi…
☆3,073 Updated 2 weeks ago
ARM-software / armnn
Arm NN ML Software.
☆1,290 Updated last week
uxlfoundation / oneDNN
oneAPI Deep Neural Network Library (oneDNN)
☆3,933 Updated this week
google / XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
☆2,187 Updated this week
PaddlePaddle / Anakin
High-performance cross-platform inference engine; you can run Anakin on x86-cpu, arm, nv-gpu, amd-gpu, bitmain and cambricon devices.
☆535 Updated 3 years ago
facebookresearch / TensorComprehensions
A domain specific language to express machine learning workloads.
☆1,759 Updated 2 years ago
microsoft / nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
☆1,001 Updated last year
NVIDIA-developer-blog / code-samples
Source code examples from the Parallel Forall Blog
☆1,313 Updated 2 months ago
tensorflow / runtime
A performant and modular runtime for TensorFlow
☆757 Updated 3 months ago
NervanaSystems / maxas
Assembler for NVIDIA Maxwell architecture
☆1,054 Updated 2 years ago
jiazhihao / TASO
The Tensor Algebra SuperOptimizer for Deep Learning
☆731 Updated 2 years ago
pytorch / glow
Compiler for Neural Network hardware accelerators
☆3,323 Updated last year
zdevito / ATen
ATen: A TENsor library for C++11
☆710 Updated 6 years ago
allenai / XNOR-Net
ImageNet classification using binary Convolutional Neural Networks
☆868 Updated 8 years ago
intel / clDNN
Compute Library for Deep Neural Networks (clDNN)
☆575 Updated 2 years ago
dmlc / mshadow
Matrix Shadow: Lightweight CPU/GPU Matrix and Tensor Template Library in C++/CUDA for (Deep) Machine Learning
☆1,119 Updated 6 years ago
huawei-noah / bolt
Bolt is a deep learning library with high performance and heterogeneous flexibility.
☆954 Updated 7 months ago
tensorflow / mlir
"Multi-Level Intermediate Representation" Compiler Infrastructure
☆1,759 Updated 4 years ago
Tencent / FeatherCNN
FeatherCNN is a high performance inference engine for convolutional neural networks.
☆1,223 Updated 6 years ago
tensor-compiler / taco
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
☆1,336 Updated 7 months ago
NVIDIA / cub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
☆1,808 Updated 2 years ago
libxsmm / libxsmm
Library for specialized dense and sparse matrix operations, and deep learning primitives.
☆923 Updated this week