kpu / intgemmLinks

int8_t and int16_t matrix multiply based on https://arxiv.org/abs/1705.01991

☆74

Alternatives and similar repositories for intgemm

Users that are interested in intgemm are comparing it to the libraries listed below

Sorting:

marian-nmt / amun
Fast stand-alone C++ decoder for RNN-based NMT models
☆29Updated 4 years ago
google / ruy
☆316Updated 4 months ago
YulhwaKim / cutlass_tilesparse
CUDA templates for tile-sparse matrix multiplication based on CUTLASS.
☆50Updated 7 years ago
DeMoriarty / custom_matmul_kernels
Customized matrix multiplication kernels
☆57Updated 3 years ago
marsupialtail / sparsednn
Fast sparse deep learning on CPUs
☆56Updated 3 years ago
marian-nmt / marian-dev
Fast Neural Machine Translation in C++ - development repository
☆282Updated 4 months ago
maltanar / gemmbitserial
Fast matrix multiplication for few-bit integer matrices on CPUs.
☆28Updated 6 years ago
carlushuang / cpu_gemm_opt
how to design cpu gemm on x86 with avx256, that can beat openblas.
☆73Updated 6 years ago
csukuangfj / OpenCNN
An Open Convolutional Neural Network Framework in C++ From Scratch
☆66Updated 4 years ago
astojanov / Clover
Clover: Quantized 4-bit Linear Algebra Library
☆114Updated 7 years ago
bwasti / pytorch_compiler_tutorial
Codebase associated with the PyTorch compiler tutorial
☆47Updated 6 years ago
XapaJIaMnu / gLM
A GPU language model, based on btree backed tries.
☆30Updated 7 years ago
Harry-Chen / InfMoE
Inference framework for MoE layers based on TensorRT with Python binding
☆41Updated 4 years ago
mlcommons / training_results_v0.7
This repository contains the results and code for the MLPerf™ Training v0.7 benchmark.
☆57Updated 2 years ago
VoVAllen / tf-dlpack
DLPack for Tensorflow
☆35Updated 5 years ago
meta-pytorch / multipy
torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i…
☆182Updated 2 months ago
masahi / torchscript-to-tvm
☆68Updated 2 years ago
google-research / sputnik
A library of GPU kernels for sparse matrix operations.
☆275Updated 4 years ago
adityaiitb / PyProf
A GPU performance profiling tool for PyTorch models
☆22Updated 3 years ago
parasj / checkmate
Training neural networks in TensorFlow 2.0 with 5x less memory
☆137Updated 3 years ago
mcarilli / mixed_precision_references
Personal collection of references for high performance mixed precision training.
☆41Updated 6 years ago
NVIDIA / online-softmax
Benchmark code for the "Online normalizer calculation for softmax" paper
☆102Updated 7 years ago
microsoft / infinibatch
Efficient, check-pointed data loading for deep learning with massive data sets.
☆209Updated 2 years ago
dmlc / nnvm-fusion
Kernel Fusion and Runtime Compilation Based on NNVM
☆72Updated 9 years ago
Maratyszcza / FP16
Conversion to/from half-precision floating point formats
☆374Updated 3 months ago
intel / ideep
Intel® Optimization for Chainer*, a Chainer module providing numpy like API and DNN acceleration using MKL-DNN.
☆171Updated this week
pytorch / rfcs
PyTorch RFCs (experimental)
☆136Updated 5 months ago
larq / compute-engine
Highly optimized inference engine for Binarized Neural Networks
☆251Updated 2 weeks ago
OpenNMT / Tokenizer
Fast and customizable text tokenization library with BPE and SentencePiece support
☆322Updated 7 months ago
scailable / sclblonnx
Scailable ONNX python tools
☆97Updated last year