benoitsteiner / tensorflow-xsmm

Improved performance for TensorFlow on Intel hardware.

☆14

Alternatives and similar repositories for tensorflow-xsmm

Users who are interested in tensorflow-xsmm are comparing it to the libraries listed below.

intel / mklnn
☆10 Updated 2 years ago
ColfaxResearch / FALCON
Library for fast image convolution in neural networks on Intel Architecture
☆29 Updated 7 years ago
flame / fmm-gen
Generating Families of Practical Fast Matrix Multiplication Algorithms
☆12 Updated 7 years ago
naibaf7 / caffe
Caffe: a fast open framework for deep learning. With OpenCL and CUDA support.
☆86 Updated 6 years ago
intel / MLSL
Intel(R) Machine Learning Scaling Library is a library providing an efficient implementation of communication patterns used in deep learn…
☆109 Updated 2 years ago
Maratyszcza / FPplus
Scientific library for high-precision computations and research
☆49 Updated 7 years ago
intel / isaac
Input-aware cuBLAS/clBLAS implementation for better performance
☆17 Updated 2 years ago
moskewcz / boda
Boda: A C++ Framework for Efficient Experiments in Computer Vision
☆64 Updated 5 years ago
HPAC / TTC
TTC: A high-performance Compiler for Tensor Transpositions
☆20 Updated 7 years ago
opveclib / opveclib
The Operator Vectorization Library, or OVL, is a python productivity library for defining high performance custom operators for the Tenso…
☆68 Updated 8 years ago
ppwwyyxx / haDNN
Proof-of-Concept CNN in Halide
☆22 Updated 8 years ago
clementfarabet / neuflow
Compiler toolkit for neuFlow.
☆26 Updated 11 years ago
ROCm / nnvm-rocm
NNVM for ROCm Examples
☆19 Updated 7 years ago
linnanwang / BLASX
a heterogeneous multiGPU level-3 BLAS library
☆45 Updated 5 years ago
jamolnng / OpenCL-CUDA-Tutorials
Sources for OpenCL and CUDA tutorials. http://jlaning.com
☆20 Updated 9 years ago
NervanaSystems / caffe2neon
Tools to convert Caffe models to neon's serialization format
☆39 Updated 2 years ago
mlcommons / training_results_v0.5
This repository contains the results and code for the MLPerf™ Training v0.5 benchmark.
☆35 Updated 3 weeks ago
xingdi-eric-yuan / cuda-deep-neural-nets
Deep neural network framework (C/C++/CUDA).
☆31 Updated 9 years ago
jnbntz / gpu-edu-workshops
Code examples for CUDA and OpenACC
☆34 Updated 9 months ago
ARCambridge / MXNet_linalg_examples
Examples of building probabilistic models with MXNet linear algebra operators
☆23 Updated 7 years ago
ravi-teja-mullapudi / Halide-NN
CNNs in Halide
☆23 Updated 9 years ago
onnx / onnx-xla
XLA integration of Open Neural Network Exchange (ONNX)
☆19 Updated 6 years ago
hyln9 / GCNGEMM
Optimized half precision gemm assembly kernels (deprecated due to ROCm)
☆47 Updated 7 years ago
TimDettmers / clusterNet
Deep neural network framework for multiple GPUs
☆33 Updated 9 years ago
shiyangdaisy23 / vqa-mxnet-gluon
☆16 Updated 7 years ago
yuruofeifei / mms
MXNet Model Serving
☆25 Updated 7 years ago
baidu-research / catamount
Catamount is a compute graph analysis tool to load, construct, and modify deep learning models and to symbolically analyze their compute …
☆13 Updated 4 years ago
maps-gpu / MAPS
GPU Optimization and Memory Abstraction Framework
☆32 Updated 5 years ago
milakov / nnForge
Convolutional neural networks C++ framework with CPU and GPU (CUDA) backends
☆178 Updated 6 years ago
NVlabs / sassifi
An Architecture-level Fault Injection Tool for GPU Application Resilience Evaluations
☆17 Updated 5 years ago