zdevito / ATen

ATen: A TENsor library for C++11

☆703

Alternatives and similar repositories for ATen

Users who are interested in ATen are comparing it to the libraries listed below.

tbennun / cudnn-training
A CUDNN minimal deep learning training code sample using LeNet.
☆268 Updated 2 years ago
pytorch / gloo
Collective communications library with various primitives for multi-machine training.
☆1,332 Updated this week
dmlc / dlpack
common in-memory tensor structure
☆1,042 Updated last month
pytorch / tvm
TVM integration into PyTorch
☆453 Updated 5 years ago
openai / blocksparse
Efficient GPU kernels for block-sparse matrix multiplication and convolution
☆1,044 Updated 2 years ago
pytorch / extension-cpp
C++ extensions in PyTorch
☆1,124 Updated 3 weeks ago
facebookresearch / TensorComprehensions
A domain specific language to express machine learning workloads.
☆1,760 Updated 2 years ago
NVIDIA / cnmem
A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory
☆298 Updated 6 years ago
openai / openai-gemm
Open single and half precision gemm implementations
☆382 Updated 2 years ago
dmlc / mshadow
Matrix Shadow: Lightweight CPU/GPU Matrix and Tensor Template Library in C++/CUDA for (Deep) Machine Learning
☆1,118 Updated 5 years ago
chrischoy / pytorch-custom-cuda-tutorial
Tutorial for building a custom CUDA function for Pytorch
☆519 Updated 6 years ago
tensor-compiler / taco
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
☆1,314 Updated 3 months ago
warmspringwinds / pytorch-cpp
Pytorch C++ Library
☆367 Updated 7 years ago
baidu-research / baidu-allreduce
☆588 Updated 7 years ago
pytorch / extension-ffi
Examples of C extensions for PyTorch
☆256 Updated 2 years ago
mila-iqia / myia
Myia prototyping
☆457 Updated 2 years ago
dlsys-course / assignment1-2018
Assignment 1: automatic differentiation
☆474 Updated 6 years ago
Maratyszcza / NNPACK
Acceleration package for neural networks on multi-core CPUs
☆1,692 Updated last year
tensorflow / custom-op
Guide for building custom op for TensorFlow
☆382 Updated 2 years ago
jiazhihao / TASO
The Tensor Algebra SuperOptimizer for Deep Learning
☆726 Updated 2 years ago
pytorch / cppdocs
PyTorch C++ API Documentation
☆232 Updated this week
dmlc / mxnet-memonger
Sublinear memory optimization for deep learning, reduce GPU memory cost to train deeper nets
☆308 Updated 7 years ago
NVIDIA / cub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
☆1,765 Updated last year
dlsys-course / dlsys-course.github.io
Deep learning system course
☆214 Updated 6 years ago
NVIDIA-developer-blog / code-samples
Source code examples from the Parallel Forall Blog
☆1,300 Updated last year
google / gemmlowp
Low-precision matrix multiplication
☆1,812 Updated last year
intel / ideep
Intel® Optimization for Chainer*, a Chainer module providing numpy like API and DNN acceleration using MKL-DNN.
☆173 Updated this week
pytorch / FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
☆1,415 Updated this week
baidu-research / DeepBench
Benchmarking Deep Learning operations on different hardware
☆1,094 Updated 4 years ago
dmlc / nnvm
☆1,658 Updated 6 years ago