henline / streamexecutordoc

Documentation for StreamExecutor open source proposal

☆83

Alternatives and similar repositories for streamexecutordoc

Users who are interested in streamexecutordoc are comparing it to the repositories listed below.

cuihenggang / geeps
GPU-specialized parameter server for GPU machine learning.
☆101 Updated 7 years ago
dmlc / nnvm-fusion
Kernel Fusion and Runtime Compilation Based on NNVM
☆70 Updated 8 years ago
dmlc / HalideIR
Symbolic Expression and Statement Module for new DSLs
☆205 Updated 4 years ago
NVIDIA / cnmem
A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory
☆298 Updated 6 years ago
baidu-research / tensorflow-allreduce
☆371 Updated 7 years ago
baidu-research / baidu-allreduce
☆588 Updated 7 years ago
tvmai / meetup-slides
Place for meetup slides
☆141 Updated 4 years ago
dlsys-course / tinyflow
Tutorial code on how to build your own Deep Learning System in 2k Lines
☆125 Updated 8 years ago
tensorflow / networking
Enhanced networking support for TensorFlow. Maintained by SIG-networking.
☆98 Updated 3 years ago
sailing-pmls / pmls-caffe
PMLS-Caffe: Distributed Deep Learning Framework for Parallel ML System
☆194 Updated 7 years ago
openai / openai-gemm
Open single and half precision gemm implementations
☆382 Updated 2 years ago
linnanwang / superneurons-release
Release repository of superneurons
☆52 Updated 4 years ago
hclhkbu / dlbench
Benchmarking State-of-the-Art Deep Learning Software Tools
☆170 Updated 7 years ago
Funatiq / gossip
gossip: Efficient Communication Primitives for Multi-GPU Systems
☆59 Updated 3 years ago
jiazhihao / metaflow_sysml19
Repository for SysML19 Artifacts Evaluation
☆54 Updated 6 years ago
dlsys-course / dlsys-course.github.io
Deep learning system course
☆214 Updated 6 years ago
ezyang / nvprof2json
Convert nvprof profiles into about:tracing compatible JSON files
☆70 Updated 4 years ago
tobegit3hub / tftvm
TensorFlow and TVM integration
☆37 Updated 5 years ago
Caffe-MPI / Caffe-MPI.github.io
☆125 Updated 7 years ago
mlcommons / training_results_v0.5
This repository contains the results and code for the MLPerf™ Training v0.5 benchmark.
☆35 Updated 2 months ago
tbennun / cudnn-training
A CUDNN minimal deep learning training code sample using LeNet.
☆268 Updated 2 years ago
tqchen / ffi-navigator
☆241 Updated this week
NVIDIA / nvtx-plugins
Python bindings for NVTX
☆66 Updated 2 years ago
eBay / maxDNN
High Efficiency Convolution Kernel for Maxwell GPU Architecture
☆134 Updated 8 years ago
xieyu / read-tf
Notes on reading TensorFlow source code
☆13 Updated 6 years ago
bytedance / ps-lite
A lightweight parameter server interface
☆77 Updated 2 years ago
tensorflow / mlir-hlo
☆420 Updated this week
dmlc / rabit
Reliable Allreduce and Broadcast Interface for distributed machine learning
☆514 Updated 4 years ago
ikhlestov / tensorflow_profiling
Scripts with example usage of tensorflow profiler
☆83 Updated 8 years ago
anilshanbhag / gpu-topk
Efficient Top-K implementation on the GPU
☆183 Updated 6 years ago