pfnet-research / allreduce-proto
A prototype implementation of the AllReduce collective communication routine.
☆20 Updated 5 years ago
Related projects:
- Accelerating DNN Convolutional Layers with Micro-batches ☆64 Updated 4 years ago
- Test Winograd convolution written in TVM for CUDA and AMDGPU ☆39 Updated 5 years ago
- DNN Inference with CPU, C++, ONNX support: Instant ☆56 Updated 5 years ago
- Experimental toolchain to compile and run Chainer models ☆112 Updated 4 years ago
- A distributed shared memory library for high-performance computing ☆10 Updated 3 years ago
- Add-on package for ONNX format support in Chainer ☆85 Updated 4 years ago
- Sparse matrix computation library for GPU ☆54 Updated 4 years ago
- NNVM for ROCm Examples ☆19 Updated 6 years ago
- Kernel Fusion and Runtime Compilation Based on NNVM ☆69 Updated 7 years ago
- Experimental binary net implementation in Chainer ☆101 Updated 8 years ago
- ONNX Parser is a tool that automatically generates OpenVX inference code (CNN) from ONNX binary model files. ☆17 Updated 5 years ago
- Codebase associated with the PyTorch compiler tutorial ☆44 Updated 5 years ago
- Artifacts for SOSP'19 paper Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions ☆21 Updated 2 years ago
- Intel® Optimization for Chainer*, a Chainer module providing NumPy-like API and DNN acceleration using MKL-DNN. ☆162 Updated last week
- nGraph™ Backend for ONNX ☆42 Updated last year
- Python Binding to NVRTC ☆79 Updated 6 years ago
- Library for fast image convolution in neural networks on Intel Architecture ☆29 Updated 7 years ago
- RDMA Optimization on MXNet ☆14 Updated 6 years ago
- Fast binary matrix product on CPU ☆10 Updated 8 years ago
- Generating Families of Practical Fast Matrix Multiplication Algorithms ☆12 Updated 7 years ago
- Estimate theoretical computational cost of a Chainer-based neural network ☆50 Updated 4 years ago
- An analytical performance modeling tool for deep neural networks. ☆85 Updated 3 years ago
- CNNs in Halide ☆22 Updated 8 years ago
- PyProf2: PyTorch Profiling tool ☆83 Updated 4 years ago
- ☆71 Updated this week
- Training deep neural networks with low precision multiplications ☆63 Updated 9 years ago
- Chainer x TensorRT ☆34 Updated 5 years ago
- Greentea LibDNN - a universal convolution implementation supporting CUDA and OpenCL ☆135 Updated 7 years ago
- An experimental ahead of time compiler for Relay. ☆51 Updated 4 years ago