Cambricon / catch
☆30Updated 2 years ago
Alternatives and similar repositories for catch:
Users who are interested in catch are comparing it to the libraries listed below.
- Development repository for the Triton-Linalg conversion☆185Updated 2 months ago
- examples for tvm schedule API☆101Updated last year
- Yinghan's Code Sample☆323Updated 2 years ago
- code reading for tvm☆76Updated 3 years ago
- Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).☆115Updated 2 weeks ago
- ☆138Updated 4 months ago
- This fork of BVLC/Caffe is dedicated to supporting Cambricon deep learning processor and improving performance of this deep learning fram…☆41Updated 4 years ago
- [MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration☆197Updated 2 years ago
- ☆115Updated 4 months ago
- ☆88Updated 3 weeks ago
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance.☆339Updated 3 months ago
- ☆148Updated 3 months ago
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…☆393Updated 7 months ago
- A simple high performance CUDA GEMM implementation.☆361Updated last year
- ☆192Updated 2 years ago
- ☆61Updated 3 months ago
- ☆38Updated 3 years ago
- Automatic Schedule Exploration and Optimization Framework for Tensor Computations☆176Updated 3 years ago
- ☆109Updated last year
- ☆122Updated last year
- An Easy-to-understand TensorOp Matmul Tutorial☆342Updated 7 months ago
- Examples of CUDA implementations by Cutlass CuTe☆159Updated 2 months ago
- A benchmark suited especially for deep learning operators☆42Updated 2 years ago
- ☆102Updated last month
- A baseline repository of Auto-Parallelism in Training Neural Networks☆144Updated 2 years ago
- ☆96Updated 3 years ago
- ☆25Updated last year
- A home for the final text of all TVM RFCs.☆102Updated 7 months ago
- Fast CUDA Kernels for ResNet Inference.☆173Updated 5 years ago
- ☆139Updated last year