Cambricon / catch
☆29 Updated last year
Alternatives and similar repositories for catch:
Users who are interested in catch are comparing it to the libraries listed below:
- Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU). ☆108 Updated this week
- Yinghan's Code Sample ☆305 Updated 2 years ago
- Development repository for the Triton-Linalg conversion ☆173 Updated 2 weeks ago
- examples for tvm schedule API ☆99 Updated last year
- code reading for tvm ☆74 Updated 3 years ago
- ☆98 Updated 2 months ago
- ☆129 Updated last month
- ☆142 Updated last month
- A simple high performance CUDA GEMM implementation. ☆347 Updated last year
- ☆80 Updated last year
- ☆36 Updated 2 years ago
- ☆35 Updated 4 months ago
- ☆140 Updated 9 months ago
- ☆109 Updated 10 months ago
- This fork of BVLC/Caffe is dedicated to supporting Cambricon deep learning processor and improving performance of this deep learning fram… ☆41 Updated 4 years ago
- ☆110 Updated 11 months ago
- [MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration ☆197 Updated 2 years ago
- ☆58 Updated last month
- ☆195 Updated last year
- ☆26 Updated 10 months ago
- A Easy-to-understand TensorOp Matmul Tutorial ☆316 Updated 5 months ago
- A baseline repository of Auto-Parallelism in Training Neural Networks ☆142 Updated 2 years ago
- Shared Middle-Layer for Triton Compilation ☆226 Updated this week
- Optimizing SGEMM kernel functions on NVIDIA GPUs to a close-to-cuBLAS performance. ☆324 Updated last month
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct… ☆345 Updated 5 months ago
- ☆95 Updated 3 years ago
- A home for the final text of all TVM RFCs. ☆102 Updated 4 months ago
- A benchmark suited especially for deep learning operators ☆42 Updated 2 years ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :) ☆79 Updated last year
- heterogeneity-aware-lowering-and-optimization ☆254 Updated last year