feifeibear / swDNN
A highly efficient library for deep neural networks on the Sunway TaihuLight supercomputer.
☆17Updated 6 years ago
Alternatives and similar repositories for swDNN
Users who are interested in swDNN are comparing it to the libraries listed below.
- A highly efficient library for GEMM operations on Sunway TaihuLight☆17Updated 4 years ago
- Chinese translation of the CUDA PTX-ISA document☆42Updated last month
- ☆23Updated 4 years ago
- examples for tvm schedule API☆101Updated 2 years ago
- ☆23Updated 3 years ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :)☆83Updated 2 years ago
- Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).☆123Updated last week
- ☆79Updated 2 years ago
- BytePS examples (Vision, NLP, GAN, etc)☆19Updated 2 years ago
- ☆36Updated 5 months ago
- REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche…☆94Updated 2 years ago
- ☆65Updated 5 months ago
- ☆113Updated last year
- ☆26Updated 4 months ago
- play gemm with tvm☆91Updated last year
- ☆30Updated last year
- this is the release repository of superneurons☆52Updated 4 years ago
- ☆73Updated 2 months ago
- Subpart source code of deepcore v0.7☆27Updated 5 years ago
- ☆28Updated last year
- Triton Compiler related materials.☆30Updated 5 months ago
- A tensor computing compiler based on tile programming for GPU, CPU, or TPU☆43Updated this week
- ☆146Updated 6 months ago
- ☆14Updated 3 years ago
- How to optimize sgemm on single-threaded ARM CPU, multi-threaded ARM CPU, and Nvidia GPU☆23Updated 3 years ago
- ☆82Updated last week
- Personal Notes for Learning HPC & Parallel Computation [Active Adding New Content]☆67Updated 2 years ago
- ☆148Updated 5 months ago
- This is a tuned sparse matrix dense vector multiplication (SpMV) library☆21Updated 9 years ago
- Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA☆32Updated 4 years ago