MooreThreads / tutorial_on_musa
☆29Updated last week
Alternatives and similar repositories for tutorial_on_musa
Users who are interested in tutorial_on_musa are comparing it to the libraries listed below
- torch_musa is an open source repository based on PyTorch, which can make full use of the super computing power of MooreThreads graphics c…☆414Updated this week
- The nvImageCodec library of GPU- and CPU-accelerated codecs featuring a unified interface☆109Updated 3 months ago
- MUSA Templates for Linear Algebra Subroutines☆27Updated 4 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆51Updated 8 months ago
- CPU Memory Compiler and Parallel programming☆26Updated 7 months ago
- DeepSparkHub selects hundreds of application algorithms and models, covering various fields of AI and general-purpose computing, to suppo…☆64Updated last week
- A tutorial for CUDA&PyTorch☆146Updated 5 months ago
- μ-Cuda, COVER THE LAST MILE OF CUDA. With features: intellisense-friendly, structured launch, automatic cuda graph generation and updatin…☆179Updated 3 weeks ago
- related to virglrender-vulkan: basic compute test application☆15Updated last year
- Serving Inside Pytorch☆160Updated 2 weeks ago
- A Connected Component Labelling algorithm implemented in CUDA☆48Updated 3 years ago
- A light llama-like llm inference framework based on the triton kernel.☆128Updated last week
- ppl.cv is a high-performance image processing library of openPPL supporting various platforms.☆504Updated 7 months ago
- stable diffusion using mnn☆65Updated last year
- ☆284Updated 3 years ago
- ☆69Updated last week
- Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch☆377Updated this week
- cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it☆582Updated 2 weeks ago
- A large number of CUDA/TensorRT cases for learning CUDA and TensorRT☆135Updated 2 years ago
- Large model deployment in practice: TensorRT-LLM, Triton Inference Server, vLLM☆26Updated last year
- ☆120Updated 2 years ago
- Code for the book 《CUDA编程基础与实践》 (CUDA Programming: Basics and Practice)☆123Updated 3 years ago
- OpenCL Tutorials☆53Updated 5 years ago
- A simple high performance CUDA GEMM implementation.☆382Updated last year
- A CPU tool for benchmarking the peak of floating points☆548Updated last month
- A CPU 3D Reconstruction pipeline using COLMAP and OpenMVS☆14Updated 2 years ago
- QAI AppBuilder is designed to help developers easily execute models on WoS and Linux platforms. It encapsulates the Qualcomm® AI Runtime …☆50Updated this week
- ☆37Updated 8 months ago
- GLake: optimizing GPU memory management and IO transmission.☆467Updated 3 months ago
- Examples for HIP☆208Updated 6 months ago