MooreThreads / tutorial_on_musa
☆43Updated 3 weeks ago
Alternatives and similar repositories for tutorial_on_musa
Users who are interested in tutorial_on_musa are comparing it to the repositories listed below.
- torch_musa is an open source repository based on PyTorch, which can make full use of the super computing power of MooreThreads graphics c…☆475Updated this week
- Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch☆483Updated last week
- llm-export can export LLM models to ONNX.☆344Updated 3 months ago
- Run generative AI models in sophgo BM1684X/BM1688☆266Updated 2 weeks ago
- PaddlePaddle custom device implementation (custom hardware integration for PaddlePaddle, 飞桨).☆101Updated this week
- A tutorial for CUDA&PyTorch☆227Updated last week
- MUSA Templates for Linear Algebra Subroutines☆41Updated last week
- Quick and Self-Contained TensorRT Custom Plugin Implementation and Integration☆79Updated 8 months ago
- ☆43Updated 4 years ago
- This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several…☆1,233Updated 2 years ago
- tutorial for writing custom pytorch cpp+cuda kernel, applied on volume rendering (NeRF)☆29Updated 2 years ago
- Serving Inside Pytorch☆170Updated 2 weeks ago
- LLaMa/RWKV onnx models, quantization and testcase☆367Updated 2 years ago
- A CUDA tutorial for learning CUDA programming from scratch☆266Updated last year
- Low-bit LLM inference on CPU/NPU with lookup table☆916Updated 8 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs☆76Updated last year
- QAI AppBuilder is designed to help developers easily execute models on WoS and Linux platforms. It encapsulates the Qualcomm® AI Runtime …☆111Updated this week
- cudnn_frontend provides a c++ wrapper for the cudnn backend API and samples on how to use it☆682Updated last week
- a lightweight LLM model inference framework☆749Updated last year
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.☆1,037Updated last week
- C++ implementation of Qwen-LM☆616Updated last year
- ☆314Updated 3 years ago
- DeepSparkHub selects hundreds of application algorithms and models, covering various fields of AI and general-purpose computing, to suppo…☆70Updated last week
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…☆520Updated last year
- learning how CUDA works☆373Updated 11 months ago
- [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.☆672Updated 2 months ago
- NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.☆225Updated last year
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …☆274Updated 6 months ago
- ☆26Updated 5 months ago
- A light llama-like llm inference framework based on the triton kernel.☆171Updated last month