MooreThreads / tutorial_on_musa

☆40

Alternatives and similar repositories for tutorial_on_musa

Users who are interested in tutorial_on_musa are comparing it to the repositories listed below.

MooreThreads / torch_musa
torch_musa is an open source repository based on PyTorch, which can make full use of the super computing power of MooreThreads graphics c…
☆449 Updated last week
Ascend / pytorch
Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch
☆458 Updated last week
PaddlePaddle / PaddleCustomDevice
PaddlePaddle custom device implementation. (飞桨 custom hardware integration implementation)
☆100 Updated this week
NVIDIA / nvImageCodec
The nvImageCodec library of GPU- and CPU-accelerated codecs featuring a unified interface
☆126 Updated 3 months ago
leimao / TensorRT-Custom-Plugin-Example
Quick and Self-Contained TensorRT Custom Plugin Implementation and Integration
☆73 Updated 6 months ago
Deep-Spark / DeepSparkHub
DeepSparkHub selects hundreds of application algorithms and models, covering various fields of AI and general-purpose computing, to suppo…
☆69 Updated 2 weeks ago
sophgo / LLM-TPU
Run generative AI models on Sophgo BM1684X/BM1688
☆254 Updated this week
inisis / OnnxLLM
Large Language Model ONNX Inference Framework
☆36 Updated this week
torchpipe / torchpipe
Serving Inside PyTorch
☆165 Updated last week
CalvinXKY / BasicCUDA
A tutorial for CUDA & PyTorch
☆168 Updated 10 months ago
NVIDIA / cudnn-frontend
cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples on how to use it
☆648 Updated last week
intel / xFasterTransformer
☆431 Updated 2 months ago
PaddlePaddle / PaConvert
PaddlePaddle Code Convert Toolkit. (飞桨 deep learning code conversion tool)
☆118 Updated this week
OpenPPL / ppl.cv
ppl.cv is a high-performance image processing library of OpenPPL supporting various platforms.
☆512 Updated last year
wangzhaode / llm-export
llm-export can export LLM models to ONNX.
☆330 Updated last month
OpenPPL / ppl.pmx
☆60 Updated last year
MooreThreads / vllm_musa
A high-throughput and memory-efficient inference and serving engine for LLMs
☆69 Updated last year
DeepLink-org / deeplink.framework
☆72 Updated last year
MooreThreads / mutlass
MUSA Templates for Linear Algebra Subroutines
☆34 Updated 9 months ago
quic / ai-engine-direct-helper
QAI AppBuilder is designed to help developers easily execute models on WoS and Linux platforms. It encapsulates the Qualcomm® AI Runtime …
☆87 Updated last week
mingj2021 / segment-anything-tensorrt
☆79 Updated 2 years ago
harleyszhang / lite_llama
A light llama-like LLM inference framework based on Triton kernels.
☆165 Updated 2 months ago
Deep-Spark / DeepSpark
The DeepSpark open platform selects hundreds of open source application algorithms and models that are deeply coupled with industrial app…
☆45 Updated this week
ChambinLee / CUDA_with_PyTorch
This project gives a simple introduction to CUDA, covering the CUDA execution model, the thread hierarchy, the CUDA memory model, how to write kernel functions, and the two ways PyTorch can use CUDA extensions. Working through it provides a basic grounding in developing CUDA extensions for PyTorch.
☆94 Updated 4 years ago
Liu-xiandong / How_to_optimize_in_GPU
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several…
☆1,188 Updated 2 years ago
HeKun-NVIDIA / TensorRT-Developer_Guide_in_Chinese
☆305 Updated 3 years ago
DataXujing / TensorRT-LLM-ChatGLM3
Hands-on large model deployment: TensorRT-LLM, Triton Inference Server, vLLM
☆26 Updated last year
LitLeo / OpenCUDA
☆267 Updated 7 years ago
ischintsan / cuda_by_example
Companion code for the book GPU高性能编程CUDA实战 (CUDA by Example)
☆44 Updated 3 years ago
tpoisonooo / llama.onnx
LLaMA/RWKV ONNX models, quantization, and test cases
☆368 Updated 2 years ago