MooreThreads / tutorial_on_musa
☆38 · Updated 2 months ago
Alternatives and similar repositories for tutorial_on_musa
Users who are interested in tutorial_on_musa are comparing it to the repositories listed below.
- torch_musa is an open-source repository based on PyTorch that can make full use of the computing power of MooreThreads graphics c… ☆440 · Updated 2 weeks ago
- llm-export can export LLM models to ONNX. ☆320 · Updated last week
- Hands-on deployment of large language models: TensorRT-LLM, Triton Inference Server, vLLM. ☆26 · Updated last year
- Run generative AI models on Sophgo BM1684X/BM1688. ☆253 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs. ☆65 · Updated last year
- A simple introduction to CUDA covering the CUDA execution model, the thread hierarchy, the CUDA memory model, how to write kernels, and the two ways to build CUDA extensions for PyTorch (a minimal sketch of the JIT route appears after this list); a basic starting point for developing PyTorch-based CUDA extensions. ☆94 · Updated 3 years ago
- Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch ☆449 · Updated this week
- Large Language Model ONNX Inference Framework. ☆36 · Updated this week
- A lightweight llama-like LLM inference framework based on Triton kernels. ☆160 · Updated last month
- Parallel Prefix Sum (Scan) with CUDA. ☆27 · Updated last year
- DeepSparkHub selects hundreds of application algorithms and models, covering various fields of AI and general-purpose computing, to suppo… ☆69 · Updated last week
- A tutorial for CUDA & PyTorch. ☆159 · Updated 9 months ago
- ppl.cv is a high-performance image processing library of openPPL supporting various platforms. ☆511 · Updated last year
- Machine learning compiler based on MLIR for Sophgo TPU. ☆812 · Updated this week
- PaddlePaddle custom device implementation (custom hardware integration for 『飞桨』). ☆97 · Updated this week
- ☆59 · Updated 11 months ago
- This is a series of GPU optimization topics that introduces how to optimize CUDA kernels in detail. I will introduce several… ☆1,173 · Updated 2 years ago
- Run ChatGLM2-6B on BM1684X. ☆50 · Updated last year
- Serving Inside PyTorch. ☆163 · Updated last month
- Companion code for the book 《GPU高性能编程CUDA实战》 (CUDA by Example). ☆40 · Updated 3 years ago
- ☆301 · Updated 3 years ago
- Code for the book 《CUDA编程基础与实践》 (CUDA Programming: Basics and Practice). ☆139 · Updated 3 years ago
- LLaMA/RWKV ONNX models, quantization, and test cases. ☆367 · Updated 2 years ago
- Quick and Self-Contained TensorRT Custom Plugin Implementation and Integration. ☆71 · Updated 5 months ago
- An Easy-to-Use and High-Performance AI Deployment Framework. ☆1,246 · Updated this week
- PaddlePaddle Code Convert Toolkit (a deep learning code conversion tool for 『飞桨』). ☆117 · Updated last week
- ☆51 · Updated last year
- Llama 2 inference. ☆43 · Updated 2 years ago
- Solutions for the second edition of 《大规模并行处理器编程实战》 (Programming Massively Parallel Processors). ☆33 · Updated 3 years ago
- FlagPerf is an open-source software platform for benchmarking AI chips. ☆352 · Updated 2 weeks ago
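Several of the entries above center on writing CUDA kernels and wiring them into PyTorch. Below is a minimal sketch of the just-in-time route using torch.utils.cpp_extension.load; the source file my_add_kernel.cu and the bound function my_add are hypothetical placeholders, not taken from any repository listed here.

```python
# Minimal sketch: JIT-compiling a custom CUDA kernel into PyTorch.
# Assumes a hypothetical source file "my_add_kernel.cu" that defines and
# binds a function `my_add(a, b)`; the file and function names are illustrative.
import torch
from torch.utils.cpp_extension import load

# nvcc compiles the listed sources on first use; the built module is cached.
my_ext = load(
    name="my_ext",                 # name of the generated Python extension module
    sources=["my_add_kernel.cu"],  # C++/CUDA sources to compile
    verbose=True,                  # print the compiler invocation
)

a = torch.randn(1024, device="cuda")
b = torch.randn(1024, device="cuda")
out = my_ext.my_add(a, b)          # call the bound kernel like a regular function
```

The other common route is an ahead-of-time build with setuptools, using CUDAExtension and BuildExtension from torch.utils.cpp_extension in a setup.py.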