MooreThreads / torch_musa
torch_musa is an open-source extension of PyTorch that makes full use of the computing power of MooreThreads graphics cards.
☆469 · Updated this week
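torch_musa registers a "musa" device with PyTorch, so tensors and models can be placed on a MooreThreads GPU much as they would be on CUDA. Below is a minimal usage sketch, assuming torch_musa is installed against a matching PyTorch build and a MooreThreads GPU is present; the `torch.musa` namespace is assumed to mirror `torch.cuda`.

```python
import torch
import torch_musa  # importing registers the "musa" device backend

# A minimal sketch, assuming a MooreThreads GPU and a matching
# torch_musa build; torch.musa is assumed to mirror torch.cuda.
if torch.musa.is_available():
    x = torch.randn(2, 3, device="musa")  # allocate directly on the GPU
    y = torch.randn(2, 3).to("musa")      # or move a CPU tensor over
    z = x + y                             # computed on the MUSA device
    print(z.cpu())                        # copy the result back to CPU
```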
Alternatives and similar repositories for torch_musa
Users interested in torch_musa are comparing it to the libraries listed below.
- A lightweight LLM inference framework ☆746 · Updated last year
- Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch ☆476 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆73 · Updated last year
- llm-export can export LLM models to ONNX (a generic export sketch appears after this list). ☆340 · Updated 2 months ago
- ☆66 · Updated last year
- PaddlePaddle custom device implementation (custom hardware integration for PaddlePaddle, 『飞桨』). ☆101 · Updated this week
- MegCC is a deep learning model compiler with an ultra-lightweight runtime that is efficient and easy to port ☆489 · Updated last year
- Machine learning compiler based on MLIR for Sophgo TPU. ☆839 · Updated last week
- MUSA Templates for Linear Algebra Subroutines ☆39 · Updated 3 weeks ago
- ☆623 · Updated 3 weeks ago
- A CPU tool for benchmarking peak floating-point performance ☆571 · Updated 3 weeks ago
- FlagGems is an operator library for large language models implemented in the Triton language (see the kernel sketch after this list). ☆824 · Updated this week
- Run generative AI models on Sophgo BM1684X/BM1688 ☆260 · Updated this week
- [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models. ☆659 · Updated last month
- C++ implementation of Qwen-LM ☆614 · Updated last year
- ☆43 · Updated last month
- Efficient operator implementations based on the Cambricon Machine Learning Unit (MLU). ☆147 · Updated 2 weeks ago
- Triton documentation in Simplified Chinese / Triton 中文文档 ☆96 · Updated 3 weeks ago
- ☆518 · Updated this week
- Ascend TileLang adapter ☆177 · Updated this week
- FlagScale is a large-model toolkit based on open-source projects. ☆463 · Updated this week
- Export LLaMA to ONNX ☆137 · Updated last year
- ☆130 · Updated last year
- A model compilation solution for various hardware ☆458 · Updated 4 months ago
- PyTorch Neural Network eXchange ☆665 · Updated this week
- Low-bit LLM inference on CPU/NPU with lookup tables ☆906 · Updated 7 months ago
- Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052 ☆476 · Updated last year
- ☆435 · Updated 3 months ago
- ☆71 · Updated this week
- Llama 2 inference ☆43 · Updated 2 years ago
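Two of the entries above (llm-export and the LLaMA exporter) revolve around exporting LLMs to ONNX. As a generic illustration of the underlying mechanism, not those projects' own APIs, a plain `torch.onnx.export` call on a toy module looks like this; the output path and axis names are hypothetical:

```python
import torch

# A generic torch.onnx.export sketch on a toy module; real LLM exporters
# such as llm-export wrap this with model-specific tracing and fixups.
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU())
dummy_input = torch.randn(1, 16)  # example input used to trace the graph

torch.onnx.export(
    model,
    dummy_input,
    "toy_model.onnx",                      # hypothetical output path
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}},  # allow variable batch size
)
```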
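The FlagGems entry describes operators implemented in Triton. The following standalone vector-add kernel is a generic Triton sketch in that style, not code taken from FlagGems; it assumes an installed `triton` package and a CUDA-capable GPU:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail when n is not a multiple
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Calling `add(x, y)` on two GPU tensors launches `ceil(n / 1024)` program instances, each adding one masked block of elements.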