MooreThreads / torch_musa
torch_musa is an open-source PyTorch extension that lets PyTorch take full advantage of the computing power of Moore Threads GPUs.
☆376 · Updated last month
Alternatives and similar repositories for torch_musa:
Users interested in torch_musa are comparing it to the repositories listed below.
- A lightweight LLM inference framework☆722 · Updated 11 months ago
- ☆118 · Updated last year
- Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch☆320 · Updated this week
- PaddlePaddle custom device implementation.☆82 · Updated this week
- llm-export exports LLM models to ONNX.☆274 · Updated 2 months ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …☆240 · Updated 3 weeks ago
- FlagGems is an operator library for large language models implemented in Triton Language.☆467 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs☆46 · Updated 5 months ago
- MegCC is a deep learning model compiler with an ultra-lightweight runtime that is efficient and easy to port.☆482 · Updated 5 months ago
- A CPU tool for benchmarking peak floating-point performance☆531 · Updated 5 months ago
- ☆139 · Updated 11 months ago
- ☆30 · Updated last year
- Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052☆471 · Updated last year
- Run generative AI models on the Sophgo BM1684X☆193 · Updated this week
- [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V…☆443 · Updated this week
- This is an inference framework for the RWKV large language model implemented purely in native PyTorch. The official native implementation…☆127 · Updated 8 months ago
- ☆127 · Updated 3 months ago
- 📚FFPA(Split-D): Yet another Faster Flash Prefill Attention with O(1) GPU SRAM complexity for headdim > 256, ~2x↑🎉vs SDPA EA.☆157 · Updated last week
- Machine learning compiler based on MLIR for Sophgo TPU.☆698 · Updated last week
- Low-bit LLM inference on CPU with lookup table☆705 · Updated 2 months ago
- The CUDA version of the RWKV language model ( https://github.com/BlinkDL/RWKV-LM )☆221 · Updated 3 months ago
- ☆410 · Updated last week
- GLake: optimizing GPU memory management and IO transmission.☆451 · Updated last week
- Export LLaMA to ONNX☆120 · Updated 3 months ago
- Compiler Infrastructure for Neural Networks☆145 · Updated last year
- LLaMa/RWKV onnx models, quantization and testcase☆359 · Updated last year
- A text-to-image project based on the open-source Stable Diffusion V1.5 model: it produces models that run on mobile-phone CPUs and NPUs, along with a companion model-runtime framework.☆149 · Updated last year
- LLM deployment project based on MNN. This project has been merged into MNN.☆1,573 · Updated 2 months ago
- Inference RWKV v5, v6 and v7 with the Qualcomm AI Engine Direct SDK☆60 · Updated last week
- cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it☆534 · Updated last week