MooreThreads / vllm_musaLinks

A high-throughput and memory-efficient inference and serving engine for LLMs

☆65

Alternatives and similar repositories for vllm_musa

Users that are interested in vllm_musa are comparing it to the libraries listed below

Sorting:

modelscope / dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …
☆267Updated 2 months ago
sophgo / LLM-TPU
Run generative AI models in sophgo BM1684X/BM1688
☆251Updated last week
wangzhaode / llm-export
llm-export can export llm model to onnx.
☆317Updated last week
Tlntin / qwen-ascend-llm
☆50Updated last year
sophgo / ChatGLM2-TPU
run ChatGLM2-6B in BM1684X
☆50Updated last year
DeepLink-org / dlinfer
☆64Updated last week
Ascend / pytorch
Ascend PyTorch adapter (torch_npu). Mirror of https://gitee.com/ascend/pytorch
☆444Updated last month
OpenPPL / ppl.llm.serving
☆129Updated 10 months ago
DeepLink-org / deeplink.framework
☆70Updated last year
FlagOpen / FlagScale
FlagScale is a large model toolkit based on open-sourced projects.
☆364Updated last week
DeepLink-org / AIChipBenchmark
☆31Updated this week
mindspore-lab / mindformers
☆175Updated this week
OpenPPL / ppl.nn.llm
☆139Updated last year
PaddlePaddle / PaddleCustomDevice
PaddlePaddle custom device implementaion. (『飞桨』自定义硬件接入实现)
☆97Updated last week
hyperai / triton-cn
Triton Documentation in Chinese Simplified / Triton 中文文档
☆87Updated 6 months ago
pandada8 / llm-inference-benchmark
LLM 推理服务性能测试
☆44Updated last year
Tencent / KsanaLLM
☆508Updated last month
hyperai / vllm-cn
vLLM Documentation in Chinese Simplified / vLLM 中文文档
☆115Updated 2 weeks ago
ModelTC / LightCompress
A powerful toolkit for compressing large models including LLM, VLM, and video generation models.
☆599Updated 2 months ago
luchangli03 / export_llama_to_onnx
export llama to onnx
☆136Updated 10 months ago
OpenBMB / CPM.cu
CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec…
☆201Updated 3 weeks ago
OpenPPL / ppl.pmx
☆59Updated 11 months ago
shen-shanshan / cs-self-learning
This repo is used for archiving my notes, codes and materials of cs learning.
☆58Updated last week
BestAnHongjun / LMDeploy-Jetson
Deploying LLMs offline on the NVIDIA Jetson platform marks the dawn of a new era in embodied intelligence, where devices can function ind…
☆102Updated last year
intel / xFasterTransformer
☆430Updated last month
QwenLM / qwen.cpp
C++ implementation of Qwen-LM
☆606Updated 10 months ago
MooreThreads / torch_musa
torch_musa is an open source repository based on PyTorch, which can make full use of the super computing power of MooreThreads graphics c…
☆437Updated last week
tpoisonooo / llama.onnx
LLaMa/RWKV onnx models, quantization and testcase
☆367Updated 2 years ago
feifeibear / LLMRoofline
Compare different hardware platforms via the Roofline Model for LLM inference tasks.
☆116Updated last year
MegEngine / InferLLM
a lightweight LLM model inference framework
☆738Updated last year