vllm-project/vllm-ascend

Community maintained hardware plugin for vLLM on Ascend

☆2,442

Alternatives and similar repositories for vllm-ascend

Users that are interested in vllm-ascend are comparing it to the libraries listed below.

sgl-project / sgl-kernel-npu
SGLang kernel library for NPU
☆166 · Updated this week
kvcache-ai / Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆5,925 · Updated this week
vllm-project / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆86,727 · Updated this week
AISBench / benchmark
AISBench Benchmark is a model evaluation tool built on OpenCompass, compatible with OpenCompass’s configuration system, dataset structure…
☆159 · Updated this week
LMCache / LMCache-Ascend
LMCache on Ascend
☆82 · Jul 14, 2026 · Updated last week
sgl-project / sglang
SGLang is a high-performance serving framework for large language models and multimodal models.
☆30,545 · Updated this week
tile-ai / tilelang-ascend
Ascend TileLang adapter
☆334 · Updated this week
omni-ai-npu / omni-infer
Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature se…
☆127 · Updated this week
vllm-project / vllm-omni
A framework for efficient model inference with omni-modality models
☆5,631 · Updated this week
xLLM-AI / xllm
A high-performance inference engine for LLM, VLM, DiT and REC models, optimized for diverse AI accelerators. It is hosted in OpenAtom Fou…
☆1,478 · Updated this week
GeeeekExplorer / nano-vllm
Nano vLLM
☆14,557 · Apr 26, 2026 · Updated 2 months ago
Ascend / triton-ascend
Triton adapter for Ascend. Mirror of https://gitcode.com/ascend/triton-ascend
☆127 · May 18, 2026 · Updated 2 months ago
Ascend / pytorch
Ascend PyTorch adapter (torch_npu). Mirror of https://gitcode.com/Ascend/pytorch
☆551 · Updated this week
gpustack / gpustack
A GPU cluster manager for high-performance AI model serving (vLLM, SGLang) and on-demand SSH-accessible GPU instances.
☆5,351 · Updated this week
modelscope / evalscope
A streamlined and customizable framework for efficient large model (LLM, VLM, AIGC) evaluation and performance benchmarking.
☆3,110 · Updated this week
InternLM / lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
☆7,965 · Updated this week
flashinfer-ai / flashinfer
FlashInfer: Kernel Library for LLM Serving
☆5,988 · Updated this week
verl-project / verl
verl/HybridFlow: A Flexible and Efficient RL Post-Training Framework
☆22,571 · Updated this week
maoxx241 / vllm-ascend-workspace
☆26 · Jul 1, 2026 · Updated 2 weeks ago
modelscope / ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL…
☆14,866 · Updated this week
tile-ai / tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
☆6,674 · Updated this week
ascend-ai-coding / awesome-ascend-skills
A comprehensive knowledge base for Huawei Ascend NPU development, structured as distributed Agent Skills. https://ascend-ai-coding.github…
☆138 · Updated this week
cosdt / vllm-ascend
See vLLM official support: https://github.com/vllm-project/vllm-ascend
☆11 · Feb 5, 2025 · Updated last year
LMCache / LMCache
LMCache: Supercharge Your LLM with the Fastest KV Cache Layer
☆10,732 · Updated this week
vllm-project / llm-compressor
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
☆3,562 · Updated this week
NVIDIA / Megatron-LM
Ongoing research training transformer models at scale
☆17,125 · Updated this week
CalvinXKY / InfraTech
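Sharing AI infra knowledge and coding exercises: PyTorch, vLLM/SGLang, getting started with the slime/vime frameworks ⚡️, performance acceleration 🚀, large-model fundamentals 🧠, AI software and hardware 🔧, and more
☆2,996 · Jul 2, 2026 · Updated 2 weeks ago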
kvcache-ai / ktransformers
A Flexible Framework for Experiencing Heterogeneous LLM Inference/Fine-tune Optimizations
☆18,731 · Updated this week
Ascend / cann-container-image
Dockerfiles for Ascend CANN
☆64 · Updated this week
ai-dynamo / dynamo
A Datacenter Scale Distributed Inference Serving Framework
☆7,540 · Updated this week
vllm-project / vime
An LLM post-training framework with vLLM for RL Scaling
☆378 · Updated this week
hiyouga / LlamaFactory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
☆73,397 · Updated this week
Dao-AILab / flash-attention
Fast and memory-efficient exact attention
☆24,497 · Updated this week
alibaba / rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
☆1,282 · Updated this week
triton-lang / triton-ascend
Triton language and compiler for Ascend NPU
☆109 · Updated this week
vllm-project / router
A high-performance and light-weight router for vLLM large scale deployment
☆320 · Jul 13, 2026 · Updated last week
ModelEngine-Group / unified-cache-management
Persist and reuse KV Cache to speedup your LLM.
☆302 · Updated this week
xorbitsai / inference
Swap GPT for any LLM by changing a single line of code. Xinference lets you run open-source, speech, and multimodal models on cloud, on-p…
☆9,440 · Updated this week
tile-ai / TileRT
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
☆1,573 · Updated this week