Cambricon/vllm-mlu

Cambricon / vllm-mlu

☆114

Alternatives and similar repositories for vllm-mlu

Users that are interested in vllm-mlu are comparing it to the libraries listed below.

Cambricon / torch_mlu
☆57 · Mar 15, 2025 · Updated last year

MetaX-MACA / vLLM-metax
Community maintained hardware plugin for vLLM on MetaX GPU
☆154 · Updated this week

Cambricon / magicmind_cloud
☆16 · Nov 28, 2023 · Updated 2 years ago

PASSIONLab / distributed_sddmm
Distributed SDDMM Kernel
☆12 · Jul 8, 2022 · Updated 4 years ago

xLLM-AI / xllm-service
A flexible serving framework that delivers efficient and fault-tolerant LLM inference for clustered deployments.
☆95 · Updated this week

Cambricon / catch
☆33 · Apr 20, 2023 · Updated 3 years ago

Cambricon / CNStream
CNStream is a streaming framework for building Cambricon machine learning pipelines http://forum.cambricon.com https://gitee.com/Solu…
☆55 · Mar 21, 2025 · Updated last year

foundation-model-stack / vllm-triton-backend
A Triton-only attention backend for vLLM
☆27 · Jul 14, 2026 · Updated last week

Infrasys-AI / aiinfra-docs
☆21 · Nov 6, 2025 · Updated 8 months ago

CalvinXKY / EPLB_visualization
Visualize the Expert Parallelism Load Balancer
☆19 · Mar 15, 2025 · Updated last year

daochenzha / neuroshard
[MLSys 2023] Pre-train and Search: Efficient Embedding Table Sharding with Pre-trained Neural Cost Models
☆16 · May 5, 2023 · Updated 3 years ago

vllm-project / vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
☆2,450 · Updated this week

MooreThreads / vllm-musa
A high-throughput and memory-efficient inference and serving engine for LLMs
☆108 · Updated this week

albanD / pytorch_dev_env_setup
☆11 · Updated this week

NGIOproject / PMTutorial
Slides and exercises for persistent memory programming tutorial
☆14 · Nov 14, 2022 · Updated 3 years ago

KuntaiDu / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆13 · Jun 10, 2026 · Updated last month

chipsalliance / chisel-interface
The 'missing header' for Chisel
☆24 · Feb 5, 2026 · Updated 5 months ago

arnoldlu / common-use
Will place common tools here to align all tools.
☆10 · Jun 20, 2019 · Updated 7 years ago

luka-group / FaviComp
[EMNLP 2025 Findings] Familiarity-aware Evidence Compression for Retrieval Augmented Generation
☆15 · Aug 20, 2025 · Updated 11 months ago

xLLM-AI / xllm
A high-performance inference engine for LLM, VLM, DiT and REC models, optimized for diverse AI accelerators. It is hosted in OpenAtom Fou…
☆1,479 · Updated this week

infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆242 · Jan 20, 2026 · Updated 6 months ago

OrderLab / orbit
Orbit: OS Support for Safe and Efficient Auxiliary Tasks in Applications
☆22 · May 23, 2022 · Updated 4 years ago

Emma926 / mcbench
Mille Crepe Bench: layer-wise performance analysis for deep learning frameworks.
☆18 · Oct 22, 2019 · Updated 6 years ago

alibaba / atrex-bench
End-to-end benchmark for AI-generated GPU kernels, drawn from real production traces — turn a PyTorch reference into a DSL kernel (Triton…
☆15 · Updated this week

Ascend / AscendSpeed
☆79 · Dec 15, 2023 · Updated 2 years ago

taco-project / FlexKV
☆305 · Updated this week

j9650 / MedusaNet
☆16 · Jan 5, 2021 · Updated 5 years ago

Victorwz / LaViA
☆10 · Jul 13, 2024 · Updated 2 years ago

in-ATP / switchML
☆33 · Mar 31, 2021 · Updated 5 years ago

ranggihwang / Pregated_MoE
☆62 · May 4, 2024 · Updated 2 years ago

AlessandroCilardo / NaplesPU
The official NaplesPU hardware code repository
☆35 · Jul 27, 2019 · Updated 6 years ago

ruikangliu / FlatQuant
[ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"
☆223 · Nov 25, 2025 · Updated 7 months ago

Cambricon / triton-linalg
Development repository for the Triton-Linalg conversion
☆221 · Feb 7, 2025 · Updated last year

ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆1,139 · Updated this week

DeepRec-AI / extension
DeepRec Extension is an easy-to-use, stable and efficient large-scale distributed training system based on DeepRec.
☆13 · May 17, 2024 · Updated 2 years ago

BienLuky / CacheQuant
[CVPR 2025] The official implementation of "CacheQuant: Comprehensively Accelerated Diffusion Models"
☆48 · Nov 2, 2025 · Updated 8 months ago

bojesomo / Weather4cast2021-SwinEncoderDecoder
☆10 · Oct 20, 2021 · Updated 4 years ago

ai-dynamo / dynamo
A Datacenter Scale Distributed Inference Serving Framework
☆7,540 · Updated this week

tile-ai / TileRT
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
☆1,579 · Jul 14, 2026 · Updated last week