MetaX-MACA/vLLM-metax

MetaX-MACA / vLLM-metax

Community maintained hardware plugin for vLLM on MetaX GPU

☆159

Alternatives and similar repositories for vLLM-metax

Users who are interested in vLLM-metax are comparing it to the libraries listed below.

MooreThreads / vllm-musa
A high-throughput and memory-efficient inference and serving engine for LLMs
☆109 · Updated this week
kvcache-ai / kvcache-blog
☆19 · Updated this week
Cambricon / vllm-mlu
☆115 · May 11, 2026 · Updated 2 months ago
tile-ai / tilelang-benchmark
☆22 · Jun 10, 2026 · Updated last month
vllm-project / vllm-ascend
Community maintained hardware plugin for vLLM on Ascend
☆2,478 · Updated this week
MooreThreads / MT-flashMLA
Forked from https://github.com/deepseek-ai/FlashMLA
☆17 · Feb 26, 2025 · Updated last year
copilot-io / runtime-copilot
The main purpose of runtime copilot is to assist with node runtime management tasks such as configuring registries, upgrading versions, i…
☆13 · May 16, 2023 · Updated 3 years ago
difey / nano-vllm-v1
Nano vLLM v1 engine
☆16 · Aug 6, 2025 · Updated 11 months ago
luskits / luscsi
Provides deploy scripts and CSI for Lustre.
☆14 · Apr 13, 2026 · Updated 3 months ago
ai-dynamo / nixl
NVIDIA Inference Xfer Library (NIXL)
☆1,151 · Updated this week
bxttttt / getting-started-guide-and-introduction-to-MXMACA
MXMACA getting-started materials
☆22 · Jun 9, 2024 · Updated 2 years ago
taco-project / FlexKV
☆307 · Updated this week
fsword73 / HIP-Performance-Optmization-on-VEGA64
14 basic topics for VEGA64 performance optimization
☆66 · Mar 18, 2021 · Updated 5 years ago
DeepLink-org / DLBlas
DLBlas: clean and efficient kernels
☆44 · Updated this week
vllm-project / speculators
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
☆652 · Updated this week
knoway-dev / knoway
An Envoy-inspired, ultimate LLM-first gateway for LLM serving and downstream application developers and enterprises
☆27 · Apr 24, 2025 · Updated last year
DaoCloud / dce-charts-repackage
helm repo add daocloud https://daocloud.github.io/dce-charts-repackage/
☆12 · Updated this week
vllm-project / flash-attention
Fast and memory-efficient exact attention
☆133 · Updated this week
spidernet-io / spiderdoctor
☆11 · Jun 15, 2026 · Updated last month
flagos-ai / FlagGems
FlagGems is an operator library for large language models implemented in the Triton Language.
☆1,057 · Updated this week
zRzRzRzRzRzRzR / lm-fly
Accelerating LLM inference frameworks: make LLMs fly
☆24 · May 10, 2024 · Updated 2 years ago
TUE-EE-ES / HalideAutoGPU
☆11 · Sep 14, 2020 · Updated 5 years ago
volcano-sh / devices
Device plugins for Volcano, e.g. GPU
☆137 · Mar 20, 2025 · Updated last year
flagos-ai / vllm-plugin-FL
A vLLM plugin built on the FlagOS unified multi-chip backend.
☆66 · Updated this week
SJTU-DENG-Lab / Diffulex
Flexible and Pluggable Serving Engine for Diffusion LLMs
☆148 · Jul 13, 2026 · Updated last week
DeepLink-org / dlinfer
☆74 · Updated this week
Project-HAMi / HAMi
Heterogeneous GPU Sharing on Kubernetes
☆4,064 · Updated this week
Cambricon / torch_mlu
☆57 · Mar 15, 2025 · Updated last year
MooreThreads / SimuMax
A static analytical model for LLM distributed training
☆163 · May 11, 2026 · Updated 2 months ago
lucifer1004 / VeloQ
Agent-friendly GPU profile-query CLI
☆106 · Jun 22, 2026 · Updated last month
openvino-dev-samples / decode-infer-on-GPU
This sample shows how to use the oneAPI Video Processing Library (oneVPL) to perform a single and multi-source video decode and preproces…
☆15 · Jun 15, 2023 · Updated 3 years ago
OrderLab / orbit
Orbit: OS Support for Safe and Efficient Auxiliary Tasks in Applications
☆22 · May 23, 2022 · Updated 4 years ago
kubean-io / kube-node-tuning
Manage Kubernetes node-level kernel tuning (using sysctl).
☆30 · Nov 21, 2025 · Updated 8 months ago
leon0514 / ml-depth-pro-trt10
Run inference on the ml-depth-pro model using C++ and TensorRT 10
☆23 · Jan 8, 2025 · Updated last year
xLLM-AI / xllm
A high-performance inference engine for LLM, VLM, DiT and REC models, optimized for diverse AI accelerators. It is hosted in OpenAtom Fou…
☆1,493 · Updated this week
InternLM / Kernel-Smith
☆27 · Mar 31, 2026 · Updated 3 months ago
serdes21 / flashtile
FlashTile is a CUDA Tile IR compiler that is compatible with NVIDIA's tileiras, targeting SM70 through SM121 NVIDIA GPUs.
☆61 · Feb 6, 2026 · Updated 5 months ago
tile-ai / TileOPs
High-performance LLM operator library built on TileLang.
☆163 · Updated this week
sammysun0711 / ov_llm_bench
OpenVINO LLM Benchmark
☆11 · Dec 7, 2023 · Updated 2 years ago