baidu / vLLM-Kunlun
vLLM Kunlun (vllm-kunlun) is a community-maintained hardware plugin designed to seamlessly run vLLM on the Kunlun XPU.
☆212 · Updated this week
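Like other vLLM hardware plugins (e.g. the Ascend plugin listed below), vllm-kunlun is meant to be installed alongside vLLM and picked up through vLLM's platform-plugin mechanism, so existing vLLM scripts should run largely unchanged. The snippet below is a minimal sketch using the standard vLLM Python API; the model name is only a placeholder, and the assumption that no Kunlun-specific code is needed beyond installing the plugin is based on how comparable plugins behave, not on this repository's documentation.

```python
# Minimal sketch: standard vLLM offline inference, assuming the
# vllm-kunlun plugin is installed and auto-registers the Kunlun XPU
# platform (as comparable hardware plugins do).
from vllm import LLM, SamplingParams

prompts = ["Explain what a hardware plugin for vLLM does."]
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)

# The model name is a placeholder; use any model supported on your hardware.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")

for output in llm.generate(prompts, sampling_params):
    print(output.outputs[0].text)
```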
Alternatives and similar repositories for vLLM-Kunlun
Users interested in vLLM-Kunlun are comparing it to the libraries listed below.
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆995 · Updated this week
- GLake: optimizing GPU memory management and IO transmission. ☆494 · Updated 9 months ago
- Disaggregated serving system for Large Language Models (LLMs). ☆761 · Updated 9 months ago
- ☆518 · Updated this week
- This repository organizes materials, recordings, and schedules related to AI-infra learning meetings. ☆288 · Updated this week
- Community-maintained hardware plugin for vLLM on Ascend ☆1,532 · Updated this week
- A self-learning tutorial for CUDA high-performance programming. ☆803 · Updated 6 months ago
- DLRover: An Automatic Distributed Deep Learning System ☆1,619 · Updated this week
- Efficient and easy multi-instance LLM serving ☆520 · Updated 4 months ago
- Byted PyTorch Distributed for Hyperscale Training of LLMs and RLs ☆915 · Updated last month
- LMCache on Ascend ☆43 · Updated last week
- FlagGems is an operator library for large language models implemented in the Triton language. ☆824 · Updated this week
- Materials for learning SGLang ☆714 · Updated this week
- FlagScale is a large model toolkit based on open-sourced projects. ☆463 · Updated this week
- How to learn PyTorch and OneFlow ☆468 · Updated last year
- Persist and reuse KV cache to speed up your LLM. ☆233 · Updated this week
- The road to hack SysML and become a systems expert ☆506 · Updated last year
- Learning how CUDA works ☆362 · Updated 10 months ago
- NVIDIA Inference Xfer Library (NIXL) ☆788 · Updated this week
- Hooks CUDA-related dynamic libraries using automated code generation tools. ☆172 · Updated 2 years ago
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆615 · Updated this week
- The official repo of Pai-Megatron-Patch for LLM & VLM large-scale training, developed by Alibaba Cloud. ☆1,499 · Updated 3 weeks ago
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆1,218 · Updated 4 months ago
- Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. ☆4,538 · Updated this week
- KV cache store for distributed LLM inference ☆384 · Updated last month
- AI fundamentals: GPU architecture, CUDA programming, large-model basics, and AI Agent-related knowledge ☆692 · Updated this week
- Decoupled from hardware; used to learn some NCCL mechanisms ☆24 · Updated last year
- ☆77 · Updated last year
- Offline optimization of your disaggregated Dynamo graph ☆137 · Updated this week
- TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models. ☆99 · Updated 2 years ago