jd-opensource / xllm
A high-performance inference engine for LLMs, optimized for diverse AI accelerators.
☆518 · Updated last week
Alternatives and similar repositories for xllm
Users interested in xllm are comparing it to the libraries listed below.
- AI Infra mainly refers to AI infrastructure: full-stack foundational technologies such as AI chips, AI compilers, and AI inference and training frameworks. ☆247 · Updated last year
- KV cache store for distributed LLM inference (the prefix-reuse idea is sketched after this list). ☆341 · Updated last month
- ☆503 · Updated last month
- ☆70 · Updated 11 months ago
- PyTorch distributed training acceleration framework ☆52 · Updated last month
- Efficient and easy multi-instance LLM serving ☆494 · Updated last month
- SGLang kernel library for NPU ☆59 · Updated 2 weeks ago
- GLake: optimizing GPU memory management and IO transmission. ☆479 · Updated 6 months ago
- ☆75 · Updated 10 months ago
- A flexible serving framework that delivers efficient and fault-tolerant LLM inference for clustered deployments. ☆56 · Updated 2 weeks ago
- Fast and memory-efficient exact attention ☆94 · Updated this week
- Accelerate inference without tears ☆333 · Updated 2 weeks ago
- Materials for learning SGLang ☆597 · Updated last week
- Venus Collective Communication Library, supported by SII and Infrawaves. ☆95 · Updated this week
- ☆91 · Updated last week
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving (speculative decoding itself is sketched after this list). ☆417 · Updated this week
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆874 · Updated last week
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆111 · Updated 4 months ago
- TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models. ☆96 · Updated 2 years ago
- High-performance Transformer implementation in C++. ☆135 · Updated 8 months ago
- CPM.cu is a lightweight, high-performance CUDA implementation for LLMs, optimized for end-device inference and featuring cutting-edge tec… ☆197 · Updated 3 weeks ago
- 🤖FFPA: Extend FlashAttention-2 with Split-D, ~O(1) SRAM complexity for large headdim, 1.8x~3x↑🎉 vs SDPA EA. ☆220 · Updated 2 months ago
- Perplexity GPU Kernels ☆482 · Updated 3 weeks ago
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆265 · Updated last month
- An LLM semantic caching system aiming to enhance user experience by reducing response time via cached query-result pairs (see the semantic-cache sketch after this list). ☆952 · Updated 3 months ago
- Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature se… ☆73 · Updated last week
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆265 · Updated 2 months ago
- A Fully Self-Hosted Solution for Full-Duplex Voice Interaction ☆254 · Updated 2 weeks ago
- Triton Documentation in Simplified Chinese / Triton 中文文档 ☆85 · Updated 5 months ago
- Disaggregated serving system for Large Language Models (LLMs). ☆700 · Updated 6 months ago
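
To ground the KV-cache-store entry above: the core idea is to key attention key/value blocks by the token prefix that produced them, so a later request sharing that prefix can skip part of prefill. This is a minimal, hypothetical sketch of the concept; `KVCacheStore`, `put`, and `longest_hit` are illustrative names, not the project's API.

```python
# Sketch: key KV blocks by a hash of the token prefix that produced them,
# so a new request can skip prefill for any prefix already computed.
import hashlib
from typing import Optional

BLOCK = 16  # tokens per KV block (illustrative)

class KVCacheStore:
    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}  # prefix-hash -> serialized KV

    @staticmethod
    def _key(tokens: tuple[int, ...]) -> str:
        # Hash the *entire* prefix, not just the last block, so identical
        # blocks arising in different contexts never collide.
        return hashlib.sha256(repr(tokens).encode()).hexdigest()

    def put(self, prefix: tuple[int, ...], kv: bytes) -> None:
        self._blocks[self._key(prefix)] = kv

    def longest_hit(self, tokens: tuple[int, ...]) -> tuple[int, Optional[bytes]]:
        """Return (matched_length, kv) for the longest cached block-aligned prefix."""
        for end in range(len(tokens) - len(tokens) % BLOCK, 0, -BLOCK):
            kv = self._blocks.get(self._key(tokens[:end]))
            if kv is not None:
                return end, kv
        return 0, None

store = KVCacheStore()
prompt = tuple(range(48))               # pretend token ids
store.put(prompt[:32], b"<kv bytes>")   # a previous request cached 2 blocks
hit, kv = store.longest_hit(prompt)
print(f"prefill can skip the first {hit} tokens")  # -> 32
```

A production store additionally spreads these blocks across GPU, CPU, and remote memory tiers and handles eviction; the sketch only shows the lookup contract.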
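For the speculative-decoding training entry: the served artifact works by letting a small draft model propose several tokens, which the large target model verifies in a single pass, accepting the longest agreeing prefix. Below is a minimal greedy sketch with toy stand-in functions in place of real models; all names are hypothetical.

```python
# Sketch: greedy speculative decoding with toy next-token functions.
def draft_next(ctx: list[int]) -> int:   # cheap, sometimes-wrong model
    return (ctx[-1] + 1) % 10 if ctx[-1] % 4 else 0

def target_next(ctx: list[int]) -> int:  # expensive, authoritative model
    return (ctx[-1] + 1) % 10

def speculative_step(ctx: list[int], k: int = 4) -> list[int]:
    # 1) Draft k tokens autoregressively with the cheap model.
    draft, tmp = [], list(ctx)
    for _ in range(k):
        t = draft_next(tmp)
        draft.append(t)
        tmp.append(t)
    # 2) Verify: the target scores all k positions in one pass
    #    (sequentially here, since these are toy functions).
    accepted, tmp = [], list(ctx)
    for t in draft:
        expect = target_next(tmp)
        if t != expect:
            accepted.append(expect)  # take the target's token at the first mismatch
            break
        accepted.append(t)
        tmp.append(t)
    return accepted                  # 1..k tokens per expensive target pass

ctx = [3]
for _ in range(4):
    out = speculative_step(ctx)
    ctx += out
    print(f"accepted {len(out)} token(s): {out}")
```

The payoff is that each expensive target pass can emit several tokens instead of one whenever the draft agrees with the target.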
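And for the semantic-caching entry: instead of exact string matching, queries are embedded, and a new query close enough (by cosine similarity) to a cached one returns the cached answer without calling the model. A minimal sketch follows, using a toy bag-of-words embedding as a stand-in for a real embedding model; the `SemanticCache` class and threshold are illustrative assumptions, not that project's API.

```python
# Sketch: embedding-similarity lookup over cached query -> answer pairs.
import math
from collections import Counter
from typing import Optional

def embed(text: str) -> Counter:
    # Toy embedding: word counts. Real systems use a sentence-embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []  # (query embedding, answer)

    def get(self, query: str) -> Optional[str]:
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best and cosine(q, best[0]) >= self.threshold:
            return best[1]  # near-duplicate query: skip the model call
        return None

    def put(self, query: str, answer: str) -> None:
        self.entries.append((embed(query), answer))

cache = SemanticCache()
cache.put("how do I restart the server", "Run `systemctl restart app`.")
print(cache.get("how do I restart the server ?"))  # near-duplicate -> cache hit
print(cache.get("what's the GPU memory limit"))    # unrelated -> None
```

The threshold trades hit rate against the risk of returning a cached answer to a subtly different question, which is the central tuning knob in such systems.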