jd-opensource / xllm
A high-performance inference engine for LLMs, optimized for diverse AI accelerators.
☆801 · Updated this week
Alternatives and similar repositories for xllm
Users interested in xllm are comparing it to the libraries listed below.
- Draft-Target Disaggregation LLM Serving System via Parallel Speculative Decoding. ☆126 · Updated last month
- AI Infra refers to AI infrastructure, covering the full AI stack: AI chips, AI compilers, and AI inference and training frameworks. ☆255 · Updated last year
- ☆515 · Updated 3 weeks ago
- KV cache store for distributed LLM inference ☆372 · Updated last month
- ☆75 · Updated last year
- GLake: optimizing GPU memory management and IO transmission. ☆491 · Updated 8 months ago
- Materials for learning SGLang ☆682 · Updated 2 weeks ago
- Efficient and easy multi-instance LLM serving ☆517 · Updated 3 months ago
- FlagPerf is an open-source software platform for benchmarking AI chips. ☆353 · Updated last month
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond ☆701 · Updated 2 weeks ago
- SGLang kernel library for NPU ☆81 · Updated this week
- FlagScale is a large model toolkit based on open-source projects. ☆421 · Updated last week
- TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models. ☆98 · Updated 2 years ago
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆940 · Updated last week
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆270 · Updated 4 months ago
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆539 · Updated this week
- PyTorch distributed training acceleration framework ☆53 · Updated 4 months ago
- A flexible serving framework that delivers efficient and fault-tolerant LLM inference for clustered deployments. ☆80 · Updated last month
- ☆73 · Updated last year
- Disaggregated serving system for Large Language Models (LLMs). ☆749 · Updated 8 months ago
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆1,198 · Updated 3 months ago
- Accelerate inference without tears ☆370 · Updated 3 weeks ago
- Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature se… ☆92 · Updated this week
- ☆328 · Updated last month
- This repository organizes materials, recordings, and schedules related to AI-infra learning meetings. ☆262 · Updated 2 weeks ago
- Persist and reuse KV Cache to speed up your LLM. ☆158 · Updated this week
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆282 · Updated 3 months ago
- A framework for efficient model inference with omni-modality models ☆766 · Updated this week
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆118 · Updated 6 months ago
- Perplexity GPU Kernels ☆536 · Updated last month