jd-opensource / xllm
A high-performance inference engine for LLMs, optimized for diverse AI accelerators.
☆934 · Updated this week
Alternatives and similar repositories for xllm
Users interested in xllm are comparing it to the libraries listed below.
- Draft-Target Disaggregation LLM Serving System via Parallel Speculative Decoding (see the speculative-decoding sketch after this list). ☆149 · Updated last month
- AI Infra refers to AI infrastructure: the full-stack foundational technologies behind AI, including AI chips, AI compilers, and AI inference and training frameworks. ☆258 · Updated last year
- ☆522 · Updated this week
- Train speculative decoding models effortlessly and port them smoothly to SGLang serving. ☆639 · Updated this week
- KV cache store for distributed LLM inference. ☆387 · Updated 2 months ago
- Efficient and easy multi-instance LLM serving. ☆521 · Updated 4 months ago
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond. ☆760 · Updated 2 weeks ago
- GLake: optimizing GPU memory management and IO transmission. ☆497 · Updated 10 months ago
- Materials for learning SGLang. ☆725 · Updated 3 weeks ago
- Disaggregated serving system for Large Language Models (LLMs). ☆768 · Updated 9 months ago
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆1,031 · Updated this week
- This repository organizes materials, recordings, and schedules related to AI-infra learning meetings. ☆303 · Updated 3 weeks ago
- Persist and reuse KV Cache to speed up your LLM. ☆240 · Updated this week
- Tile-Based Runtime for Ultra-Low-Latency LLM Inference. ☆543 · Updated last month
- NVIDIA Inference Xfer Library (NIXL). ☆844 · Updated this week
- ☆73 · Updated last year
- FlagScale is a large model toolkit based on open-sourced projects. ☆468 · Updated this week
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆123 · Updated last month
- FlagCX is a scalable and adaptive cross-chip communication library. ☆170 · Updated this week
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆1,231 · Updated 4 months ago
- Perplexity GPU Kernels. ☆554 · Updated 2 months ago
- SGLang kernel library for NPU. ☆95 · Updated this week
- ☆340 · Updated 3 weeks ago
- ☆113 · Updated 2 weeks ago
- FlagGems is an operator library for large language models implemented in the Triton language. ☆880 · Updated this week
- AI Accelerator Benchmark focuses on evaluating AI accelerators from a practical production perspective, including the ease of use and ver… ☆297 · Updated last week
- A throughput-oriented, high-performance serving framework for LLMs. ☆940 · Updated 2 months ago
- Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature se… ☆101 · Updated this week
- High-performance Transformer implementation in C++. ☆148 · Updated last year
- High-performance LLM Inference Operator Library. ☆222 · Updated this week
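
Several of the repositories above (the draft-target disaggregated server and the trainer that ports models to SGLang serving) center on speculative decoding. As background, here is a minimal, self-contained sketch of the draft-then-verify loop; every model and helper in it is a hypothetical stand-in for illustration, not the API of any project listed here.

```python
# Minimal draft-then-verify loop for speculative decoding.
# Everything here (VOCAB, draft_model, target_accepts) is a hypothetical
# stand-in for illustration, not the API of any repository listed above.
import random

random.seed(0)
VOCAB = list("abcde")  # toy token vocabulary

def draft_model(prefix, k):
    """Cheap draft model: proposes k candidate tokens (random, for the sketch)."""
    return [random.choice(VOCAB) for _ in range(k)]

def target_accepts(prefix, token):
    """Stand-in for target-model verification; a real system compares the
    draft and target token distributions instead of a fixed coin flip."""
    return random.random() < 0.7

def speculative_decode(prompt, max_new=16, k=4):
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        for tok in draft_model(out, k):   # target verifies drafts left to right
            if target_accepts(out, tok):
                out.append(tok)           # accepted: no extra target decode step
            else:
                out.append(random.choice(VOCAB))  # target's own corrected token
                break                     # remaining draft tokens are discarded
    return "".join(out)

print(speculative_decode("ab"))
```

The point of the loop is that accepted draft tokens cost no additional target-model decode steps; real systems verify the whole draft block in a single batched target forward pass, which is what makes the draft-target split amenable to disaggregation across devices.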