zhihu / ZhiLightLinks

A highly optimized LLM inference acceleration engine for Llama and its variants.

☆903

Alternatives and similar repositories for ZhiLight

Users that are interested in ZhiLight are comparing it to the libraries listed below

Sorting:

bytedance / ABQ-LLM
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
☆237Updated last year
hyperai / tvm-cn
TVM Documentation in Chinese Simplified / TVM 中文文档
☆2,647Updated last week
flagos-ai / FlagPerf
FlagPerf is an open-source software platform for benchmarking AI chips.
☆352Updated this week
Zefan-Cai / KVCache-Factory
Unified KV Cache Compression Methods for Auto-Regressive Models
☆1,275Updated 10 months ago
uccl-project / uccl
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g…
☆957Updated this week
NetEase-Media / grps_trtllm
Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GPRS+TensorRT-LLM+…
☆157Updated 6 months ago
Qihoo360 / 360-LLaMA-Factory
adds Sequence Parallelism into LLaMA-Factory
☆588Updated last month
alibaba / rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
☆920Updated this week
ChenmienTan / RL2
☆915Updated this week
cmriat / l0
A scalable, end-to-end training pipeline for general-purpose agents
☆361Updated 4 months ago
Tencent / KsanaLLM
☆511Updated 2 months ago
intelligent-machine-learning / dlrover
DLRover: An Automatic Distributed Deep Learning System
☆1,586Updated this week
DeepLink-org / deeplink.framework
☆72Updated last year
AIoT-MLSys-Lab / SVD-LLM
[ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2
☆261Updated 2 months ago
Qcompiler / MIXQ
MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction
☆94Updated last year
Simple-Efficient / RL-Factory
Train your Agent model via our easy and efficient framework
☆1,613Updated last week
aliyun / SimAI
☆730Updated 2 weeks ago
FlagOpen / FlagScale
FlagScale is a large model toolkit based on open-sourced projects.
☆404Updated last week
modelscope / dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …
☆267Updated 3 months ago
antgroup / glake
GLake: optimizing GPU memory management and IO transmission.
☆489Updated 7 months ago
NetEase-Media / grps
Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks. Support dynamic batching, and streaming mo…
☆167Updated 6 months ago
xdit-project / xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
☆2,390Updated this week
Tencent-BAC / FastMTP
☆196Updated last month
DeepLink-org / DIOPI
☆75Updated 11 months ago
dhcode-cpp / X-R1
minimal-cost for training 0.5B R1-Zero
☆784Updated 6 months ago
DeepLink-org / DeepLinkExt
☆13Updated 5 months ago
ByteDance-Seed / ShadowKV
[ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
☆270Updated 6 months ago
ModelTC / LightCompress
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLM, VLM, and video generation models.
☆614Updated this week
InternLM / InternBootcamp
☆322Updated 2 months ago
eunomia-bpf / eGPU
Extending eBPF Programmability and Observability to GPUs (merged into https://github.com/eunomia-bpf/bpftime)
☆266Updated last month