zhihu / ZhiLight
A highly optimized LLM inference acceleration engine for Llama and its variants.
☆906 · Updated 5 months ago
Alternatives and similar repositories for ZhiLight
Users interested in ZhiLight are comparing it to the libraries listed below.
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations ☆238 · Updated last year
- TVM Documentation in Simplified Chinese / TVM 中文文档 ☆2,893 · Updated last month
- Higher-performance OpenAI LLM service than vLLM serve: a pure C++ high-performance OpenAI LLM service implemented with GPRS+TensorRT-LLM+… ☆160 · Updated 2 weeks ago
- Unified KV Cache Compression Methods for Auto-Regressive Models ☆1,291 · Updated 11 months ago
- FlagPerf is an open-source software platform for benchmarking AI chips. ☆355 · Updated last month
- UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g… ☆1,137 · Updated this week
- Adds Sequence Parallelism to LLaMA-Factory ☆600 · Updated 2 months ago
- ☆966 · Updated this week
- Train your agent model with our easy and efficient framework ☆1,668 · Updated 3 weeks ago
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆949 · Updated this week
- Deep Learning Deployment Framework: supports tf/torch/trt/trtllm/vllm and other NN frameworks; supports dynamic batching and streaming mo… ☆168 · Updated 7 months ago
- DLRover: An Automatic Distributed Deep Learning System ☆1,608 · Updated last week
- A scalable, end-to-end training pipeline for general-purpose agents ☆362 · Updated 5 months ago
- ☆518 · Updated last month
- [ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2 ☆270 · Updated 3 months ago
- MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction ☆94 · Updated last year
- Minimal-cost training of a 0.5B R1-Zero model ☆793 · Updated 7 months ago
- ☆73 · Updated last year
- ☆201 · Updated 3 months ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆270 · Updated 4 months ago
- GLake: optimizing GPU memory management and IO transmission. ☆491 · Updated 9 months ago
- ☆77 · Updated last year
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference ☆275 · Updated 7 months ago
- ☆774 · Updated last month
- UltraScale Playbook, Chinese edition / UltraScale Playbook 中文版 ☆104 · Updated 9 months ago
- [NeurIPS 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models ☆1,159 · Updated 2 months ago
- FlagScale is a large-model toolkit built on open-source projects. ☆426 · Updated last week
- An innovative agent framework driven by a KG engine ☆773 · Updated 11 months ago
- Materials for learning SGLang ☆693 · Updated last week
- [ICML 2025] "SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator" ☆557 · Updated 4 months ago