zhihu / ZhiLight
A highly optimized LLM inference acceleration engine for Llama and its variants.
☆898 · Updated 2 weeks ago
Alternatives and similar repositories for ZhiLight
Users who are interested in ZhiLight are comparing it to the libraries listed below.
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations ☆227 · Updated 9 months ago
- FlagPerf is an open-source software platform for benchmarking AI chips. ☆344 · Updated 3 weeks ago
- TVM Documentation in Chinese Simplified / TVM 中文文档 ☆1,990 · Updated 2 months ago
- Unified KV Cache Compression Methods for Auto-Regressive Models ☆1,190 · Updated 6 months ago
- Adds Sequence Parallelism to LLaMA-Factory ☆525 · Updated this week
- Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GPRS+TensorRT-LLM+… ☆145 · Updated last month
- Train your Agent model via our easy and efficient framework ☆1,258 · Updated this week
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆809 · Updated last month
- ☆455 · Updated this week
- A scalable, end-to-end training pipeline for general-purpose agents ☆258 · Updated last week
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆259 · Updated last month
- [ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2 ☆228 · Updated 3 months ago
- GLake: optimizing GPU memory management and IO transmission. ☆470 · Updated 3 months ago
- ☆67 · Updated 8 months ago
- MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction ☆91 · Updated 8 months ago
- FlagScale is a large model toolkit based on open-sourced projects. ☆321 · Updated this week
- DLRover: An Automatic Distributed Deep Learning System ☆1,500 · Updated this week
- xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism ☆2,105 · Updated 2 weeks ago
- Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks. Supports dynamic batching, and streaming mo… ☆164 · Updated 2 months ago
- Materials for learning SGLang ☆475 · Updated this week
- Minimal-cost training of a 0.5B R1-Zero model ☆748 · Updated last month
- Awesome LLMs on Device: A Comprehensive Survey ☆1,149 · Updated 6 months ago
- ☆72 · Updated 7 months ago
- [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V… ☆510 · Updated this week
- Accelerate inference without tears ☆319 · Updated 3 months ago
- UltraScale Playbook 中文版 ☆45 · Updated 3 months ago
- [ICML 2025 Spotlight] ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference ☆205 · Updated 2 months ago
- ☆53 · Updated last week
- Disaggregated serving system for Large Language Models (LLMs). ☆639 · Updated 3 months ago
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆1,002 · Updated this week