zhihu / ZhiLight
A highly optimized LLM inference acceleration engine for Llama and its variants.
☆900 Updated 2 months ago
Alternatives and similar repositories for ZhiLight
Users who are interested in ZhiLight are comparing it to the libraries listed below.
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations ☆234 Updated last year
- FlagPerf is an open-source software platform for benchmarking AI chips. ☆353 Updated 2 months ago
- TVM documentation in Simplified Chinese (TVM 中文文档) ☆2,398 Updated this week
- Unified KV Cache Compression Methods for Auto-Regressive Models ☆1,249 Updated 9 months ago
- Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GPRS+TensorRT-LLM+… ☆155 Updated 4 months ago
- Adds Sequence Parallelism into LLaMA-Factory ☆564 Updated last week
- ☆504 Updated 3 weeks ago
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆874 Updated this week
- Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks. Support dynamic batching, and streaming mo… ☆167 Updated 4 months ago
- Train your Agent model via our easy and efficient framework ☆1,532 Updated last week
- A scalable, end-to-end training pipeline for general-purpose agents ☆359 Updated 3 months ago
- ☆870 Updated last week
- DLRover: An Automatic Distributed Deep Learning System ☆1,559 Updated last week
- ☆70 Updated 11 months ago
- MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction ☆93 Updated 11 months ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆265 Updated last month
- Minimal-cost training for a 0.5B R1-Zero ☆771 Updated 4 months ago
- FlagScale is a large model toolkit based on open-sourced projects. ☆358 Updated this week
- Awesome LLMs on Device: A Comprehensive Survey ☆1,216 Updated 8 months ago
- xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism ☆2,305 Updated this week
- UltraScale Playbook (Chinese edition) ☆77 Updated 6 months ago
- [ICLR 2025🔥] SVD-LLM & [NAACL 2025🔥] SVD-LLM V2 ☆253 Updated last month
- ☆75 Updated 10 months ago
- GLake: optimizing GPU memory management and IO transmission. ☆480 Updated 6 months ago
- Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature se… ☆73 Updated this week
- A unified end-to-end machine intelligence platform ☆538 Updated last year
- A powerful toolkit for compressing large models including LLM, VLM, and video generation models. ☆576 Updated last month
- ☆63 Updated 3 weeks ago
- ☆318 Updated last month
- [COLM’25] DeepRetrieval: 🔥 The First Search Agent Trained by On-Policy Reinforcement Learning ☆641 Updated 3 months ago