zhihu / ZhiLight
A highly optimized LLM inference acceleration engine for Llama and its variants.
☆885Updated this week
Alternatives and similar repositories for ZhiLight:
Users who are interested in ZhiLight are comparing it to the libraries listed below.
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations☆221Updated 6 months ago
- TVM Documentation in Chinese Simplified / TVM 中文文档☆940Updated last week
- Unified KV Cache Compression Methods for Auto-Regressive Models☆1,009Updated 3 months ago
- adds Sequence Parallelism into LLaMA-Factory☆461Updated this week
- Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GPRS+TensorRT-LLM+…☆130Updated this week
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.☆702Updated 2 months ago
- ☆326Updated 3 months ago
- FlagPerf is an open-source software platform for benchmarking AI chips.☆328Updated 2 months ago
- minimal-cost for training 0.5B R1-Zero☆699Updated 3 weeks ago
- xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism☆1,826Updated last week
- DeepRetrieval - Hacking 🔥Real Search Engines and Retrievers with LLM via RL☆372Updated this week
- FlagScale is a large model toolkit based on open-sourced projects.☆265Updated this week
- GLake: optimizing GPU memory management and IO transmission.☆456Updated 3 weeks ago
- A self-learning tutorial for CUDA High Performance Programming.☆590Updated last week
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …☆242Updated this week
- Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks. Support dynamic batching, and streaming mo…☆157Updated 3 weeks ago
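- Train an LLM from scratch with DeepSpeed, through the pretrain and SFT stages, to verify the LLM's ability to learn knowledge, understand language, and answer questions☆151Updated 9 months ago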
- DLRover: An Automatic Distributed Deep Learning System☆1,411Updated this week
- MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction☆87Updated 5 months ago
- ☆474Updated last week
- ☆66Updated 5 months ago
- LLM notes, including model inference, transformer model structure, and llm framework code analysis notes.☆722Updated this week
- An Innovative Agent Framework Driven by KG Engine☆758Updated 3 months ago
- ☆68Updated 4 months ago
- A fast communication-overlapping library for tensor/expert parallelism on GPUs.☆887Updated this week
- Align Anything: Training All-modality Model with Feedback☆3,386Updated this week
- Disaggregated serving system for Large Language Models (LLMs).☆559Updated 2 weeks ago
- FlagGems is an operator library for large language models implemented in Triton Language.☆488Updated this week
- A flexible and efficient training framework for large-scale alignment tasks☆342Updated 2 months ago
- The repository has collected a batch of noteworthy MLSys bloggers (Algorithms/Systems)☆222Updated 3 months ago