zhihu / ZhiLight
A highly optimized LLM inference acceleration engine for Llama and its variants.
☆890Updated 2 weeks ago
Alternatives and similar repositories for ZhiLight
Users who are interested in ZhiLight are comparing it to the libraries listed below.
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations☆222Updated 8 months ago
- TVM Documentation in Chinese Simplified / TVM 中文文档☆1,426Updated last month
- Unified KV Cache Compression Methods for Auto-Regressive Models☆1,084Updated 4 months ago
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.☆777Updated 2 weeks ago
- Adds Sequence Parallelism to LLaMA-Factory☆498Updated this week
- Train your Agent model via our easy and efficient framework☆776Updated this week
- ☆332Updated 4 months ago
- Minimal-cost training of a 0.5B R1-Zero model☆727Updated 2 weeks ago
- Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GPRS+TensorRT-LLM+…☆136Updated 2 weeks ago
- FlagPerf is an open-source software platform for benchmarking AI chips.☆332Updated this week
- FlagScale is a large model toolkit based on open-sourced projects.☆280Updated this week
- GLake: optimizing GPU memory management and IO transmission.☆463Updated 2 months ago
- xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism☆1,961Updated this week
- ☆67Updated 7 months ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …☆253Updated this week
- Community maintained hardware plugin for vLLM on Ascend☆677Updated this week
- Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks. Support dynamic batching, and streaming mo…☆159Updated 3 weeks ago
- A self-learning tutorial for CUDA High Performance Programming.☆635Updated last month
- DLRover: An Automatic Distributed Deep Learning System☆1,474Updated this week
- MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction☆89Updated 7 months ago
- The repository has collected a batch of noteworthy MLSys bloggers (Algorithms/Systems)☆237Updated 4 months ago
- ☆49Updated this week
- Materials for learning SGLang☆424Updated last week
- DeepRetrieval - 🔥 Training Search Agent with Retrieval Outcomes via Reinforcement Learning☆497Updated last week
- [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V…☆476Updated this week
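- Train an LLM from scratch with DeepSpeed, through the pretrain and SFT stages, to verify the LLM's ability to learn knowledge, understand language, and answer questions☆154Updated 10 months ago
- A high-performance deep learning training platform with task-level time-sharing scheduling of GPU compute☆647Updated last year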
- Disaggregated serving system for Large Language Models (LLMs).☆601Updated last month
- LLM Inference benchmark☆419Updated 10 months ago
- ☆70Updated 6 months ago