zhihu / ZhiLight
A highly optimized LLM inference acceleration engine for Llama and its variants.
☆898 · Updated last month
Alternatives and similar repositories for ZhiLight
Users who are interested in ZhiLight are comparing it to the libraries listed below.
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations ☆224 · Updated 8 months ago
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆791 · Updated 2 weeks ago
- Adds sequence parallelism to LLaMA-Factory ☆511 · Updated this week
- Unified KV Cache Compression Methods for Auto-Regressive Models ☆1,138 · Updated 5 months ago
- Higher performance OpenAI LLM service than vLLM serve: A pure C++ high-performance OpenAI LLM service implemented with GPRS+TensorRT-LLM+… ☆140 · Updated last month
- TVM Documentation in Chinese Simplified / TVM 中文文档 ☆1,696 · Updated 2 months ago
- Train your Agent model via our easy and efficient framework ☆1,144 · Updated this week
- FlagPerf is an open-source software platform for benchmarking AI chips. ☆338 · Updated this week
- ☆334 · Updated 5 months ago
- xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism ☆2,028 · Updated this week
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆256 · Updated 3 weeks ago
- Minimal-cost training of a 0.5B R1-Zero model ☆742 · Updated last month
- MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction ☆90 · Updated 7 months ago
- FlagScale is a large model toolkit based on open-sourced projects. ☆301 · Updated this week
- GLake: optimizing GPU memory management and IO transmission. ☆467 · Updated 2 months ago
- Deep Learning Deployment Framework: Supports tf/torch/trt/trtllm/vllm and other NN frameworks. Support dynamic batching, and streaming mo… ☆164 · Updated last month
- Materials for learning SGLang ☆443 · Updated this week
- A fast communication-overlapping library for tensor/expert parallelism on GPUs. ☆978 · Updated 3 weeks ago
- A self-learning tutorial for CUDA High Performance Programming. ☆641 · Updated 2 months ago
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆242 · Updated last year
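- Train an LLM from scratch with DeepSpeed through the pretraining and SFT stages, verifying the LLM's ability to learn knowledge, understand language, and answer questions ☆154 · Updated 11 months ago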
- ☆67 · Updated 7 months ago
- Optimized BERT transformer inference on NVIDIA GPU. https://arxiv.org/abs/2210.03052 ☆474 · Updated last year
- ☆70 · Updated 6 months ago
- How to learn PyTorch and OneFlow ☆434 · Updated last year
- LLM notes, including model inference, transformer model structure, and llm framework code analysis notes. ☆780 · Updated this week
- Disaggregated serving system for Large Language Models (LLMs). ☆617 · Updated 2 months ago
- FlagGems is an operator library for large language models implemented in the Triton Language. ☆573 · Updated this week
- A curated collection of noteworthy MLSys bloggers (Algorithms/Systems) ☆241 · Updated 5 months ago
- USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference ☆519 · Updated 3 weeks ago