Tencent/AngelSlim

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/Tencent/AngelSlim)

Tencent / AngelSlim

Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.

☆1,491

Alternatives and similar repositories for AngelSlim

Users that are interested in AngelSlim are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

sgl-project / SpecForge
View on GitHub
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
☆1,027Updated this week
cornradio / AirMouse
View on GitHub
用手机当鼠标/键盘的极简解决方案
☆268May 5, 2026Updated 2 months ago
SafeAILab / EAGLE
View on GitHub
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
☆2,481Feb 20, 2026Updated 5 months ago
Tencent / hpc-ops
View on GitHub
High Performance LLM Inference Operator Library
☆1,070Updated this week
melody0709 / zencrop
View on GitHub
An independent reimplementation of PowerToys Crop And Lock. Always On Top
☆316May 12, 2026Updated 2 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
z-lab / dflash
View on GitHub
DFlash: Block Diffusion for Flash Speculative Decoding
☆5,547May 10, 2026Updated 2 months ago
vllm-project / llm-compressor
View on GitHub
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
☆3,602Updated this week
vllm-project / speculators
View on GitHub
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
☆669Updated this week
intel / auto-round
View on GitHub
A SOTA quantization algorithm for high-accuracy low-bit LLM inference, seamlessly optimized for CPU/XPU/CUDA, with multi-datatype support…
☆1,543Updated this week
NVIDIA / Model-Optimizer
View on GitHub
A unified library of SOTA model optimization techniques like quantization, distillation, pruning, neural architecture search, speculative…
☆3,341Updated this week
shenmin / cassotis-ime
View on GitHub
Cassotis IME - a native Delphi/TSF Chinese Pinyin IME for Windows 10/11. 言泉输入法 —— 基于 Delphi/TSF 的 Windows 10/11 原生开源中文拼音输入法，支持全拼、简拼、六种双拼，…
☆232Updated this week
flashinfer-ai / flashinfer
View on GitHub
FlashInfer: Kernel Library for LLM Serving
☆6,062Updated this week
ZHITENGLI / ARB-LLM
View on GitHub
[ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models
☆31Aug 5, 2025Updated 11 months ago
sgl-project / sglang
View on GitHub
SGLang is a high-performance serving framework for large language models and multimodal models.
☆30,926Updated this week
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
LiaoGuoYin / lixian.online
View on GitHub
A one-stop tool to grab installer packages for VSCode extensions, Chrome/Edge add-ons, Docker images, and Microsoft Store apps — download…
☆559Updated this week
deepseek-ai / DeepSpec
View on GitHub
DeepSpec: a full-stack codebase for training and evaluating speculative decoding algorithms
☆6,810Jul 9, 2026Updated 3 weeks ago
Tencent-Hunyuan / Hunyuan-MT
View on GitHub
☆713Dec 30, 2025Updated 6 months ago
ModelCloud / GPTQModel
View on GitHub
LLM model quantization (compression) toolkit with HW acceleration support for Nvidia, AMD, Intel GPU and Intel/AMD/Apple CPU via HF, vLLM…
☆1,217Updated this week
ModelTC / LightCompress
View on GitHub
[EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.
☆736May 14, 2026Updated 2 months ago
LanRhyme / MicYou
View on GitHub
MicYou is a powerful tool that turns your Android device into a high-quality microphone for your PC.
☆3,131Updated this week
THUDM / slime
View on GitHub
slime is an LLM post-training framework for RL Scaling.
☆7,696Updated this week
tile-ai / tilelang
View on GitHub
Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels
☆7,007Updated this week
thu-ml / SageAttention
View on GitHub
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-t…
☆3,518Jan 17, 2026Updated 6 months ago
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
nunchaku-ai / deepcompressor
View on GitHub
Model Compression Toolbox for Large Language Models and Diffusion Models
☆796Aug 14, 2025Updated 11 months ago
bytedance / ABQ-LLM
View on GitHub
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
☆246Sep 30, 2024Updated last year
humanfirework / FlowWheel
View on GitHub
Global Smooth Auto-Scrolling for Windows / Windows 全局平滑自动滚屏工具（支持多屏协同滚动）
☆214Updated this week
ChenMnZ / PrefixQuant
View on GitHub
An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization
☆176Nov 26, 2025Updated 8 months ago
thu-ml / SpargeAttn
View on GitHub
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
☆1,019Feb 25, 2026Updated 5 months ago
jianuo-huang / Domino
View on GitHub
Official implementation of “Domino: Decoupling Causal Modeling from Autoregressive Drafting in Speculative Decoding”.
☆124Updated this week
GeeeekExplorer / nano-vllm
View on GitHub
Nano vLLM
☆14,679Apr 26, 2026Updated 3 months ago
kvcache-ai / Mooncake
View on GitHub
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
☆6,083Updated this week
lightseekorg / tokenspeed
View on GitHub
TokenSpeed is a speed-of-light LLM inference engine.
☆1,751Updated this week
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
tile-ai / TileRT
View on GitHub
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
☆1,600Jul 14, 2026Updated 2 weeks ago
Tencent-Hunyuan / Hy-MT
View on GitHub
☆797Jun 1, 2026Updated last month
OpenBMB / MiniCPM
View on GitHub
MiniCPM5-1B: A SOTA 1B on-device LLM, small yet powerful.
☆10,064Updated this week
mit-han-lab / omniserve
View on GitHub
[MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se…
☆852Mar 6, 2025Updated last year
liranringel / ddtree
View on GitHub
☆390Apr 16, 2026Updated 3 months ago
ModelTC / LightLLM
View on GitHub
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili…
☆4,198Updated this week
vipshop / cache-dit
View on GitHub
A PyTorch-native inference engine with cache, parallelism, quantization and cpu offload for DiTs.
☆1,239Updated this week