Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
☆1,249 · May 27, 2026 · Updated this week
Alternatives and similar repositories for AngelSlim
Users that are interested in AngelSlim are comparing it to the libraries listed below.
- ☆711 · Dec 30, 2025 · Updated 5 months ago
- Multiple GEMM operators are constructed with CUTLASS to support LLM inference. ☆20 · Aug 3, 2025 · Updated 9 months ago
- ☆13 · Oct 14, 2025 · Updated 7 months ago
- This repository provides the code for applying Contrastive Learning Penalty Loss (CLPL) and Mixture of Experts (MoE) to the BGE-M3 text e… ☆11 · Dec 27, 2024 · Updated last year
- An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08… ☆40 · Jun 4, 2025 · Updated 11 months ago
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations ☆244 · Sep 30, 2024 · Updated last year
- [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models. ☆715 · May 14, 2026 · Updated 2 weeks ago
- ☆30 · May 24, 2025 · Updated last year
- Official implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25). ☆2,357 · Feb 20, 2026 · Updated 3 months ago
- [ICML 2025 Spotlight] RAPID: Long-Context Inference with Retrieval-Augmented Speculative Decoding ☆23 · Mar 2, 2025 · Updated last year
- ☆14 · Jun 22, 2025 · Updated 11 months ago
- Tencent Hunyuan 7B (Hunyuan-7B for short) is one of Tencent Hunyuan's dense large language models ☆72 · Aug 11, 2025 · Updated 9 months ago
- PyTorch ImageNet training code with various tricks, LR schedulers, distributed training, mixed-precision training, a DALI dataloader, etc. ☆18 · Aug 12, 2020 · Updated 5 years ago
- Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge ☆121 · Updated this week
- ⚡️ Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, and achieve peak⚡️ performance. ☆155 · May 10, 2025 · Updated last year
- Reading notes on speculative decoding papers ☆34 · Apr 16, 2026 · Updated last month
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆1,135 · Updated this week
- This is the codebase for pre-training, compressing, extending, and distilling LLMs with Megatron-LM. ☆12 · Mar 11, 2024 · Updated 2 years ago
- Benchmark tests supporting the TiledCUDA library. ☆19 · Nov 19, 2024 · Updated last year
- C++ implementation of "Mobile Vision Transformer-based Visual Object Tracking" (BMVC 2023) and "Separable Self and Mixed Attention Transf… ☆13 · Apr 23, 2024 · Updated 2 years ago
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization ☆174 · Nov 26, 2025 · Updated 6 months ago
- DiTAS: Quantizing Diffusion Transformers via Enhanced Activation Smoothing (WACV 2025) ☆13 · Feb 7, 2026 · Updated 3 months ago
- [ICLR 2025] "GraphRouter: A Graph-based Router for LLM Selections", Tao Feng, Yanzhen Shen, Jiaxuan You ☆71 · Dec 30, 2025 · Updated 5 months ago
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation ☆160 · Mar 21, 2025 · Updated last year
- [ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding ☆1,337 · Mar 6, 2025 · Updated last year
- Keyformer proposes KV cache reduction through key-token identification, without the need for fine-tuning ☆57 · Mar 26, 2024 · Updated 2 years ago
- ☆66 · Oct 25, 2025 · Updated 7 months ago
- ☆172 · Mar 9, 2023 · Updated 3 years ago
- ☆106 · Sep 9, 2024 · Updated last year
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆264 · Aug 9, 2025 · Updated 9 months ago
- [MLSys'25] QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving; [MLSys'25] LServe: Efficient Long-sequence LLM Se… ☆840 · Mar 6, 2025 · Updated last year
- ☆541 · Apr 1, 2026 · Updated last month
- ScrollNet for Continual Learning ☆11 · Sep 11, 2023 · Updated 2 years ago
- Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA, using CUDA cores for the decoding stage of LLM inference. ☆47 · Jun 11, 2025 · Updated 11 months ago
- ☆43 · Jan 30, 2024 · Updated 2 years ago
- [ACL 2024] A novel quantization-aware training (QAT) framework with self-distillation to enhance ultra-low-bit LLMs. ☆138 · May 16, 2024 · Updated 2 years ago
- [AAAI 2026] Official implementation of the paper “SegDINO3D: 3D Instance Segmentation Empowered by Both Image-Level and Object-Level 2D F… ☆53 · Jan 8, 2026 · Updated 4 months ago
- 3x faster inference: an unofficial implementation of EAGLE speculative decoding ☆84 · Jul 3, 2025 · Updated 10 months ago
- A knowledge-grounded framework for autonomous ML/AI program synthesis and optimization ☆90 · Feb 20, 2026 · Updated 3 months ago