Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
☆493Mar 3, 2026Updated this week
Alternatives and similar repositories for AngelSlim
Users who are interested in AngelSlim are comparing it to the libraries listed below.
- ☆27Updated this week
- PyTorch implementation of "Deep Transferring Quantization" (ECCV2020)☆18Jun 22, 2022Updated 3 years ago
- Multiple GEMM operators are constructed with cutlass to support LLM inference.☆21Aug 3, 2025Updated 7 months ago
- ☆28Aug 13, 2025Updated 6 months ago
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.☆150May 10, 2025Updated 10 months ago
- Multiplex Thinking: Reasoning via Token-wise Branch-and-Merge☆111Jan 30, 2026Updated last month
- PaddleAPEX: Paddle Accuracy and Performance EXpansion pack☆9Dec 12, 2024Updated last year
- [ICLR'25] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation☆151Mar 21, 2025Updated 11 months ago
- ☆11Dec 11, 2024Updated last year
- ☆13Jun 22, 2025Updated 8 months ago
- C++ implementation of "Mobile Vision Transformer-based Visual Object Tracking" (BMVC2023) and "Separable Self and Mixed Attention Transf…☆12Apr 23, 2024Updated last year
- This repository contains integer operators on GPUs for PyTorch.☆237Sep 29, 2023Updated 2 years ago
- [EMNLP 2024 & AAAI 2026] A powerful toolkit for compressing large models including LLMs, VLMs, and video generative models.☆683Nov 19, 2025Updated 3 months ago
- The official PyTorch implementation for Improving Long-Text Alignment for Text-to-Image Diffusion Models (LongAlign)☆80Apr 23, 2025Updated 10 months ago
- collab-dev - Collaboration Metrics for Code Reviews☆23May 12, 2025Updated 9 months ago
- The official PyTorch implementation of the NeurIPS2022 (spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer L…☆49Oct 5, 2022Updated 3 years ago
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations☆241Sep 30, 2024Updated last year
- Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).☆2,202Feb 20, 2026Updated 2 weeks ago
- Resnet-50 + FPN + Keypoint RCNN☆14Jun 18, 2019Updated 6 years ago
- some object detection algo☆14Jul 25, 2024Updated last year
- Official implementation of UnifiedReward & UnifiedReward-Think☆18Jun 18, 2025Updated 8 months ago
- ☆11Dec 26, 2025Updated 2 months ago
- ScrollNet for Continual Learning☆11Sep 11, 2023Updated 2 years ago
- ☆19Oct 22, 2025Updated 4 months ago
- ☆169Mar 9, 2023Updated 3 years ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training☆261Aug 9, 2025Updated 7 months ago
- Our 2nd-gen LMM☆34May 22, 2024Updated last year
- ☆12Dec 16, 2021Updated 4 years ago
- This is a fork of SGLang for hip-attention integration. Please refer to hip-attention for detail.☆18Dec 23, 2025Updated 2 months ago
- ☆16Nov 24, 2025Updated 3 months ago
- Benchmark tests supporting the TiledCUDA library.☆18Nov 19, 2024Updated last year
- ☆13Oct 14, 2025Updated 4 months ago
- Tencent Hunyuan 7B (short as Hunyuan-7B) is one of the large language dense models of Tencent Hunyuan☆71Aug 11, 2025Updated 6 months ago
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization☆405Aug 13, 2024Updated last year
- ☆104Sep 9, 2024Updated last year
- Fast LLM Training CodeBase With dynamic strategy choosing [Deepspeed+Megatron+FlashAttention+CudaFusionKernel+Compiler]☆40Jan 4, 2024Updated 2 years ago
- The code for Joint Neural Architecture Search and Quantization☆14Apr 10, 2019Updated 6 years ago
- An external memory allocator example for PyTorch.☆16Aug 10, 2025Updated 6 months ago
- This repository is developed on the basis of pyqt5, mainly through the log files generated during the running of Darknet, and then draw l…☆15Jan 11, 2019Updated 7 years ago