Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
☆387 · Jun 2, 2025 · Updated 9 months ago
Alternatives and similar repositories for sparsezoo
Users interested in sparsezoo are comparing it to the libraries listed below.
- ML model optimization product to accelerate inference. ☆326 · Jun 2, 2025 · Updated 9 months ago
- Top-level directory for documentation and general content ☆120 · Jun 2, 2025 · Updated 9 months ago
- Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models ☆2,144 · Jun 2, 2025 · Updated 9 months ago
- Sparsity-aware deep learning inference runtime for CPUs ☆3,163 · Jun 2, 2025 · Updated 9 months ago
- A model compression and acceleration toolbox based on PyTorch ☆333 · Jan 12, 2024 · Updated 2 years ago
- [TCAD 2021] Block Convolution: Towards Memory-Efficient Inference of Large-Scale CNNs on FPGA ☆17 · Jul 7, 2022 · Updated 3 years ago
- ☆10 · Jul 27, 2020 · Updated 5 years ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Dec 4, 2025 · Updated 3 months ago
- Awesome quantization paper lists with code ☆10 · Feb 24, 2021 · Updated 5 years ago
- Code accompanying the NeurIPS 2020 paper: WoodFisher (Singh & Alistarh, 2020) ☆53 · Mar 8, 2021 · Updated 4 years ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" ☆280 · Nov 3, 2023 · Updated 2 years ago
- McPAT modeling framework ☆12 · Oct 18, 2014 · Updated 11 years ago
- An external memory allocator example for PyTorch ☆16 · Aug 10, 2025 · Updated 6 months ago
- Official implementation of "Searching for Winograd-aware Quantized Networks" (MLSys'20) ☆27 · Oct 3, 2023 · Updated 2 years ago
- A framework that helps developers apply structured pruning to TensorFlow models ☆28 · Nov 7, 2024 · Updated last year
- PyTorch implementation of APoT quantization (ICLR 2020) ☆283 · Dec 11, 2024 · Updated last year
- Benchmark PyTorch custom operators ☆14 · Jul 6, 2023 · Updated 2 years ago
- Refine high-quality datasets and visual AI models ☆10,410 · Feb 28, 2026 · Updated last week
- QONNX: Arbitrary-Precision Quantized Neural Networks in ONNX ☆178 · Feb 19, 2026 · Updated 2 weeks ago
- Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot" ☆872 · Aug 20, 2024 · Updated last year
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆101 · May 30, 2023 · Updated 2 years ago
- PyTorch distributed backend extension with compression support ☆17 · Mar 24, 2025 · Updated 11 months ago
- NCNN deployment of the lightweight FastestDet detection network ☆17 · Jul 7, 2022 · Updated 3 years ago
- Generating Training Data Made Easy ☆43 · Jul 3, 2020 · Updated 5 years ago
- High-performance Int8 GEMM kernels for SM80 and later GPUs ☆20 · Mar 11, 2025 · Updated 11 months ago
- ☆18 · Sep 25, 2025 · Updated 5 months ago
- MaxEVA: Maximizing the Efficiency of Matrix Multiplication on Versal AI Engine (accepted as a full paper at FPT'23) ☆21 · Apr 17, 2024 · Updated last year
- PyTorch reimplementation of the paper "HyperMixer: An MLP-based Green AI Alternative to Transformers" [arXiv 2022] ☆18 · Mar 28, 2022 · Updated 3 years ago
- ☆23 · Jan 3, 2025 · Updated last year
- segment-anything based MNN implementation ☆36 · Dec 13, 2023 · Updated 2 years ago
- [NeurIPS 2023] ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer ☆30 · Dec 6, 2023 · Updated 2 years ago
- 🚀 Accelerate inference and training of 🤗 Transformers, Diffusers, TIMM, and Sentence Transformers with easy-to-use hardware optimization… ☆3,305 · Feb 9, 2026 · Updated 3 weeks ago
- Libraries, guides, blueprints, and sample code to enable rapidly building 0-1 applications on iOS, Android, and the web ☆11 · May 12, 2023 · Updated 2 years ago
- Tools for simple inference testing of ONNX models using the TensorRT, CUDA, and OpenVINO CPU/GPU providers ☆24 · Sep 7, 2025 · Updated 6 months ago
- Your PyTorch AI Factory: Flash enables you to easily configure and run complex AI recipes for over 15 tasks across 7 data domains ☆1,731 · Oct 8, 2023 · Updated 2 years ago
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" ☆397 · Feb 24, 2024 · Updated 2 years ago
- Code for generating the JuICe dataset ☆37 · Oct 27, 2021 · Updated 4 years ago
- [ICLR 2024] Official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆39 · Mar 11, 2024 · Updated last year
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 · Jul 21, 2023 · Updated 2 years ago