neuralmagic / sparsezoo
Neural network model repository for highly sparse and sparse-quantized models with matching sparsification recipes
☆382 · Updated 8 months ago
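SparseZoo models and their matching sparsification recipes are addressed by stub strings and can be pulled through the project's Python client. A minimal sketch, assuming the `sparsezoo` package's `Model` class and using an illustrative (unverified) stub:

```python
# Minimal sketch of pulling a model from the SparseZoo with the Python client.
# Assumes the `sparsezoo` package exposes a `Model` class that resolves zoo
# stubs; the stub string below is an illustrative example, not a verified entry.
from sparsezoo import Model

stub = "zoo:cv/classification/resnet_v1-50/pytorch/sparseml/imagenet/pruned95-none"

model = Model(stub)   # resolve the stub against the zoo's model index
model.download()      # fetch weights, ONNX export, and the recipe files locally
print(model.path)     # local directory holding the downloaded artifacts
```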
Alternatives and similar repositories for sparsezoo:
Users interested in sparsezoo are comparing it to the libraries listed below.
- ML model optimization product to accelerate inference. ☆326 · Updated 11 months ago
- Top-level directory for documentation and general content ☆121 · Updated 3 months ago
- Libraries for applying sparsification recipes to neural networks with a few lines of code, enabling faster and smaller models ☆2,118 · Updated 7 months ago
- Sparsity-aware deep learning inference runtime for CPUs ☆3,117 · Updated 8 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆262 · Updated 5 months ago
- A Python-level JIT compiler designed to make unmodified PyTorch programs faster. ☆1,037 · Updated 11 months ago
- An open-source efficient deep learning framework/compiler, written in Python. ☆691 · Updated 3 weeks ago
- Prune a model while finetuning or training. ☆400 · Updated 2 years ago
- Recipes are a standard, well supported set of blueprints for machine learning engineers to rapidly train models using the latest research… ☆310 · Updated this week
- Library for 8-bit optimizers and quantization routines. ☆717 · Updated 2 years ago
- A research library for PyTorch-based neural network pruning, compression, and more. ☆160 · Updated 2 years ago
- Fast sparse deep learning on CPUs ☆52 · Updated 2 years ago
- ONNX Script enables developers to naturally author ONNX functions and models using a subset of Python. ☆323 · Updated this week
- Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab… ☆1,559 · Updated last year
- DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight… ☆235 · Updated last year
- A library for researching neural network compression and acceleration methods. ☆141 · Updated 6 months ago
- Implementation of a Transformer, but completely in Triton ☆260 · Updated 2 years ago
- Code for the ICML 2023 paper "SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot". ☆775 · Updated 7 months ago
- SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX R… ☆2,355 · Updated this week
- Accelerate PyTorch models with ONNX Runtime ☆358 · Updated 3 weeks ago
- TF2 implementation of knowledge distillation using the "function matching" hypothesis from https://arxiv.org/abs/2106.05237. ☆87 · Updated 3 years ago
- ☆141 · Updated 2 years ago
- Scailable ONNX Python tools ☆97 · Updated 4 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆222 · Updated 7 months ago
- Reference implementations of popular Binarized Neural Networks ☆107 · Updated last week
- PyTorch library to facilitate development and standardized evaluation of neural network pruning methods. ☆428 · Updated last year
- ONNX Optimizer ☆681 · Updated this week
- Accelerate your Neural Architecture Search (NAS) through fast, reproducible and modular research. ☆474 · Updated 4 months ago
- A repository for log-time feedforward networks ☆220 · Updated 11 months ago
- ☆202 · Updated 2 years ago