huggingface / nn_pruningLinks

Prune a model while finetuning or training.

☆403

Alternatives and similar repositories for nn_pruning

Users that are interested in nn_pruning are comparing it to the libraries listed below

Sorting:

tunib-ai / parallelformers
Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
☆790Updated 2 years ago
facebookresearch / bitsandbytes
Library for 8-bit optimizers and quantization routines.
☆772Updated 2 years ago
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆273Updated 3 years ago
IntelLabs / Model-Compression-Research-Package
A library for researching neural networks compression and acceleration methods.
☆138Updated 11 months ago
IntelLabs / academic-budget-bert
Repository containing code for "How to Train BERT with an Academic Budget" paper
☆314Updated last year
huggingface / pytorch_block_sparse
Fast Block Sparse Matrices for Pytorch
☆548Updated 4 years ago
microsoft / fastformers
FastFormers - highly efficient transformer models for NLU
☆705Updated 4 months ago
princeton-nlp / CoFiPruning
[ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
☆196Updated 2 years ago
microsoft / fastseq
An efficient implementation of the popular sequence models for text generation, summarization, and translation tasks. https://arxiv.org/p…
☆433Updated 2 years ago
mit-han-lab / hardware-aware-transformers
[ACL'20] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
☆336Updated last year
bzhangGo / rmsnorm
Root Mean Square Layer Normalization
☆249Updated 2 years ago
facebookresearch / mega
Sequence modeling with Mega.
☆297Updated 2 years ago
pytorch / ort
Accelerate PyTorch models with ONNX Runtime
☆364Updated 5 months ago
WoosukKwon / retraining-free-pruning
[NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers
☆191Updated 2 years ago
Ki6an / fastT5
⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.
☆586Updated 2 years ago
kssteven418 / I-BERT
[ICML'21 Oral] I-BERT: Integer-only BERT Quantization
☆255Updated 2 years ago
bytedance / effective_transformer
Running BERT without Padding
☆472Updated 3 years ago
Lightning-Universe / lightning-transformers
Flexible components pairing 🤗 Transformers with Pytorch Lightning
☆610Updated 2 years ago
LiyuanLucasLiu / Transformer-Clinic
Understanding the Difficulty of Training Transformers
☆329Updated 3 years ago
huggingface / block_movement_pruning
Block Sparse movement pruning
☆81Updated 4 years ago
ofirpress / attention_with_linear_biases
Code for the ALiBi method for transformer language models (ICLR 2022)
☆538Updated last year
microsoft / infinibatch
Efficient, check-pointed data loading for deep learning with massive data sets.
☆208Updated 2 years ago
microsoft / xtreme-distil-transformers
XtremeDistil framework for distilling/compressing massive multilingual neural network models to tiny and efficient models for AI at scale
☆155Updated last year
lucidrains / electra-pytorch
A simple and working implementation of Electra, the fastest way to pretrain language models from scratch, in Pytorch
☆227Updated 2 years ago
microsoft / varuna
☆251Updated last year
google-research / long-range-arena
Long Range Arena for Benchmarking Efficient Transformers
☆762Updated last year
facebookresearch / diffq
DiffQ performs differentiable quantization using pseudo quantization noise. It can automatically tune the number of bits used per weight …
☆236Updated 2 years ago
microsoft / Tutel
Tutel MoE: Optimized Mixture-of-Experts Library, Support DeepSeek/Kimi-K2/Qwen3 FP8/FP4
☆870Updated last week
luyug / GradCache
Run Effective Large Batch Contrastive Learning Beyond GPU/TPU Memory Constraint
☆399Updated last year
kaiyuyue / torchshard
Slicing a PyTorch Tensor Into Parallel Shards
☆299Updated 2 months ago