microsoft / only_train_once
OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM
☆28 · Updated 3 months ago
Alternatives and similar repositories for only_train_once:
Users interested in only_train_once are comparing it to the libraries listed below.
- [ICML 2022] "Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets" by Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wa… ☆31 · Updated last year
- Official PyTorch implementation of "LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging" (ICML'24) ☆29 · Updated 5 months ago
- ☆26 · Updated 2 months ago
- In this repository, we explore model compression for transformer architectures via quantization. We specifically explore quantization awa… ☆22 · Updated 3 years ago
- ☆21 · Updated 5 months ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆28 · Updated 7 months ago
- ACL 2023 ☆38 · Updated last year
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆35 · Updated 10 months ago
- [ECCV 2022] SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning ☆19 · Updated 2 years ago
- ☆20 · Updated 2 years ago
- [Preprint] Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Prunin… ☆40 · Updated last year
- Is gradient information useful for pruning of LLMs? ☆41 · Updated 8 months ago
- Unified Normalization (ACM MM'22) by Qiming Yang, Kai Zhang, Chaoxiang Lan, Zhi Yang, Zheyang Li, Wenming Tan, Jun Xiao, and Shiliang P… ☆34 · Updated last year
- BESA is a differentiable weight pruning technique for large language models. ☆14 · Updated 10 months ago
- IntLLaMA: A fast and light quantization solution for LLaMA ☆18 · Updated last year
- [TMLR] Official PyTorch implementation of the paper "Efficient Quantization-aware Training with Adaptive Coreset Selection" ☆30 · Updated 4 months ago
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti… ☆60 · Updated 9 months ago
- ☆11 · Updated 4 months ago
- [ICCV 2021] Code release for "Sub-bit Neural Networks: Learning to Compress and Accelerate Binary Neural Networks" ☆32 · Updated 2 years ago
- Position-based Scaled Gradient for Model Quantization and Pruning code (NeurIPS 2020) ☆26 · Updated 4 years ago
- [ICASSP'22] Integer-only Zero-shot Quantization for Efficient Speech Recognition ☆31 · Updated 3 years ago
- [AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models ☆40 · Updated last year
- LLM Inference with Microscaling Format ☆16 · Updated 2 months ago
- MLPruning: PyTorch, NLP, BERT, structured pruning ☆21 · Updated 3 years ago
- Source code for the IJCAI 2022 long paper "Parameter-Efficient Sparsity for Large Language Models Fine-Tuning" ☆13 · Updated 2 years ago
- ☆22 · Updated last year
- Official implementation of the EMNLP 2023 paper "Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…" ☆45 · Updated last year
- GPU operators for sparse tensor operations ☆30 · Updated 10 months ago