microsoft / only_train_once
OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM
☆25 · Updated last month
Related projects
Alternatives and complementary repositories for only_train_once
- In this repository, we explore model compression for transformer architectures via quantization. We specifically explore quantization awa… ☆22 · Updated 3 years ago
- [ICML 2022] "Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets" by Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wa… ☆31 · Updated last year
- Is gradient information useful for pruning of LLMs? ☆38 · Updated 7 months ago
- ☆47 · Updated 2 months ago
- Official PyTorch implementation of "LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging" (ICML'24) ☆27 · Updated 3 months ago
- Source code for IJCAI 2022 Long paper: Parameter-Efficient Sparsity for Large Language Models Fine-Tuning. ☆13 · Updated 2 years ago
- An algorithm for static activation quantization of LLMs ☆79 · Updated 2 weeks ago
- [ICASSP'22] Integer-only Zero-shot Quantization for Efficient Speech Recognition ☆30 · Updated 3 years ago
- Unified Normalization (ACM MM'22) By Qiming Yang, Kai Zhang, Chaoxiang Lan, Zhi Yang, Zheyang Li, Wenming Tan, Jun Xiao, and Shiliang P… ☆34 · Updated last year
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts ☆34 · Updated 8 months ago
- ☆22 · Updated 11 months ago
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ☆21 · Updated last month
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models ☆20 · Updated 8 months ago
- [TMLR] Official PyTorch implementation of paper "Efficient Quantization-aware Training with Adaptive Coreset Selection" ☆29 · Updated 3 months ago
- Flexible simulator for mixed precision and format simulation of LLMs and vision transformers. ☆43 · Updated last year
- [AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models ☆37 · Updated 10 months ago
- Block Sparse movement pruning ☆78 · Updated 3 years ago
- Efficient 2:4 sparse training algorithms and implementations ☆21 · Updated 5 months ago
- Implementation of IceFormer: Accelerated Inference with Long-Sequence Transformers on CPUs (ICLR 2024). ☆20 · Updated 5 months ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆19 · Updated 8 months ago
- ☆18 · Updated 4 months ago
- Official implementation of the ICLR 2024 paper AffineQuant ☆22 · Updated 7 months ago
- ☆24 · Updated 7 months ago
- ☆35 · Updated 9 months ago
- The official PyTorch implementation of the NeurIPS2022 (spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer L… ☆46 · Updated 2 years ago
- Repository for CPU Kernel Generation for LLM Inference ☆25 · Updated last year
- ☆23 · Updated 4 months ago
- Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding (EMNLP 2023 Long) ☆53 · Updated last month