microsoft / only_train_once
OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM
☆24 · Updated last month
Related projects
Alternatives and complementary repositories for only_train_once
- [ICML 2022] "Coarsening the Granularity: Towards Structurally Sparse Lottery Tickets" by Tianlong Chen, Xuxi Chen, Xiaolong Ma, Yanzhi Wa… ☆31 · Updated last year
- Are gradient information useful for pruning of LLMs? ☆38 · Updated 6 months ago
- [ECCV 2022] SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning ☆19 · Updated 2 years ago
- In this repository, we explore model compression for transformer architectures via quantization. We specifically explore quantization awa… ☆22 · Updated 3 years ago
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod… ☆19 · Updated 8 months ago
- ☆34 · Updated 8 months ago
- ☆14 · Updated 11 months ago
- ACL 2023 ☆38 · Updated last year
- ☆17 · Updated last year
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆20 · Updated this week
- ☆17 · Updated 3 months ago
- ☆46 · Updated last month
- An algorithm for static activation quantization of LLMs ☆68 · Updated this week
- ☆20 · Updated 2 years ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆25 · Updated 5 months ago
- GPU operators for sparse tensor operations ☆29 · Updated 8 months ago
- [ICLR 2023] "Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!" Shiwei Liu, Tianlong Chen, Zhenyu Zhang, Xuxi Chen… ☆27 · Updated last year
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆38 · Updated 9 months ago
- ☆14 · Updated 9 months ago
- [AAAI 2024] Fluctuation-based Adaptive Structured Pruning for Large Language Models ☆37 · Updated 10 months ago
- ☆46 · Updated last year
- [TMLR] Official PyTorch implementation of paper "Efficient Quantization-aware Training with Adaptive Coreset Selection" ☆29 · Updated 2 months ago
- Flexible simulator for mixed precision and format simulation of LLMs and vision transformers. ☆43 · Updated last year
- BESA is a differentiable weight pruning technique for large language models. ☆14 · Updated 8 months ago
- AFPQ code implementation ☆18 · Updated last year
- This project is the official implementation of our accepted IEEE TPAMI paper Diverse Sample Generation: Pushing the Limit of Data-free Qu… ☆14 · Updated last year
- Efficient 2:4 sparse training algorithms and implementations ☆21 · Updated 5 months ago
- The official PyTorch implementation of the NeurIPS2022 (spotlight) paper, Outlier Suppression: Pushing the Limit of Low-bit Transformer L… ☆46 · Updated 2 years ago
- Block Sparse movement pruning ☆78 · Updated 3 years ago