Efficient-ML / Awesome-Efficient-LLM-Diffusion
A list of papers, docs, and code about efficient AIGC. This repo aims to provide information for efficient AIGC research, covering both language and vision, and we are continuously improving the project. PRs adding works (papers, repositories) that the repo has missed are welcome.
☆166 · Updated 2 months ago
Alternatives and similar repositories for Awesome-Efficient-LLM-Diffusion:
- [NeurIPS 2024 Oral🔥] DuQuant: Distributing Outliers via Dual Transformation Makes Stronger Quantized LLMs. ☆138 · Updated 3 months ago
- Awesome list for LLM pruning. ☆195 · Updated last month
- Code repository of "Evaluating Quantized Large Language Models". ☆114 · Updated 4 months ago
- [ICLR 2025] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation. ☆41 · Updated this week
- [CVPR 2024 Highlight] Official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Mo…". ☆60 · Updated 5 months ago
- Implementation of "Post-training Quantization on Diffusion Models" (CVPR 2023). ☆126 · Updated last year
- PyTorch implementation of PTQ4DiT (https://arxiv.org/abs/2405.16005). ☆19 · Updated 2 months ago
- Official PyTorch implementation of the ICLR 2022 paper "QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quan…". ☆115 · Updated last year
- [ICML 2023] Official implementation of "BiBench: Benchmarking and Analyzing Network Binar…". ☆54 · Updated 10 months ago
- This repository contains integer operators on GPUs for PyTorch. ☆190 · Updated last year
- Awesome list for LLM quantization. ☆160 · Updated last month
- All-in-one repository of awesome LLM pruning papers, integrating useful resources and insights. ☆61 · Updated last month
- Code for the NeurIPS 2022 paper "Optimal Brain Compression: A Framework for Accurate Post-Training Quantization and Pruning". ☆109 · Updated last year
- [ICLR 2024 Spotlight] Official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di…". ☆55 · Updated 7 months ago
- Post-Training Quantization for Vision Transformers. ☆201 · Updated 2 years ago
- Awesome papers and resources in deep neural network pruning, with source code. ☆142 · Updated 5 months ago
- QuEST: Efficient Finetuning for Low-bit Diffusion Models. ☆36 · Updated last week
- Official implementation of "PTQD: Accurate Post-Training Quantization for Diffusion Models". ☆93 · Updated 10 months ago
- [CVPR 2023] PD-Quant: Post-Training Quantization Based on Prediction Difference Metric. ☆52 · Updated last year
- List of papers related to neural network quantization in recent AI conferences and journals. ☆514 · Updated last month
- ☆210 · Updated 5 months ago
- An algorithm for static activation quantization of LLMs. ☆112 · Updated 2 weeks ago
- [ICML 2024 Oral] Official implementation of "Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…". ☆60 · Updated 9 months ago
- List of papers related to Vision Transformer quantization and hardware acceleration in recent AI conferences and journals. ☆72 · Updated 7 months ago
- Official implementation of the NeurIPS 2022 paper "Q-ViT". ☆86 · Updated last year
- Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization". ☆95 · Updated last week
- [NeurIPS 2022] A Fast Post-Training Pruning Framework for Transformers. ☆181 · Updated last year
- Code repo for the paper "LLM-QAT: Data-Free Quantization Aware Training for Large Language Models". ☆266 · Updated 4 months ago
- [ICLR 2022] Official implementation of "BiBERT: Accurate Fully Binarized BERT". ☆85 · Updated last year
- [IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer. ☆320 · Updated last year