Official Code For Dual Grained Quantization: Efficient Fine-Grained Quantization for LLM
☆14Dec 27, 2023Updated 2 years ago
Alternatives and similar repositories for DGQ
Users that are interested in DGQ are comparing it to the libraries listed below
Sorting:
- ☆13Jun 22, 2025Updated 8 months ago
- [ICLR 2024 Spotlight] This is the official PyTorch implementation of "EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Di…☆68Jun 4, 2024Updated last year
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…☆39Mar 11, 2024Updated last year
- ☆14Nov 20, 2022Updated 3 years ago
- ☆19Jan 3, 2025Updated last year
- ☆19Nov 6, 2023Updated 2 years ago
- ☆21Feb 11, 2022Updated 4 years ago
- torch_quantizer is a out-of-box quantization tool for PyTorch models on CUDA backend, specially optimized for Diffusion Models.☆23Mar 29, 2024Updated last year
- The official implementation of PTQD: Accurate Post-Training Quantization for Diffusion Models☆103Mar 12, 2024Updated last year
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…☆31Mar 12, 2024Updated last year
- Repository for CPU Kernel Generation for LLM Inference☆28Jul 13, 2023Updated 2 years ago
- ☆32Nov 11, 2024Updated last year
- [ICML2025] LoRA fine-tune directly on the quantized models.☆39Nov 25, 2024Updated last year
- [NeurIPS 2024] The official implementation of ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification☆31Mar 30, 2025Updated 11 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)☆35Mar 7, 2025Updated 11 months ago
- [NeurIPS 2023] Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective☆40Oct 17, 2023Updated 2 years ago
- Refactoring official website backend of ISDC(Sichuan University Information Security and Network Attack and Defense Community)☆10Mar 12, 2018Updated 7 years ago
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving☆336Jul 2, 2024Updated last year
- [ICCV2025] The official code of "DreamRelation: Relation-Centric Video Customization"☆27Feb 4, 2026Updated last month
- rabitq rust implementation☆10Feb 4, 2026Updated last month
- TSDG: An efficient index graph for graph-based nearest neighbor search☆10Jul 14, 2022Updated 3 years ago
- Tiny optimized Stable-diffusion that can run on GPUs with just 1GB of VRAM. (Beta)☆182Jul 20, 2023Updated 2 years ago
- [ICML 2025] SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models☆53Aug 9, 2024Updated last year
- [CVPR 2024 Highlight & TPAMI 2025] This is the official PyTorch implementation of "TFMQ-DM: Temporal Feature Maintenance Quantization for…☆108Sep 29, 2025Updated 5 months ago
- mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models☆11Jan 19, 2024Updated 2 years ago
- The official code for "Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation" | [MM2…☆14Dec 7, 2024Updated last year
- Seminar: intro to deep learning with tensorflow☆13Jun 27, 2017Updated 8 years ago
- Conditional DDPM for characterizing radio sources from dirty images. (autumn 2023)☆11Nov 30, 2023Updated 2 years ago
- ☆12Jul 28, 2022Updated 3 years ago
- Simple and clean Python implementation of TextRank as per seminal paper by Rada Mihalcea and Paul Tarau. This implementation performs bot…☆11Jan 26, 2021Updated 5 years ago
- A PyTorch implementation of Proxy Anchor Loss based on CVPR 2020 paper "Proxy Anchor Loss for Deep Metric Learning"☆11Jan 16, 2021Updated 5 years ago
- simplify the prediction process for a finetuned bert model☆11Jun 19, 2019Updated 6 years ago
- Official implementation of ICML'24 paper "LQER: Low-Rank Quantization Error Reconstruction for LLMs"☆19Jul 11, 2024Updated last year
- ☆11Apr 3, 2023Updated 2 years ago
- ☆10Feb 12, 2024Updated 2 years ago
- Official code for "Algorithmic Capabilities of Random Transformers" (NeurIPS 2024)☆16Sep 28, 2024Updated last year
- ☆14Jan 5, 2022Updated 4 years ago
- Face detection using Multi-scale Block Local Binary Pattern algorithm - optimized with OpenCL/OpenMP - Depreciated - pls use convolutiona…☆11Jul 16, 2017Updated 8 years ago
- 工业级中文语音识别系统电子书☆13Oct 30, 2020Updated 5 years ago