Code repository for ICLR 2025 paper "LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid"
☆25 · Mar 2, 2025 · Updated last year
Alternatives and similar repositories for LeanQuant
Users that are interested in LeanQuant are comparing it to the libraries listed below
- ☆15 · Jan 12, 2026 · Updated 2 months ago
- [ACM MM2025]: MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization ☆38 · Aug 13, 2025 · Updated 7 months ago
- ☆15 · Dec 4, 2024 · Updated last year
- ☆79 · Mar 3, 2026 · Updated 2 weeks ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx… ☆29 · Feb 17, 2025 · Updated last year
- A tool which checks compatibility of CoreML model with Apple Neural Engine ☆13 · May 30, 2022 · Updated 3 years ago
- codes and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs" ☆10 · Dec 30, 2024 · Updated last year
- ☆14 · May 21, 2024 · Updated last year
- A Low-Overhead tool for Floating-Point Exception Detection in NVIDIA GPUs ☆12 · Dec 17, 2024 · Updated last year
- ☆11 · Feb 11, 2025 · Updated last year
- An Alfred workflow to toggle Yosemite's dark and light modes. ☆14 · Oct 6, 2018 · Updated 7 years ago
- ☆53 · Mar 3, 2026 · Updated 2 weeks ago
- [ACL2025 Oral🔥] Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling ☆24 · Nov 11, 2025 · Updated 4 months ago
- ☆27 · Nov 25, 2025 · Updated 3 months ago
- Code and dataset for the EMNLP 2024 paper: GoldCoin: Grounding Large Language Models in Privacy Laws via Contextual Integrity Theory ☆48 · Sep 26, 2024 · Updated last year
- Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention ☆51 · Oct 16, 2025 · Updated 5 months ago
- Code for paper "Conversational Product Search Based on Negative Feedback" ☆12 · Jun 26, 2020 · Updated 5 years ago
- [ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs. ☆134 · May 16, 2024 · Updated last year
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization" ☆211 · Nov 25, 2025 · Updated 3 months ago
- Created an inverted index in Python for document retrieval ☆13 · Oct 7, 2018 · Updated 7 years ago
- Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti… ☆51 · Oct 21, 2023 · Updated 2 years ago
- End2End Virtual Try-on with Visual Reference, CVPR2026 ☆58 · Nov 19, 2025 · Updated 4 months ago
- A dedicated effort to make an optimized, bleeding edge vLLM image using Docker to support DGX comprehensively ☆58 · Feb 22, 2026 · Updated 3 weeks ago
- ☆12 · Aug 31, 2023 · Updated 2 years ago
- Simple intermediate representation language for learning and research. ☆20 · Mar 27, 2020 · Updated 5 years ago
- Perceptron-based branch predictor written in C++ ☆13 · Dec 14, 2016 · Updated 9 years ago
- A compiler for Decaf (an object-oriented compiler) ☆12 · Sep 26, 2017 · Updated 8 years ago
- Benchmark datasets for sentiment analysis ☆12 · May 18, 2020 · Updated 5 years ago
- [EMNLP 2024] Quantize LLM to extremely low-bit, and finetune the quantized LLMs ☆15 · Jul 18, 2024 · Updated last year
- A collection of tricks and tools to speed up transformer models ☆196 · Feb 23, 2026 · Updated 3 weeks ago
- ☆24 · Mar 6, 2023 · Updated 3 years ago
- MLIR+EqSat ☆26 · Jan 10, 2026 · Updated 2 months ago
- SIGIR'20: An Analysis of BERT in Document Ranking ☆21 · Jul 27, 2020 · Updated 5 years ago
- The official implementation of "Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers" (arXiv … ☆51 · Jun 6, 2025 · Updated 9 months ago
- QRHead: Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking ☆38 · Jan 20, 2026 · Updated 2 months ago
- ☆23 · Oct 7, 2025 · Updated 5 months ago
- ICCV 2019 Tutorial: Global Optimization for Geometric Understanding with Provable Guarantees ☆15 · Oct 20, 2022 · Updated 3 years ago
- A web application with a backend in Flask and frontend in React, and React flow node base environment to stream both Gradio ( and later S… ☆16 · Oct 28, 2022 · Updated 3 years ago
- Open-source AI acceleration on FPGA: from ONNX to RTL ☆49 · Mar 11, 2026 · Updated last week