Code repository for ICLR 2025 paper "LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid"
☆28Mar 2, 2025Updated last year
Alternatives and similar repositories for LeanQuant
Users who are interested in LeanQuant are comparing it to the libraries listed below.
- ☆15Apr 6, 2026Updated 2 months ago
- [IJCAI 2023] CLE-ViT: Contrastive Learning Encoded Transformer for Ultra-Fine-Grained Visual Categorization.☆10Nov 3, 2023Updated 2 years ago
- [ACM MM2025]: MQuant: Unleashing the Inference Potential of Multimodal Large Language Models via Full Static Quantization☆44Aug 13, 2025Updated 9 months ago
- A tool which checks compatibility of CoreML model with Apple Neural Engine☆14May 30, 2022Updated 4 years ago
- HALO: Hadamard-Assisted Low-Precision Optimization and Training method for finetuning LLMs. 🚀 The official implementation of https://arx…☆29Feb 17, 2025Updated last year
- codes and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"☆11Dec 30, 2024Updated last year
- ☆14May 21, 2024Updated 2 years ago
- A Low-Overhead tool for Floating-Point Exception Detection in NVIDIA GPUs☆15Dec 17, 2024Updated last year
- ☆21Feb 5, 2024Updated 2 years ago
- flex-block-attn: an efficient block sparse attention computation library☆131Dec 26, 2025Updated 5 months ago
- Training project about Deep Learning☆12Jun 22, 2017Updated 8 years ago
- An Alfred workflow to toggle Yosemite's dark and light modes.☆14Oct 6, 2018Updated 7 years ago
- [ICLR2025]: OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitt…☆93Apr 8, 2025Updated last year
- ☆27May 12, 2026Updated 3 weeks ago
- Code for paper "Conversational Product Search Based on Negative Feedback"☆12Jun 26, 2020Updated 5 years ago
- [ACL2025 Oral🔥]Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling☆29Nov 11, 2025Updated 6 months ago
- egraph <-> json☆17Dec 29, 2025Updated 5 months ago
- Code and dataset for the EMNLP 2024 paper: GoldCoin: Grounding Large Language Models in Privacy Laws via Contextual Integrity Theory☆51Sep 26, 2024Updated last year
- [ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.☆138May 16, 2024Updated 2 years ago
- ☆14Jan 10, 2024Updated 2 years ago
- [ICML 2025] Official PyTorch implementation of "FlatQuant: Flatness Matters for LLM Quantization"☆217Nov 25, 2025Updated 6 months ago
- Official implementation of the EMNLP23 paper: Outlier Suppression+: Accurate quantization of large language models by equivalent and opti…☆51Oct 21, 2023Updated 2 years ago
- ☆13Aug 31, 2023Updated 2 years ago
- This repository contains low-bit quantization papers from 2020 to 2025 on top conference.☆169Apr 29, 2026Updated last month
- A compiler of Decaf (an object-oriented compiler)☆12Sep 26, 2017Updated 8 years ago
- MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models☆28Apr 2, 2026Updated 2 months ago
- [EMNLP 2024] Quantize LLM to extremely low-bit, and finetune the quantized LLMs☆15Jul 18, 2024Updated last year
- MLIR+EqSat☆27Jan 10, 2026Updated 4 months ago
- ☆24Mar 6, 2023Updated 3 years ago
- SIGIR'20: An Analysis of BERT in Document Ranking☆21Jul 27, 2020Updated 5 years ago
- QRHead: Query-Focused Retrieval Heads Improve Long-Context Reasoning and Re-ranking☆39Jan 20, 2026Updated 4 months ago
- The official implementation of "Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers" (arXiv …☆51Jun 6, 2025Updated last year
- A collection of tricks and tools to speed up transformer models☆205May 6, 2026Updated last month
- Robust Speech Recognition via Large-Scale Weak Supervision☆28Feb 3, 2025Updated last year
- ICCV 2019 Tutorial: Global Optimization for Geometric Understanding with Provable Guarantees☆15Oct 20, 2022Updated 3 years ago
- Open-source AI acceleration on FPGA: from ONNX to RTL☆54Updated this week
- ☆29Feb 2, 2023Updated 3 years ago
- Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)☆92Jul 28, 2025Updated 10 months ago
- ☆18Jul 11, 2021Updated 4 years ago