Official PyTorch implementation of "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance" (ICML 2025)
☆51 · Jul 6, 2025 · Updated 9 months ago
Alternatives and similar repositories for GuidedQuant
Users that are interested in GuidedQuant are comparing it to the libraries listed below.
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization ☆39 · Sep 24, 2024 · Updated last year
- ☆21 · Feb 5, 2024 · Updated 2 years ago
- ☆42 · Mar 28, 2024 · Updated 2 years ago
- Demo for Qwen2.5-VL-3B-Instruct on Axera device. ☆15 · Sep 3, 2025 · Updated 7 months ago
- ☆25 · Oct 31, 2024 · Updated last year
- ☆87 · Jan 23, 2025 · Updated last year
- Triton Implementation of Flash Attention with Bias. ☆22 · Apr 16, 2025 · Updated 11 months ago
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization ☆170 · Nov 26, 2025 · Updated 4 months ago
- PyTorch implements `Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning` paper. ☆14 · Aug 19, 2022 · Updated 3 years ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆34 · Mar 7, 2025 · Updated last year
- ☆74 · Sep 19, 2025 · Updated 6 months ago
- ☆22 · May 5, 2025 · Updated 11 months ago
- ☆15 · Dec 4, 2024 · Updated last year
- [SIGGRAPH Asia 2025] CHARM: Control-point-based 3D Anime Hairstyle Auto-Regressive Modeling ☆47 · Sep 26, 2025 · Updated 6 months ago
- ☆82 · Mar 3, 2026 · Updated last month
- Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model… ☆69 · Mar 7, 2024 · Updated 2 years ago
- Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark" ☆30 · Jun 30, 2025 · Updated 9 months ago
- A forward proxy to turn network traffic into personal memory for AI agents ☆37 · Mar 30, 2026 · Updated 2 weeks ago
- A tool which checks compatibility of CoreML model with Apple Neural Engine ☆14 · May 30, 2022 · Updated 3 years ago
- Artifact evaluation for HPCA'24 paper Lightening-Transformer: A Dynamically-operated Optically-interconnected Photonic Transformer Accele… ☆11 · Mar 3, 2024 · Updated 2 years ago
- Cleanai (https://github.com/willmil11/cleanai) except I'm making it in c now. Fast and clean from the start this time :) ☆17 · Mar 6, 2026 · Updated last month
- Tools for formatting large language model prompts. ☆13 · Dec 19, 2023 · Updated 2 years ago
- The official implementation of the ICML 2023 paper OFQ-ViT ☆39 · Oct 3, 2023 · Updated 2 years ago
- Implementation of Gradient Agreement Filtering, from Chaubard et al. of Stanford, but for single machine microbatches, in Pytorch ☆25 · Jan 21, 2025 · Updated last year
- ☆14 · May 21, 2024 · Updated last year
- A quantization algorithm for LLM ☆149 · Jun 21, 2024 · Updated last year
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs ☆123 · Jul 4, 2025 · Updated 9 months ago
- A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others. ☆34 · Updated this week
- The High Performance LLM Native Mock Server ☆25 · Updated this week
- ☆10 · Jan 23, 2025 · Updated last year
- Your Interface to Intelligence ☆47 · Mar 26, 2026 · Updated 2 weeks ago
- [ICLR25] STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs ☆20 · Jun 3, 2025 · Updated 10 months ago
- An Alfred workflow to toggle Yosemite's dark and light modes. ☆14 · Oct 6, 2018 · Updated 7 years ago
- [ACL 2025] RealHiTBench: A Comprehensive Realistic Hierarchical Table Benchmark for Evaluating LLM-Based Table Analysis ☆25 · Aug 8, 2025 · Updated 8 months ago
- Official implementation of Adaptive Feature Transfer (AFT) ☆24 · Jun 12, 2024 · Updated last year
- LLM model quantization (compression) toolkit with HW acceleration support for Nvidia, AMD, Intel GPU and Intel/AMD/Apple CPU via HF, vLLM… ☆1,101 · Updated this week
- [ICLR2025]: OSTQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitt… ☆90 · Apr 8, 2025 · Updated last year
- Recurrence of some small algorithms ☆11 · Feb 23, 2021 · Updated 5 years ago
- This repository contains the training code of ParetoQ introduced in our work "ParetoQ Scaling Laws in Extremely Low-bit LLM Quantization" ☆123 · Oct 15, 2025 · Updated 6 months ago