Official PyTorch implementation of "GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance" (ICML 2025)
☆50 · Jul 6, 2025 · Updated 8 months ago
Alternatives and similar repositories for GuidedQuant
Users who are interested in GuidedQuant are comparing it to the libraries listed below.
- Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692) · ☆81 · Jul 28, 2025 · Updated 7 months ago
- Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark" · ☆29 · Jun 30, 2025 · Updated 8 months ago
- Tools for formatting large language model prompts. · ☆13 · Dec 19, 2023 · Updated 2 years ago
- Demo for Qwen2.5-VL-3B-Instruct on Axera device. · ☆17 · Sep 3, 2025 · Updated 6 months ago
- Minimal PyTorch implementation of TP, SP, FSDP and sharded-EMA · ☆31 · Nov 27, 2025 · Updated 3 months ago
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization · ☆38 · Sep 24, 2024 · Updated last year
- ☆22 · May 5, 2025 · Updated 10 months ago
- A forward proxy to turn network traffic into personal memory for AI agents · ☆36 · Feb 23, 2026 · Updated last week
- ☆85 · Jan 23, 2025 · Updated last year
- ☆21 · Feb 5, 2024 · Updated 2 years ago
- Official implementation of Adaptive Feature Transfer (AFT) · ☆23 · Jun 12, 2024 · Updated last year
- FireQ: Fast INT4-FP8 Kernel and RoPE-aware Quantization for LLM Inference Acceleration · ☆20 · Jun 27, 2025 · Updated 8 months ago
- ☆25 · Oct 31, 2024 · Updated last year
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization · ☆172 · Nov 26, 2025 · Updated 3 months ago
- An AI Vision Language Model System for extracting structured knowledge graph information (JSON) from images of process diagrams · ☆40 · Apr 5, 2025 · Updated 11 months ago
- Code repo for the paper "SpinQuant: LLM quantization with learned rotations" · ☆373 · Feb 14, 2025 · Updated last year
- Fast and memory efficient PyTorch implementation of the Perceiver with FlashAttention. · ☆31 · Nov 4, 2024 · Updated last year
- ☆53 · Oct 10, 2025 · Updated 4 months ago
- A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others. · ☆35 · Feb 11, 2026 · Updated 3 weeks ago
- (ICCV 2023) Official implementation of Rectified Straight Through Estimator (ReSTE). · ☆31 · Sep 20, 2024 · Updated last year
- [ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs · ☆122 · Jul 4, 2025 · Updated 8 months ago
- The High Performance LLM Native Mock Server · ☆19 · Jan 8, 2026 · Updated last month
- 🚀 FlexLLama - Lightweight self-hosted tool for running multiple llama.cpp server instances with OpenAI v1 API compatibility and multi-GP… · ☆50 · Feb 17, 2026 · Updated 2 weeks ago
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" · ☆397 · Feb 24, 2024 · Updated 2 years ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) · ☆35 · Mar 7, 2025 · Updated 11 months ago
- Llama.cpp runner/swapper and proxy that emulates LMStudio / Ollama backends · ☆52 · Aug 21, 2025 · Updated 6 months ago
- Text to audio with Tik-Tok Voices · ☆13 · Apr 6, 2023 · Updated 2 years ago
- Bilinear Pairings Components Library for Delphi · ☆12 · Dec 19, 2018 · Updated 7 years ago
- A quantization algorithm for LLM · ☆148 · Jun 21, 2024 · Updated last year
- QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs. · ☆154 · Aug 21, 2025 · Updated 6 months ago
- GGUF Quantization of any LLM. · ☆41 · Mar 4, 2024 · Updated 2 years ago
- ☆10 · Sep 29, 2024 · Updated last year
- A Kong plugin that allows access to an upstream url through a forward proxy (eg. squid). · ☆11 · Apr 30, 2018 · Updated 7 years ago
- Prototype of fraud proofs. · ☆12 · Feb 13, 2022 · Updated 4 years ago
- Kafka Manager Dockerfile · ☆11 · Nov 22, 2017 · Updated 8 years ago
- ☆16 · Updated this week
- Copilot with deepseek and more... · ☆13 · Mar 7, 2025 · Updated 11 months ago
- Chat WebUI is an easy-to-use user interface for interacting with AI, and it comes with multiple useful built-in tools such as web search … · ☆51 · Feb 10, 2026 · Updated 3 weeks ago
- An open-source command line interface for linting your Ethereum 2.0 validator set up · ☆14 · May 17, 2021 · Updated 4 years ago