☆129Jan 22, 2024Updated 2 years ago
Alternatives and similar repositories for lq-lora
Users that are interested in lq-lora are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Official PyTorch implementation of QA-LoRA☆147Mar 13, 2024Updated 2 years ago
- ☆235Jun 11, 2024Updated last year
- PB-LLM: Partially Binarized Large Language Models☆156Nov 20, 2023Updated 2 years ago
- Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…☆70Mar 7, 2024Updated 2 years ago
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"☆10Dec 13, 2024Updated last year
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- ☆26Nov 23, 2023Updated 2 years ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".☆280Nov 3, 2023Updated 2 years ago
- Activation-aware Singular Value Decomposition for Compressing Large Language Models☆92Oct 22, 2024Updated last year
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights☆19Oct 9, 2022Updated 3 years ago
- ☆206Dec 5, 2024Updated last year
- ☆13Aug 23, 2024Updated last year
- [ICLRW'26] EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation☆45Apr 21, 2026Updated last month
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving☆340Jul 2, 2024Updated last year
- ☆16Dec 9, 2023Updated 2 years ago
- Deploy open-source AI quickly and easily - Special Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- QAQ: Quality Adaptive Quantization for LLM KV Cache☆54Mar 27, 2024Updated 2 years ago
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models☆17Jul 17, 2024Updated last year
- Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Arxiv, 2024.☆16Oct 28, 2024Updated last year
- Butler 是一个用于自动化服务管理和任务调度的工具项目。☆16May 16, 2026Updated last week
- ☆591Oct 29, 2024Updated last year
- An innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification.☆28Apr 15, 2025Updated last year
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization☆720Aug 13, 2024Updated last year
- Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)☆90Jul 28, 2025Updated 9 months ago
- ☆25Oct 31, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- ☆15Nov 7, 2024Updated last year
- ☆19Jun 21, 2025Updated 11 months ago
- ☆30Jul 22, 2024Updated last year
- A simple and effective LLM pruning approach.☆864Aug 9, 2024Updated last year
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …☆12Mar 18, 2023Updated 3 years ago
- AFPQ code implementation☆23Nov 6, 2023Updated 2 years ago
- Triton Implementation of HyperAttention Algorithm☆48Dec 11, 2023Updated 2 years ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)☆146Sep 20, 2024Updated last year
- [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.☆898Nov 26, 2025Updated 6 months ago
- Deploy to Railway using AI coding agents - Free Credits Offer • AdUse Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
- Official repository of the "Transformer Fusion with Optimal Transport" paper, published as a conference paper at ICLR 2024.☆31Apr 19, 2024Updated 2 years ago
- Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024☆186Apr 16, 2024Updated 2 years ago
- Official repo for NeurIPS'24 paper "WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models"☆19Dec 16, 2024Updated last year
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…☆65Apr 15, 2024Updated 2 years ago
- Generated geosite.dat based on Antifilter Community List☆28Updated this week
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models☆23Mar 15, 2024Updated 2 years ago
- BESA is a differentiable weight pruning technique for large language models.☆17Mar 4, 2024Updated 2 years ago