☆129Jan 22, 2024Updated 2 years ago
Alternatives and similar repositories for lq-lora
Users that are interested in lq-lora are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Official PyTorch implementation of QA-LoRA☆146Mar 13, 2024Updated 2 years ago
- ☆235Jun 11, 2024Updated last year
- PB-LLM: Partially Binarized Large Language Models☆155Nov 20, 2023Updated 2 years ago
- Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…☆69Mar 7, 2024Updated 2 years ago
- Official Implementation of "Learning to Refuse: Towards Mitigating Privacy Risks in LLMs"☆10Dec 13, 2024Updated last year
- Deploy open-source AI quickly and easily - Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- ☆26Nov 23, 2023Updated 2 years ago
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".☆281Nov 3, 2023Updated 2 years ago
- Activation-aware Singular Value Decomposition for Compressing Large Language Models☆91Oct 22, 2024Updated last year
- Fine-Tuning Pre-trained Transformers into Decaying Fast Weights☆19Oct 9, 2022Updated 3 years ago
- ☆203Dec 5, 2024Updated last year
- ☆13Aug 23, 2024Updated last year
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving☆336Jul 2, 2024Updated last year
- ☆16Dec 9, 2023Updated 2 years ago
- QAQ: Quality Adaptive Quantization for LLM KV Cache☆53Mar 27, 2024Updated 2 years ago
- Bare Metal GPUs on DigitalOcean Gradient AI • AdPurpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
- [NeurIPS 2024 D&B] Evaluating Copyright Takedown Methods for Language Models☆17Jul 17, 2024Updated last year
- Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging. Arxiv, 2024.☆16Oct 28, 2024Updated last year
- Butler 是一个用于自动化服务管理和任务调度的工具项目。☆16Apr 6, 2026Updated last week
- ☆588Oct 29, 2024Updated last year
- An innovative method expediting LLMs via streamlined semi-autoregressive generation and draft verification.☆28Apr 15, 2025Updated last year
- [ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization☆717Aug 13, 2024Updated last year
- Code implementation of GPTAQ (https://arxiv.org/abs/2504.02692)☆88Jul 28, 2025Updated 8 months ago
- ☆25Oct 31, 2024Updated last year
- ☆15Nov 7, 2024Updated last year
- Serverless GPU API endpoints on Runpod - Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- ☆19Jun 21, 2025Updated 9 months ago
- ☆30Jul 22, 2024Updated last year
- A simple and effective LLM pruning approach.☆860Aug 9, 2024Updated last year
- A method for evaluating the high-level coherence of machine-generated texts. Identifies high-level coherence issues in transformer-based …☆11Mar 18, 2023Updated 3 years ago
- AFPQ code implementation☆23Nov 6, 2023Updated 2 years ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)☆145Sep 20, 2024Updated last year
- Triton Implementation of HyperAttention Algorithm☆48Dec 11, 2023Updated 2 years ago
- [ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.☆892Nov 26, 2025Updated 4 months ago
- Official repository of the "Transformer Fusion with Optimal Transport" paper, published as a conference paper at ICLR 2024.☆31Apr 19, 2024Updated last year
- 1-Click AI Models by DigitalOcean Gradient • AdDeploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
- [ICLRW'26] EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation☆33Mar 24, 2026Updated 3 weeks ago
- Repository for the QUIK project, enabling the use of 4bit kernels for generative inference - EMNLP 2024☆185Apr 16, 2024Updated 2 years ago
- Official repo for NeurIPS'24 paper "WAGLE: Strategic Weight Attribution for Effective and Modular Unlearning in Large Language Models"☆19Dec 16, 2024Updated last year
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…☆65Apr 15, 2024Updated 2 years ago
- [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models☆23Mar 15, 2024Updated 2 years ago
- Generated geosite.dat based on Antifilter Community List☆25Updated this week
- BESA is a differentiable weight pruning technique for large language models.☆17Mar 4, 2024Updated 2 years ago