qwopqwop200 / gptqloraLinks

GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ

☆102

Alternatives and similar repositories for gptqlora

Users that are interested in gptqlora are comparing it to the libraries listed below

Sorting:

Digitous / LLM-SLERP-Merge
Spherical Merge Pytorch/HF format Language Models with minimal feature loss.
☆141Updated 2 years ago
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆202Updated last year
uukuguy / multi_loras
Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe…
☆157Updated last year
hahnyuan / PB-LLM
PB-LLM: Partially Binarized Large Language Models
☆157Updated 2 years ago
HanGuo97 / lq-lora
☆128Updated last year
FasterDecoding / BitDelta
☆204Updated last year
chu-tianxiang / QuIP-for-all
QuIP quantization
☆61Updated last year
GreenBitAI / low_bit_llama
Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs
☆110Updated last year
jondurbin / qlora
QLoRA: Efficient Finetuning of Quantized LLMs
☆79Updated last year
yuhuixu1993 / qa-lora
Official PyTorch implementation of QA-LoRA
☆145Updated last year
arcee-ai / PruneMe
Automated Identification of Redundant Layer Blocks for Pruning in Large Language Models
☆256Updated last year
NolanoOrg / sparse_quant_llms
SparseGPT + GPTQ Compression of LLMs like LLaMa, OPT, Pythia
☆41Updated 2 years ago
imoneoi / multipack
Multipack distributed sampler for fast padding-free training of LLMs
☆202Updated last year
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆278Updated 2 years ago
wuhy68 / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)
☆147Updated last year
astramind-ai / BitMat
An efficent implementation of the method proposed in "The Era of 1-bit LLMs"
☆155Updated last year
catid / dora
Implementation of DoRA
☆307Updated last year
Gryphe / BlockMerge_Gradient
Merge Transformers language models by use of gradient parameters.
☆209Updated last year
Cornell-RelaxML / QuIP
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
☆390Updated last year
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆127Updated 2 years ago
Guitaricet / relora
Official code for ReLoRA from the paper Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
☆469Updated last year
nikhilgsh / loraplus
☆229Updated last year
yxli2123 / LoftQ
☆235Updated last year
ChrisHayduk / qlora-multi-gpu
QLoRA with Enhanced Multi GPU Support
☆37Updated 2 years ago
dwzhu-pku / PoSE
Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Length (ICLR 2024)
☆205Updated last year
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆102Updated last year
kyleliang919 / Long-context-transformers
Exploring finetuning public checkpoints on filter 8K sequences on Pile
☆116Updated 2 years ago
kernelmachine / cbtm
Code repository for the c-BTM paper
☆108Updated 2 years ago
jaymody / speculative-sampling
Simple implementation of Speculative Sampling in NumPy for GPT-2.
☆98Updated 2 years ago
thomasgauthier / LoRD
Low-Rank adapter extraction for fine-tuned transformers models
☆179Updated last year