decapoda-research / GPTQ-Tools
4-bit quantization of models using GPTQ
☆18 · Updated 2 years ago
Alternatives and similar repositories for GPTQ-Tools
Users interested in GPTQ-Tools are comparing it to the libraries listed below.
- Exploring finetuning public checkpoints on filtered 8K sequences on the Pile ☆114 · Updated 2 years ago
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆103 · Updated 2 years ago
- QLoRA: Efficient Finetuning of Quantized LLMs ☆78 · Updated last year
- Spherical merge of PyTorch/HF-format language models with minimal feature loss ☆124 · Updated last year
- Multipack distributed sampler for fast padding-free training of LLMs ☆190 · Updated 9 months ago
- Experiments on speculative sampling with Llama models ☆126 · Updated last year
- Demonstration that finetuning a RoPE model on sequences longer than its pre-training length extends the model's context limit ☆63 · Updated last year
- PB-LLM: Partially Binarized Large Language Models ☆152 · Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" ☆275 · Updated last year
- Simple implementation of Speculative Sampling in NumPy for GPT-2 ☆95 · Updated last year
- ☆125 · Updated last year
- Code and models for BERT on STILTs ☆53 · Updated 2 years ago
- Inference script for Meta's LLaMA models using a Hugging Face wrapper ☆110 · Updated 2 years ago
- Tune MPTs ☆84 · Updated last year
- Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs ☆110 · Updated last year
- ☆46 · Updated last week
- SparseGPT + GPTQ compression of LLMs like LLaMA, OPT, Pythia ☆40 · Updated 2 years ago
- QuIP quantization ☆52 · Updated last year
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Long Lengths (ICLR 2024) ☆203 · Updated last year
- Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first app… ☆167 · Updated last year
- ☆95 · Updated last year
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes… ☆147 · Updated last year
- Evaluating LLMs with CommonGen-Lite ☆90 · Updated last year
- ☆197 · Updated 6 months ago
- Experiments with inference on LLaMA ☆104 · Updated last year
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 · Updated 8 months ago
- Official code for ReLoRA from the paper "Stack More Layers Differently: High-Rank Training Through Low-Rank Updates" ☆456 · Updated last year
- Data preparation code for Amber 7B LLM ☆91 · Updated last year
- ☆92 · Updated last year
- ☆53 · Updated last year