NolanoOrg / sparse_quant_llms
SparseGPT + GPTQ Compression of LLMs like LLaMa, OPT, Pythia
☆41 Updated last year
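For orientation, the sketch below illustrates the layer-wise objective that SparseGPT- and GPTQ-style methods optimize: choose compressed weights Ŵ that keep a layer's outputs on calibration data close to the original, i.e. minimize ||WX − ŴX||². This is not code from this repository; it only shows the objective together with a naive round-to-nearest baseline, whereas GPTQ/SparseGPT additionally compensate the quantization or pruning error column by column using second-order (Hessian) information. All names and shapes below are illustrative assumptions.

```python
# Illustrative sketch only (not this repository's code): the layer-wise
# objective behind GPTQ/SparseGPT-style compression is to pick compressed
# weights W_hat minimizing ||W X - W_hat X||_F^2 on calibration activations X.
import numpy as np

def quantize_rtn(w, n_bits=4):
    """Naive symmetric round-to-nearest quantization (the baseline GPTQ improves on)."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))    # one linear layer's weights (out_features x in_features)
X = rng.standard_normal((64, 256))   # calibration activations (in_features x n_samples)

W_hat = quantize_rtn(W, n_bits=4)
error = float(np.linalg.norm(W @ X - W_hat @ X) ** 2)
print(f"4-bit RTN layer-wise reconstruction error: {error:.2f}")
```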
Related projects
Alternatives and complementary repositories for sparse_quant_llms
- QuIP quantization ☆46 Updated 8 months ago
- Demonstration that finetuning a RoPE model on sequences longer than those seen during pre-training extends the model's context limit ☆63 Updated last year
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆97 Updated last year
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models" ☆42 Updated last week
- ☆122 Updated 9 months ago
- ☆49 Updated 8 months ago
- Official implementation for 'Extending LLMs' Context Window with 100 Samples' ☆74 Updated 10 months ago
- Repository for CPU Kernel Generation for LLM Inference ☆25 Updated last year
- Repository for Sparse Finetuning of LLMs via a modified version of the MosaicML llmfoundry ☆38 Updated 10 months ago
- Spherical merging of PyTorch/HF-format language models with minimal feature loss ☆112 Updated last year
- My implementation of "Q-Sparse: All Large Language Models can be Fully Sparsely-Activated" ☆30 Updated 3 months ago
- ☆96 Updated last month
- Exploring finetuning public checkpoints on filtered 8K-token sequences from the Pile ☆115 Updated last year
- Reorder-based post-training quantization for large language models ☆181 Updated last year
- Experiments on speculative sampling with Llama models ☆118 Updated last year
- Efficient 3-bit/4-bit quantization of LLaMA models ☆19 Updated last year
- Code repository for the c-BTM paper ☆105 Updated last year
- ☆35 Updated 3 weeks ago
- KV cache compression for high-throughput LLM inference ☆87 Updated this week
- Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs ☆111 Updated 10 months ago
- The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" ☆56 Updated last month
- Here we will test various linear attention designs ☆56 Updated 6 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆129 Updated 2 months ago
- Simple implementation of Speculative Sampling in NumPy for GPT-2 (a toy sketch of the sampling rule follows this list) ☆89 Updated last year
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models ☆36 Updated last year
- ☆44 Updated 11 months ago
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by DeepMind ☆82 Updated 8 months ago
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆92 Updated last month
- A toolkit for fine-tuning, running inference with, and evaluating GreenBitAI's LLMs ☆74 Updated last month
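Several of the projects above (the NumPy GPT-2 demo, the Llama experiments, and the implementation of the DeepMind paper) center on speculative sampling. As a reading aid, here is a minimal, self-contained sketch of the accept/reject rule from "Accelerating Large Language Model Decoding with Speculative Sampling": a cheap draft model proposes k tokens, the target model accepts each with probability min(1, p/q), and the first rejected position is resampled from the normalized residual max(p − q, 0). The toy distributions and names (`draft_probs`, `target_probs`, `speculative_step`) are hypothetical stand-ins, not code from any listed repository.

```python
# Toy sketch of speculative sampling's accept/reject rule; the "models" below
# are fixed distributions standing in for a real draft and target model.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8

def draft_probs(prefix):
    # hypothetical cheap draft model: a fixed, slightly peaked distribution
    p = np.ones(VOCAB)
    p[0] = 3.0
    return p / p.sum()

def target_probs(prefix):
    # hypothetical expensive target model: prefers a different token
    p = np.ones(VOCAB)
    p[1] = 4.0
    return p / p.sum()

def speculative_step(prefix, k=4):
    """Draft k tokens cheaply, then accept or resample them against the target."""
    drafted = []
    for _ in range(k):
        q = draft_probs(prefix + drafted)
        drafted.append(int(rng.choice(VOCAB, p=q)))
    accepted = []
    for i, tok in enumerate(drafted):
        q = draft_probs(prefix + drafted[:i])
        p = target_probs(prefix + drafted[:i])
        if rng.random() < min(1.0, p[tok] / q[tok]):
            accepted.append(tok)                # accept the drafted token
        else:
            residual = np.maximum(p - q, 0.0)   # reject: resample from the residual
            accepted.append(int(rng.choice(VOCAB, p=residual / residual.sum())))
            return prefix + accepted            # stop at the first rejection
    # every draft accepted: take one bonus token from the target model
    p = target_probs(prefix + accepted)
    accepted.append(int(rng.choice(VOCAB, p=p)))
    return prefix + accepted

print(speculative_step([], k=4))
```

In practice p and q come from a large target model and a small draft model evaluated on the same prefix; the toy version only illustrates the acceptance logic, which is what the listed implementations build on.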