Vahe1994 / AQLM

Official Pytorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression https://arxiv.org/abs/2405.14852

☆1,301

Alternatives and similar repositories for AQLM

Users who are interested in AQLM are comparing it to the libraries listed below.

huggingface / optimum-nvidia
☆1,007 Updated 8 months ago
AnswerDotAI / fsdp_qlora
Training LLMs with QLoRA + FSDP
☆1,527 Updated 11 months ago
MDK8888 / GPTFast
Accelerate your Hugging Face Transformers 7.6-9x. Native to Hugging Face and PyTorch.
☆685 Updated last year
myshell-ai / JetMoE
Reaching LLaMA2 Performance with 0.1M Dollars
☆986 Updated last year
dropbox / hqq
Official implementation of Half-Quadratic Quantization (HQQ)
☆886 Updated last week
jiaweizzhao / GaLore
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
☆1,616 Updated last year
Cornell-RelaxML / quip-sharp
☆561 Updated last year
mustafaaljadery / gemma-2B-10M
Gemma 2B with 10M context length using Infini-attention.
☆940 Updated last year
AI-Hypercomputer / maxtext
A simple, performant and scalable Jax LLM!
☆1,942 Updated this week
kyegomez / BitNet
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in pytorch
☆1,890 Updated last week
dvmazur / mixtral-offloading
Run Mixtral-8x7B models in Colab or consumer desktops
☆2,327 Updated last year
hao-ai-lab / LookaheadDecoding
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
☆1,291 Updated 7 months ago
uclaml / SPIN
The official implementation of Self-Play Fine-Tuning (SPIN)
☆1,209 Updated last year
Vahe1994 / SpQR
☆547 Updated 10 months ago
mistralai-sf24 / hackathon
☆446 Updated last year
OpenGVLab / OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
☆865 Updated 5 months ago
aphrodite-engine / aphrodite-engine
Large-scale LLM inference engine
☆1,577 Updated 3 weeks ago
tomaarsen / attention_sinks
Extend existing LLMs way beyond the original training length with constant memory usage, without retraining
☆724 Updated last year
SqueezeAILab / SqueezeLLM
[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
☆704 Updated last year
microsoft / VPTQ
VPTQ, A Flexible and Extreme low-bit quantization algorithm
☆661 Updated 6 months ago
casper-hansen / AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
☆2,261 Updated 5 months ago
huggingface / optimum-quanto
A pytorch quantization backend for optimum
☆1,004 Updated last week
microsoft / TransformerCompression
For releasing code related to compression methods for transformers, accompanying our publications
☆447 Updated 9 months ago
intel / intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl…
☆2,165 Updated last year
lucidrains / self-rewarding-lm-pytorch
Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI
☆1,398 Updated last year
Cornell-RelaxML / QuIP
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
☆385 Updated last year
nomic-ai / contrastors
Train Models Contrastively in Pytorch
☆754 Updated 7 months ago
punica-ai / punica
Serving multiple LoRA finetuned LLM as one
☆1,110 Updated last year
apoorvumang / prompt-lookup-decoding
☆573 Updated last year
S-LoRA / S-LoRA
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
☆1,864 Updated last year