decapoda-research / GPTQ-Tools
4-bit quantization of models using GPTQ
☆18 · Updated 2 years ago
Alternatives and similar repositories for GPTQ-Tools
Users interested in GPTQ-Tools are comparing it to the libraries listed below.
- Exploring finetuning public checkpoints on filtered 8K sequences on the Pile ☆114 · Updated 2 years ago
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆103 · Updated 2 years ago
- QLoRA: Efficient Finetuning of Quantized LLMs ☆78 · Updated last year
- Spherical merge of PyTorch/HF-format language models with minimal feature loss ☆124 · Updated last year
- Multipack distributed sampler for fast padding-free training of LLMs ☆190 · Updated 9 months ago
- Experiments on speculative sampling with Llama models ☆126 · Updated last year
- Demonstration that finetuning a RoPE model on sequences longer than its pre-training length extends the model's context limit ☆63 · Updated last year
- PB-LLM: Partially Binarized Large Language Models ☆152 · Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models" ☆275 · Updated last year
- Simple implementation of Speculative Sampling in NumPy for GPT-2 ☆95 · Updated last year
- ☆125 · Updated last year
- Code and models for BERT on STILTs ☆53 · Updated 2 years ago
- Inference script for Meta's LLaMA models using a Hugging Face wrapper ☆110 · Updated 2 years ago
- Tune MPTs ☆84 · Updated last year
- Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs ☆110 · Updated last year
- ☆46 · Updated last week
- SparseGPT + GPTQ compression of LLMs like LLaMA, OPT, Pythia ☆40 · Updated 2 years ago
- QuIP quantization ☆52 · Updated last year
- Positional Skip-wise Training for Efficient Context Window Extension of LLMs to Extremely Long Lengths (ICLR 2024) ☆203 · Updated last year
- Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first app… ☆167 · Updated last year
- ☆95 · Updated last year
- Convenient wrapper for fine-tuning and inference of Large Language Models (LLMs) with several quantization techniques (GPTQ, bitsandbytes… ☆147 · Updated last year
- Evaluating LLMs with CommonGen-Lite ☆90 · Updated last year
- ☆197 · Updated 6 months ago
- Experiments with inference on LLaMA ☆104 · Updated last year
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆97 · Updated 8 months ago
- Official code for ReLoRA from the paper "Stack More Layers Differently: High-Rank Training Through Low-Rank Updates" ☆456 · Updated last year
- Data preparation code for Amber 7B LLM ☆91 · Updated last year
- ☆92 · Updated last year
- ☆53 · Updated last year