erika-n / GPTzip
An implementation of LLMzip using GPT-2
☆12 · Updated last year
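LLMzip-style compression couples a language model's next-token predictions with an entropy coder: tokens the model predicts well become small symbols, so natural text compresses well below 8 bits per character. The sketch below shows the rank-coding idea using GPT-2 through Hugging Face `transformers` and PyTorch; it is an illustration under those assumptions, not GPTzip's actual code, and a real pipeline would additionally feed the rank stream through an arithmetic coder or zlib.

```python
# Minimal sketch of LLM-based rank coding (the idea behind LLMzip),
# assuming the `torch` and `transformers` packages; not GPTzip's actual code.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def text_to_ranks(text: str) -> list[int]:
    """Replace each token id with its rank in GPT-2's predicted distribution."""
    ids = tok(text).input_ids
    ranks = [ids[0]]                                    # first token stored verbatim
    for pos in range(1, len(ids)):
        with torch.no_grad():
            logits = model(torch.tensor([ids[:pos]])).logits[0, -1]
        order = torch.argsort(logits, descending=True)  # most likely token -> rank 0
        ranks.append(int((order == ids[pos]).nonzero().item()))
    return ranks

def ranks_to_text(ranks: list[int]) -> str:
    """Invert the transform: re-run the model and pick the token at each stored rank."""
    ids = [ranks[0]]
    for r in ranks[1:]:
        with torch.no_grad():
            logits = model(torch.tensor([ids])).logits[0, -1]
        ids.append(int(torch.argsort(logits, descending=True)[r]))
    return tok.decode(ids)

text = "Compression is prediction."
ranks = text_to_ranks(text)
print(ranks)                        # mostly small integers for predictable text
assert ranks_to_text(ranks) == text # lossless round trip
```

For ordinary English text the ranks are dominated by small values, which is what the final entropy-coding stage exploits.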
Alternatives and similar repositories for GPTzip:
Users interested in GPTzip are comparing it to the repositories listed below
- ☆47 · Updated 2 months ago
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… ☆56 · Updated 5 months ago
- QuIP quantization ☆52 · Updated last year
- ☆113 · Updated last week
- PB-LLM: Partially Binarized Large Language Models ☆152 · Updated last year
- Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton. ☆127 · Updated this week
- ☆79 · Updated 4 months ago
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆99 · Updated last year
- SparseGPT + GPTQ Compression of LLMs like LLaMa, OPT, Pythia ☆41 · Updated 2 years ago
- ☆46 · Updated 8 months ago
- Modeling code for a BitNet b1.58 Llama-style model. ☆23 · Updated 11 months ago
- Model REVOLVER, a human-in-the-loop model mixing system. ☆33 · Updated last year
- ☆195 · Updated 3 months ago
- Work in progress. ☆50 · Updated 2 weeks ago
- ☆220 · Updated 9 months ago
- RWKV, in easy-to-read code ☆71 · Updated last week
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" (the absmean ternary quantization it relies on is sketched after this list) ☆155 · Updated 5 months ago
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation ☆47 · Updated 8 months ago
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆261 · Updated 5 months ago
- ☆40 · Updated 2 years ago
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" adapted for Llama models ☆36 · Updated last year
- Token Omission Via Attention ☆124 · Updated 5 months ago
- Plug-and-play PyTorch implementation of the paper "Evolutionary Optimization of Model Merging Recipes" by Sakana AI ☆30 · Updated 4 months ago
- Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark" ☆13 · Updated 9 months ago
- RWKV-7: Surpassing GPT ☆82 · Updated 4 months ago
- The training notebooks that were similar to the original script used to train TinyMistral. ☆21 · Updated last year
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization ☆124 · Updated last month
- 1.58-bit LLaMa model ☆82 · Updated 11 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆195 · Updated 8 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ☆148 · Updated 2 months ago
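Several entries above (the BitNet b1.58 modeling code, the 1.58-bit LLaMa model, and the implementation of "The Era of 1-bit LLMs") center on ternary weight quantization. As a reference point, here is a minimal sketch of the absmean quantization function described in the BitNet b1.58 paper, mapping a weight tensor to {-1, 0, +1} with one per-tensor scale; it is illustrative only and not taken from any repository listed here.

```python
# Hedged sketch of BitNet b1.58-style absmean ternary quantization;
# illustrative only, not code from any of the repositories above.
import torch

def absmean_ternary(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight tensor to {-1, 0, +1} scaled by its mean absolute value."""
    scale = w.abs().mean().clamp(min=eps)    # per-tensor scale
    w_q = (w / scale).round().clamp(-1, 1)   # RoundClip to the ternary set
    return w_q, scale                        # dequantize as w_q * scale

w = torch.randn(4, 4)
w_q, scale = absmean_ternary(w)
print(w_q)                                   # every entry is -1.0, 0.0, or 1.0
print((w - w_q * scale).abs().mean())        # average quantization error
```

In BitNet-style training the full-precision latent weights are kept and this quantizer is applied in the forward pass with a straight-through estimator; the sketch covers only the quantization step.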