erika-n / GPTzip
An implementation of LLMzip using GPT-2.
☆12 · Updated last year
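For context, LLMzip-style compression works by letting a language model predict the next token and storing only each token's rank under that prediction: well-predicted text yields a stream of mostly-zero ranks that entropy-codes to very few bits. Below is a toy sketch of that rank-coding idea using a tiny bigram character model in place of GPT-2; all names here are illustrative and not taken from the GPTzip code.

```python
from collections import Counter, defaultdict

VOCAB = list("abc_")

def make_model(corpus):
    """Count character bigrams so the toy 'model' can rank likely next chars."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    def rank_next(prev):
        # Most-likely-first ordering; unseen chars follow in fixed vocab order.
        seen = [c for c, _ in counts[prev].most_common() if c in VOCAB]
        rest = [c for c in VOCAB if c not in seen]
        return seen + rest

    return rank_next

def encode(text, rank_next):
    """Replace each char (after the first) with its rank under the model."""
    ranks = []
    for prev, nxt in zip(text, text[1:]):
        ranks.append(rank_next(prev).index(nxt))
    return text[0], ranks

def decode(first, ranks, rank_next):
    """Invert encode(): replay the model and pick the char at each stored rank."""
    out = [first]
    for r in ranks:
        out.append(rank_next(out[-1])[r])
    return "".join(out)

corpus = "abca_abcb_abcc_abca_abcb"
model = make_model(corpus)
first, ranks = encode("abca_abc", model)
print(ranks)  # mostly zeros: well-predicted text maps to low ranks
assert decode(first, ranks, model) == "abca_abc"
```

In the real scheme the rank stream (dominated by small values) is then entropy-coded, e.g. with arithmetic coding, and a far stronger predictor such as GPT-2 drives the ranking; that combination is where the compression comes from.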
Related projects
Alternatives and complementary repositories for GPTzip
- QuIP quantization (☆46, updated 8 months ago)
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models (☆226, updated last month)
- Training notebooks similar to the original script used to train TinyMistral (☆19, updated 11 months ago)
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models (☆36, updated last year)
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (☆129, updated 2 months ago)
- Model REVOLVER, a human-in-the-loop model mixing system (☆33, updated last year)
- Official repository for the ICML 2024 paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" (☆80, updated 4 months ago)
- An algorithm for static activation quantization of LLMs (☆77, updated 2 weeks ago)
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" (☆92, updated last month)
- PB-LLM: Partially Binarized Large Language Models (☆148, updated last year)
- The homepage of the OneBit model quantization framework (☆157, updated 4 months ago)
- KV cache compression for high-throughput LLM inference (☆87, updated this week)
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" (☆154, updated last month)
- SparseGPT + GPTQ compression of LLMs such as LLaMA, OPT, and Pythia (☆41, updated last year)
- Yet another frontend for LLMs, written in .NET and WinUI 3 (☆11, updated this week)
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients (☆173, updated 4 months ago)
- A 1.58-bit LLaMA model (☆79, updated 7 months ago)
- Unofficial implementation of evolutionary model merging (☆33, updated 7 months ago)
- Modeling code for a BitNet b1.58 Llama-style model (☆23, updated 6 months ago)
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ (☆97, updated last year)
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks" (☆32, updated 3 weeks ago)
- [NeurIPS 24 Spotlight] MaskLLM: Learnable Semi-structured Sparsity for Large Language Models (☆118, updated 3 weeks ago)