erika-n / GPTzip
An implementation of LLMzip using GPT-2
☆12 · Updated last year
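LLMzip-style compression couples a language model's next-token predictions with an entropy coder: tokens the model predicts well become small symbols, so natural text compresses well below 8 bits per character. The sketch below shows the rank-coding idea using GPT-2 through Hugging Face `transformers` and PyTorch; it is an illustration under those assumptions, not GPTzip's actual code, and a real pipeline would additionally feed the rank stream through an arithmetic coder or zlib.

```python
# Minimal sketch of LLM-based rank coding (the idea behind LLMzip),
# assuming the `torch` and `transformers` packages; not GPTzip's actual code.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def text_to_ranks(text: str) -> list[int]:
    """Replace each token id with its rank in GPT-2's predicted distribution."""
    ids = tok(text).input_ids
    ranks = [ids[0]]                                    # first token stored verbatim
    for pos in range(1, len(ids)):
        with torch.no_grad():
            logits = model(torch.tensor([ids[:pos]])).logits[0, -1]
        order = torch.argsort(logits, descending=True)  # most likely token -> rank 0
        ranks.append(int((order == ids[pos]).nonzero().item()))
    return ranks

def ranks_to_text(ranks: list[int]) -> str:
    """Invert the transform: re-run the model and pick the token at each stored rank."""
    ids = [ranks[0]]
    for r in ranks[1:]:
        with torch.no_grad():
            logits = model(torch.tensor([ids])).logits[0, -1]
        ids.append(int(torch.argsort(logits, descending=True)[r]))
    return tok.decode(ids)

text = "Compression is prediction."
ranks = text_to_ranks(text)
print(ranks)                        # mostly small integers for predictable text
assert ranks_to_text(ranks) == text # lossless round trip
```

For ordinary English text the ranks are dominated by small values, which is what the final entropy-coding stage exploits.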
Alternatives and similar repositories for GPTzip:
Users interested in GPTzip are comparing it to the repositories listed below
- ☆47 · Updated 2 months ago
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… ☆56 · Updated 5 months ago
- QuIP quantization ☆52 · Updated last year
- ☆113 · Updated last week
- PB-LLM: Partially Binarized Large Language Models ☆152 · Updated last year
- Training-free Post-training Efficient Sub-quadratic Complexity Attention. Implemented with OpenAI Triton. ☆127 · Updated this week
- ☆79 · Updated 4 months ago
- GPTQLoRA: Efficient Finetuning of Quantized LLMs with GPTQ ☆99 · Updated last year
- SparseGPT + GPTQ Compression of LLMs like LLaMa, OPT, Pythia ☆41 · Updated 2 years ago
- ☆46 · Updated 8 months ago
- Modeling code for a BitNet b1.58 Llama-style model. ☆23 · Updated 11 months ago
- Model REVOLVER, a human-in-the-loop model mixing system. ☆33 · Updated last year
- ☆195 · Updated 3 months ago
- Work in progress. ☆50 · Updated 2 weeks ago
- ☆220 · Updated 9 months ago
- RWKV, in easy-to-read code ☆71 · Updated last week
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" (the absmean ternary quantization it relies on is sketched after this list) ☆155 · Updated 5 months ago
- FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation ☆47 · Updated 8 months ago
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆261 · Updated 5 months ago
- ☆40 · Updated 2 years ago
- Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees" adapted for Llama models ☆36 · Updated last year
- Token Omission Via Attention ☆124 · Updated 5 months ago
- Plug-and-play PyTorch implementation of the paper "Evolutionary Optimization of Model Merging Recipes" by Sakana AI ☆30 · Updated 4 months ago
- Official code for the paper "Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark" ☆13 · Updated 9 months ago
- RWKV-7: Surpassing GPT ☆82 · Updated 4 months ago
- The training notebooks that were similar to the original script used to train TinyMistral. ☆21 · Updated last year
- An algorithm for weight-activation quantization (W4A4, W4A8) of LLMs, supporting both static and dynamic quantization ☆124 · Updated last month
- 1.58-bit LLaMa model ☆82 · Updated 11 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆195 · Updated 8 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ☆148 · Updated 2 months ago
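Several entries above (the BitNet b1.58 modeling code, the 1.58-bit LLaMa model, and the implementation of "The Era of 1-bit LLMs") center on ternary weight quantization. As a reference point, here is a minimal sketch of the absmean quantization function described in the BitNet b1.58 paper, mapping a weight tensor to {-1, 0, +1} with one per-tensor scale; it is illustrative only and not taken from any repository listed here.

```python
# Hedged sketch of BitNet b1.58-style absmean ternary quantization;
# illustrative only, not code from any of the repositories above.
import torch

def absmean_ternary(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight tensor to {-1, 0, +1} scaled by its mean absolute value."""
    scale = w.abs().mean().clamp(min=eps)    # per-tensor scale
    w_q = (w / scale).round().clamp(-1, 1)   # RoundClip to the ternary set
    return w_q, scale                        # dequantize as w_q * scale

w = torch.randn(4, 4)
w_q, scale = absmean_ternary(w)
print(w_q)                                   # every entry is -1.0, 0.0, or 1.0
print((w - w_q * scale).abs().mean())        # average quantization error
```

In BitNet-style training the full-precision latent weights are kept and this quantizer is applied in the forward pass with a straight-through estimator; the sketch covers only the quantization step.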