cenconq25 / delta-compress-llm
Proof of concept: exploiting temporal coherence in LLM inference via delta encoding for KV cache compression and weight-skip prediction. Achieves F16-quality KV cache at Q4_0 compression ratios with zero perplexity loss on llama.cpp.
39 stars · Mar 24, 2026 · Updated this week
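The description points at a simple core idea: consecutive tokens tend to produce KV cache rows that change only slightly, so storing a full-precision base row plus coarsely quantized deltas can retain near-F16 fidelity at a Q4_0-like footprint. The C++ snippet below is a minimal sketch of that idea under stated assumptions; `DeltaBlock`, `encode_delta`, and `decode_delta` are hypothetical names for illustration and are not taken from the repository, whose actual layout, scaling, and bit packing may differ.

```cpp
// Minimal sketch (illustrative, not the repo's actual code): delta-encode one
// KV cache row against the previous token's row, quantize the deltas to a
// 4-bit signed range, then reconstruct. Temporal coherence means consecutive
// rows are similar, so the deltas are small and survive coarse quantization.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// One encoded row: a per-row scale plus 4-bit signed codes
// (stored one per byte here purely for readability).
struct DeltaBlock {
    float scale;
    std::vector<int8_t> q;
};

DeltaBlock encode_delta(const std::vector<float>& prev,
                        const std::vector<float>& cur) {
    DeltaBlock blk;
    blk.q.resize(cur.size());
    // Per-row scale chosen so the largest delta maps to the 4-bit signed range [-7, 7].
    float max_abs = 0.0f;
    for (size_t i = 0; i < cur.size(); ++i)
        max_abs = std::max(max_abs, std::fabs(cur[i] - prev[i]));
    blk.scale = max_abs > 0.0f ? max_abs / 7.0f : 1.0f;
    for (size_t i = 0; i < cur.size(); ++i) {
        float d = (cur[i] - prev[i]) / blk.scale;
        blk.q[i] = (int8_t)std::clamp((int)std::lround(d), -7, 7);
    }
    return blk;
}

std::vector<float> decode_delta(const std::vector<float>& prev,
                                const DeltaBlock& blk) {
    std::vector<float> out(prev.size());
    for (size_t i = 0; i < prev.size(); ++i)
        out[i] = prev[i] + blk.q[i] * blk.scale;
    return out;
}

int main() {
    // Two consecutive key vectors that differ only slightly (temporal coherence).
    std::vector<float> k_prev = {0.80f, -1.20f, 0.05f, 0.42f};
    std::vector<float> k_cur  = {0.82f, -1.18f, 0.04f, 0.45f};
    DeltaBlock blk = encode_delta(k_prev, k_cur);
    std::vector<float> k_rec = decode_delta(k_prev, blk);
    for (size_t i = 0; i < k_cur.size(); ++i)
        std::printf("orig %+.3f  reconstructed %+.3f\n", k_cur[i], k_rec[i]);
    return 0;
}
```

In a real implementation the 4-bit codes would be packed two per byte and the scale stored per block rather than per row; the sketch keeps one code per byte only to make the encode/decode round trip easy to follow.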

Alternatives and similar repositories for delta-compress-llm

Users interested in delta-compress-llm are comparing it to the libraries listed below.
