microsoft / LLMLingua
[EMNLP'23, ACL'24] To speed up LLM inference and enhance LLMs' perception of key information, LLMLingua compresses the prompt and KV cache, achieving up to 20x compression with minimal performance loss.
6,003 stars · Apr 8, 2026 · Updated last week

Alternatives and similar repositories for LLMLingua

Users that are interested in LLMLingua are comparing it to the libraries listed below.
