machilusZ / FastGen

This repo contains the source code for the paper "Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs".
28 · Updated last month

Related projects: