James-QiuHaoran / LLM-serving-with-proxy-models

Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | A tiny model can tell you the verbosity of an LLM (with low latency!)
22 | Updated 5 months ago

Related projects

Alternatives and complementary repositories for LLM-serving-with-proxy-models