dvlab-research / Q-LLM

This is the official repository for "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models".
38 · Updated 3 months ago

Related projects

Alternatives and complementary repositories for Q-LLM