dvlab-research / Q-LLM

This is the official repo of the paper "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models".
