usyd-fsalab / fp6_llm

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
208 · Updated 3 weeks ago

Related projects

Alternatives and complementary repositories for fp6_llm