turbo-tan / llama.cpp-tq3
llama.cpp fork with TQ3_1S/4S CUDA kernels: 3.5-bit Walsh-Hadamard transform (WHT) quantization achieving Q4s quality at 10% smaller size, based on a RaBitQ-inspired design. Enables 27B models on 16GB GPUs at 15 tok/s token generation (TG) and 221 tok/s prompt processing (PP).
78 · Apr 13, 2026 · Updated this week

Alternatives and similar repositories for llama.cpp-tq3

Users that are interested in llama.cpp-tq3 are comparing it to the libraries listed below.
