BenChaliah / NVFP4-on-4090-vLLM
AdaLLM is an NVFP4-first inference runtime for Ada Lovelace (RTX 4090) with FP8 KV cache and custom decode kernels. This repo targets NVFP4 weights and keeps the entire decode path in FP8
101 · Feb 15, 2026 · Updated last month

Alternatives and similar repositories for NVFP4-on-4090-vLLM

Users who are interested in NVFP4-on-4090-vLLM are comparing it to the libraries listed below.
