BenChaliah / NVFP4-on-4090-vLLM
AdaLLM is an NVFP4-first inference runtime for Ada Lovelace GPUs (RTX 4090) with an FP8 KV cache and custom decode kernels. This repo targets NVFP4 weights and keeps the entire decode path in FP8.
97 · Feb 15, 2026 · Updated 3 weeks ago

Alternatives and similar repositories for NVFP4-on-4090-vLLM

Users who are interested in NVFP4-on-4090-vLLM compare it with the libraries listed below.
