BenChaliah / NVFP4-on-4090-vLLM

AdaLLM is an NVFP4-first inference runtime for Ada Lovelace GPUs (RTX 4090) with an FP8 KV cache and custom decode kernels. This repo targets NVFP4 weights and keeps the entire decode path in FP8.
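NVFP4 stores weights as 4-bit E2M1 values grouped into 16-element blocks, with each block carrying its own scale factor. The Python sketch below illustrates that block-scaling idea. It is not AdaLLM's code, and it makes simplifying assumptions: the per-block scale stays in float rather than being rounded to FP8 E4M3 as real NVFP4 does, any per-tensor scale is omitted, and the function names are hypothetical.

```python
# Minimal sketch of NVFP4-style block quantization (illustrative only).
# Assumptions: 16-element blocks, the E2M1 value grid, and a per-block scale
# kept as a Python float instead of being rounded to FP8 E4M3.

E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # representable FP4 magnitudes
BLOCK = 16       # elements that share one scale factor
FP4_MAX = 6.0    # largest E2M1 magnitude


def quantize_block(block):
    """Return (scale, codes), where codes are signed E2M1 grid values."""
    amax = max(abs(x) for x in block)
    # Map the block's largest magnitude onto FP4_MAX; avoid dividing by zero.
    scale = amax / FP4_MAX if amax > 0 else 1.0
    codes = []
    for x in block:
        mag = abs(x) / scale
        # Round to the nearest representable E2M1 magnitude.
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))
        codes.append(q if x >= 0 else -q)
    return scale, codes


def quantize_nvfp4(values):
    """Split values into 16-element blocks and quantize each block."""
    return [quantize_block(values[i:i + BLOCK]) for i in range(0, len(values), BLOCK)]


def dequantize_nvfp4(blocks):
    """Reconstruct approximate values by multiplying codes by their block scale."""
    out = []
    for scale, codes in blocks:
        out.extend(c * scale for c in codes)
    return out


if __name__ == "__main__":
    weights = [0.01 * i - 0.15 for i in range(32)]
    blocks = quantize_nvfp4(weights)
    recon = dequantize_nvfp4(blocks)
    max_err = max(abs(a - b) for a, b in zip(weights, recon))
    print(f"{len(blocks)} blocks, max abs error {max_err:.4f}")
```

Because every block gets its own scale, one outlier only coarsens the 16 values that share its block rather than the whole tensor. That locality is the main reason block-scaled FP4 formats hold accuracy better than a single per-tensor scale.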
86 · Updated this week

Alternatives and similar repositories for NVFP4-on-4090-vLLM

Users who are interested in NVFP4-on-4090-vLLM are comparing it to the libraries listed below.

