NVIDIA / TensorRT-RTX
NVIDIA TensorRT-RTX is an SDK for high-performance AI inference on NVIDIA RTX GPUs. This repository contains Open-Source Software components of TensorRT-RTX.
☆44Updated 2 weeks ago
Alternatives and similar repositories for TensorRT-RTX
Users who are interested in TensorRT-RTX are comparing it to the libraries listed below.
- C++ pipeline with OpenVINO native API for Stable Diffusion v1.5☆13Updated last year
- ☆84Updated 2 years ago
- A faster implementation of OpenCV-CUDA that uses OpenCV objects, and more!☆52Updated last month
- Inference deployment of the llama3☆11Updated last year
- A Toolkit to Help Optimize Onnx Model☆198Updated last week
- PyTorch half precision gemm lib w/ fused optional bias + optional relu/gelu☆72Updated 8 months ago
- HunyuanDiT with TensorRT and libtorch☆17Updated last year
- (WIP) Parallel inference for black-forest-labs' FLUX model.☆19Updated 9 months ago
- Model compression for ONNX☆97Updated 9 months ago
- SAM and lama inpaint with an interactive Qt GUI: draw points or boxes to run SAM with results displayed in real time, then run Inpaint. See the video in the readme for details.☆50Updated last year
- https://wavespeed.ai/ Context parallel attention that accelerates DiT model inference with dynamic caching☆366Updated last month
- Memory Management for the GPU Poor, run the latest open source frontier models on consumer Nvidia GPUs☆148Updated 3 weeks ago
- A CUDA kernel for NHWC GroupNorm for PyTorch☆20Updated 9 months ago
- A nvImageCodec library of GPU- and CPU- accelerated codecs featuring a unified interface☆116Updated 2 weeks ago
- Spandrel gives your project support for various PyTorch architectures meant for AI Super-Resolution, restoration, and inpainting. Based o…☆256Updated 3 months ago
- faster parallel inference of mochi-1 video generation model☆126Updated 6 months ago
- Fast and memory-efficient exact attention☆17Updated 8 months ago
- Deep learning training framework for image super resolution and restoration.☆77Updated 2 weeks ago
- Use safetensors with ONNX 🤗☆69Updated last month
- TensorRT is a C++ library for high performance inference on NVIDIA GPUs and deep learning accelerators.☆19Updated last year
- Official Implementation of “One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution”☆260Updated 3 weeks ago
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP☆61Updated 3 months ago
- Extended Musubi Tuner with latent previews, fp16 accumulation, advanced cfg scheduling and more☆27Updated this week
- ☆284Updated 8 months ago
- Faster generation with text-to-image diffusion models.☆225Updated 2 months ago
- [WIP] Better (FP8) attention for Hopper☆32Updated 6 months ago
- ☆50Updated last month
- NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that del…☆26Updated 2 years ago
- a simple Flash Attention v2 implementation with ROCM (RDNA3 GPU, roc wmma), mainly used for stable diffusion(ComfyUI) in Windows ZLUDA en…☆47Updated last year
- A Toolkit to Help Optimize Large Onnx Model☆158Updated last year