NVIDIA / tao_deploy
Package for deploying deep learning models from TAO Toolkit
☆19 · Updated 9 months ago
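tao_deploy's core job is turning models exported from TAO Toolkit into TensorRT engines for inference. As a rough, generic illustration of that workflow, here is a minimal sketch using the plain TensorRT Python API (TensorRT 8.x style), not tao_deploy's own entry points; the file paths are placeholders:

```python
# Minimal sketch: build an FP16 TensorRT engine from a TAO-exported ONNX model.
# Uses the standard TensorRT 8.x Python API, NOT tao_deploy's own entry points;
# "model.onnx" / "model.engine" are placeholder paths.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # request mixed-precision kernels

serialized = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:
    f.write(serialized)
```

In practice tao_deploy wraps this per network and adds TAO-specific preprocessing and calibration; the sketch only shows the generic ONNX-to-engine step.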
Alternatives and similar repositories for tao_deploy
Users interested in tao_deploy are comparing it to the libraries listed below
- TAO Toolkit deep learning networks with PyTorch backend ☆95 · Updated 6 months ago
- TAO Toolkit deep learning networks with TensorFlow 1.x backend ☆13 · Updated last year
- Quick start scripts and tutorial notebooks to get started with TAO Toolkit ☆85 · Updated 9 months ago
- ☆96 · Updated 8 months ago
- CLIP and SigLIP models optimized with TensorRT, exposed through a Transformers-like API (see the sketch after this list) ☆25 · Updated 8 months ago
- NVIDIA DLA-SW: recipes and tools for running deep learning inference workloads on NVIDIA DLA cores ☆199 · Updated 11 months ago
- ☆67 · Updated 7 months ago
- ☆32 · Updated last year
- ☆31 · Updated 11 months ago
- Deploying LLMs offline on the NVIDIA Jetson platform marks the dawn of a new era in embodied intelligence, where devices can function ind… ☆96 · Updated last year
- Sample app code for deploying models trained with TAO Toolkit to Triton (see the client sketch after this list) ☆87 · Updated 9 months ago
- Awesome code, projects, books, etc. related to CUDA ☆17 · Updated last month
- LLaMA INT4 CUDA inference with AWQ ☆54 · Updated 4 months ago
- Datasets, Transforms and Models specific to Computer Vision ☆85 · Updated last year
- Deep insight into TensorRT, including but not limited to QAT, PTQ, plugins, triton_inference, and CUDA ☆18 · Updated 3 weeks ago
- A collection of reference AI microservices and workflows for Jetson Platform Services ☆39 · Updated 4 months ago
- A reference application for a local AI assistant with an LLM and RAG ☆112 · Updated 6 months ago
- A tool to convert a TensorRT engine/plan to a fake ONNX ☆39 · Updated 2 years ago
- This repository describes how to add a custom TensorRT plugin in C++ and Python ☆27 · Updated 3 years ago
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP ☆56 · Updated 3 weeks ago
- ☆11 · Updated last month
- ☆66 · Updated 2 years ago
- Training the LLaMA language model with MMEngine, with support for LoRA fine-tuning ☆40 · Updated 2 years ago
- A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space ☆85 · Updated 4 months ago
- A tutorial introducing knowledge distillation as an optimization technique for deployment on NVIDIA Jetson (see the distillation-loss sketch after this list) ☆196 · Updated last year
- 📚 FFPA (Split-D): extends FlashAttention with Split-D for large headdim, O(1) GPU SRAM complexity, 1.8x~3x faster than SDPA EA 🎉 ☆184 · Updated 3 weeks ago
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆68 · Updated this week
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆203 · Updated 2 weeks ago
- Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference ☆36 · Updated 2 months ago
- ☆49 · Updated 2 weeks ago
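The CLIP/SigLIP entry above advertises a "Transformers-like API", i.e. the call pattern of the HuggingFace transformers library. For reference, the plain (non-TensorRT) version of that pattern, using the standard public CLIP checkpoint:

```python
# Reference for the "Transformers-like API" pattern the TensorRT-optimized
# CLIP/SigLIP repo mimics; this is plain HuggingFace transformers, no TensorRT.
from PIL import Image
import requests
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=image,
    return_tensors="pt",
    padding=True,
)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=1)  # image-text similarity
print(probs)
```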
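For the TAO-to-Triton sample app above, the client side of such a deployment typically goes through NVIDIA's tritonclient package. A minimal sketch, assuming a Triton server on localhost:8000; the model name, tensor names, and shape are hypothetical and must match the deployed model's config.pbtxt:

```python
# Minimal Triton HTTP client sketch. The model name, tensor names, and shape
# below are hypothetical placeholders; they must match the deployed model's
# config.pbtxt.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
inp = httpclient.InferInput("input", list(batch.shape), "FP32")
inp.set_data_from_numpy(batch)

out = httpclient.InferRequestedOutput("output")
result = client.infer(model_name="my_model", inputs=[inp], outputs=[out])
print(result.as_numpy("output").shape)
```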
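The knowledge-distillation tutorial above revolves around one standard loss: the student is trained against temperature-softened teacher probabilities blended with the usual hard-label cross-entropy (Hinton et al.). A minimal PyTorch sketch; the temperature and blend weight are illustrative, not values from the tutorial:

```python
# Standard knowledge-distillation loss: KL divergence between temperature-
# softened teacher/student distributions, blended with hard-label CE.
# T and alpha are illustrative hyperparameters, not values from the tutorial.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so soft-loss gradients match the hard-loss scale
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Usage: the teacher runs without gradients; the student trains on the blend.
student_logits = torch.randn(8, 10, requires_grad=True)
with torch.no_grad():
    teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```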