NVIDIA / tao_deploy
Package for deploying deep learning models from TAO Toolkit
☆19Updated 7 months ago
Alternatives and similar repositories for tao_deploy:
Users that are interested in tao_deploy are comparing it to the libraries listed below
- TAO Toolkit deep learning networks with PyTorch backend☆91Updated 5 months ago
- TAO Toolkit deep learning networks with TensorFlow 1.x backend☆13Updated last year
- Quick start scripts and tutorial notebooks to get started with TAO Toolkit☆78Updated 7 months ago
- NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications.☆192Updated 10 months ago
- ☆94Updated 7 months ago
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP☆54Updated 10 months ago
- ☆32Updated last year
- Generalist YOLO: Towards Real-Time End-to-End Multi-Task Visual Language Models☆67Updated last month
- ☆66Updated 2 years ago
- CUda Matrix Multiply library.☆75Updated last month
- A tutorial introducing knowledge distillation as an optimization technique for deployment on NVIDIA Jetson☆190Updated last year
- DeepStream Libraries offer CVCUDA, NvImageCodec, and PyNvVideoCodec modules as Python APIs for seamless integration into custom framewor…☆49Updated 6 months ago
- ☆62Updated 5 months ago
- Deploying LLMs offline on the NVIDIA Jetson platform marks the dawn of a new era in embodied intelligence, where devices can function ind…☆90Updated last year
- 📚FFPA(Split-D): Yet another Faster Flash Attention with O(1) GPU SRAM complexity large headdim, 1.8x~3x↑🎉 faster than SDPA EA.☆164Updated last week
- Awesome code, projects, books, etc. related to CUDA☆16Updated last week
- CLIP and SigLIP models optimized with TensorRT with a Transformers-like API☆22Updated 6 months ago
- Training LLaMA language model with MMEngine! It supports LoRA fine-tuning!☆40Updated 2 years ago
- A tool to convert a TensorRT engine/plan to a fake ONNX model☆38Updated 2 years ago
- TAO best practices: how to adapt to a new domain, add new classes, and generalize the model with a small dataset using NVIDIA's TAO Toolkit☆24Updated 2 years ago
- DeltaCNN: End-to-End CNN Inference of Sparse Frame Differences in Videos☆59Updated 2 years ago
- ☆31Updated 9 months ago
- Deploy RT-DETR with ONNX from the PaddlePaddle framework and graph cut☆29Updated last year
- ☆78Updated 3 weeks ago
- Deep Learning tools and applications for NVIDIA AGX platforms.☆210Updated 3 weeks ago
- Datasets, Transforms and Models specific to Computer Vision☆85Updated last year
- Standalone Flash Attention v2 kernel without libtorch dependency☆108Updated 7 months ago
- Sample app code for deploying TAO Toolkit trained models to Triton☆87Updated 7 months ago
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training☆182Updated last week
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.☆110Updated this week