NVIDIA / tao_deploy
Package for deploying deep learning models from TAO Toolkit
☆19 · Updated 6 months ago
Alternatives and similar repositories for tao_deploy:
Users interested in tao_deploy are comparing it to the libraries listed below:
- Quick start scripts and tutorial notebooks to get started with TAO Toolkit · ☆74 · Updated 6 months ago
- TAO Toolkit deep learning networks with PyTorch backend · ☆91 · Updated 4 months ago
- TAO Toolkit deep learning networks with TensorFlow 1.x backend · ☆13 · Updated last year
- ☆61 · Updated 4 months ago
- NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications. · ☆190 · Updated 9 months ago
- ☆93 · Updated 6 months ago
- A collection of reference AI microservices and workflows for Jetson Platform Services · ☆37 · Updated last month
- Deploying LLMs offline on the NVIDIA Jetson platform marks the dawn of a new era in embodied intelligence, where devices can function ind… · ☆88 · Updated 11 months ago
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP · ☆49 · Updated 9 months ago
- ☆59 · Updated 8 months ago
- A tutorial introducing knowledge distillation as an optimization technique for deployment on NVIDIA Jetson · ☆179 · Updated last year
- Standalone Flash Attention v2 kernel without libtorch dependency · ☆106 · Updated 6 months ago
- A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space · ☆76 · Updated 2 months ago
- ☆66 · Updated 2 years ago
- ☆32 · Updated last year
- ☆31 · Updated 9 months ago
- Optimized local inference for LLMs with HuggingFace-like APIs for quantization, vision/language models, multimodal agents, speech, vector… · ☆249 · Updated 5 months ago
- An Android Application for GLCC · ☆11 · Updated 2 years ago
- ☆101 · Updated this week
- Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA cores for the decoding stage of LLM inference. · ☆35 · Updated last week
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… · ☆60 · Updated this week
- Training the LLaMA language model with MMEngine! It supports LoRA fine-tuning! · ☆40 · Updated last year
- A tool to convert a TensorRT engine/plan to a fake ONNX model · ☆38 · Updated 2 years ago
- ⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA and CuTe APIs, and achieve peak⚡️ performance. · ☆59 · Updated 2 weeks ago
- YOLOv5 on Orin DLA · ☆191 · Updated last year
- CLIP and SigLIP models optimized with TensorRT with a Transformers-like API · ☆22 · Updated 5 months ago
- ☆42 · Updated 2 months ago