NVIDIA / tao_deploy
Package for deploying deep learning models from TAO Toolkit
☆17 Updated 5 months ago
Alternatives and similar repositories for tao_deploy:
Users that are interested in tao_deploy are comparing it to the libraries listed below
- TAO Toolkit deep learning networks with TensorFlow 1.x backend ☆13 Updated last year
- Quick start scripts and tutorial notebooks to get started with TAO Toolkit ☆66 Updated 5 months ago
- ☆66 Updated 2 years ago
- ☆59 Updated 3 months ago
- TAO Toolkit deep learning networks with PyTorch backend ☆91 Updated 3 months ago
- Datasets, Transforms and Models specific to Computer Vision ☆84 Updated last year
- NVIDIA DLA-SW, the recipes and tools for running deep learning workloads on NVIDIA DLA cores for inference applications. ☆187 Updated 8 months ago
- ☆92 Updated 5 months ago
- ☆31 Updated 7 months ago
- CLIP and SigLIP models optimized with TensorRT with a Transformers-like API ☆21 Updated 4 months ago
- 📚FFPA: Yet another Faster Flash Prefill Attention with O(1)⚡️SRAM complexity for headdim > 256, 1.8x~3x↑🎉faster than SDPA EA. ☆96 Updated this week
- Quick and Self-Contained TensorRT Custom Plugin Implementation and Integration ☆51 Updated 8 months ago
- ☆65 Updated last week
- ☆59 Updated 7 months ago
- DeltaCNN End-to-End CNN Inference of Sparse Frame Differences in Videos ☆60 Updated last year
- This library empowers users to seamlessly port pretrained models and checkpoints on the HuggingFace (HF) hub (developed using HF transfor… ☆57 Updated this week
- A sparse attention kernel supporting mixed sparse patterns ☆108 Updated this week
- llama INT4 cuda inference with AWQ ☆50 Updated 3 weeks ago
- [ICML 2022] "DepthShrinker: A New Compression Paradigm Towards Boosting Real-Hardware Efficiency of Compact Neural Networks", by Yonggan … ☆71 Updated 2 years ago
- ROS package for SOTA Computer Vision Models including SAM, Cutie, GroundingDINO, YOLO-World, VLPart, DEVA and MaskDINO. ☆43 Updated 6 months ago
- ☆11 Updated last year
- Standalone Flash Attention v2 kernel without libtorch dependency ☆103 Updated 5 months ago
- A tool to convert a TensorRT engine/plan to a fake ONNX ☆37 Updated 2 years ago
- ViT inference in Triton because, why not? ☆22 Updated 8 months ago
- Profile PyTorch models for FLOPs and parameters, helping to evaluate computational efficiency and memory usage. ☆29 Updated 3 weeks ago
- A collection of reference AI microservices and workflows for Jetson Platform Services ☆34 Updated 2 weeks ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators ☆18 Updated last week
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo ☆19 Updated last year
- A set of examples around MegEngine ☆31 Updated last year
- An inference framework for llama models implemented in CUDA C++ ☆44 Updated 3 months ago