leimao/TensorRT-Custom-Plugin-Example

leimao / TensorRT-Custom-Plugin-Example

Quick and Self-Contained TensorRT Custom Plugin Implementation and Integration

☆88

Alternatives and similar repositories for TensorRT-Custom-Plugin-Example

Users who are interested in TensorRT-Custom-Plugin-Example compare it with the repositories listed below.

lix19937 / tensorrt-insight
Deep insight into TensorRT, including but not limited to QAT, PTQ, plugins, triton_inference, and CUDA
☆24 · Jul 2, 2026 · Updated 2 weeks ago
wangzyon / trt_learn
TensorRT encapsulation, learn, rewrite, practice.
☆30 · Oct 19, 2022 · Updated 3 years ago
Bruce-Lee-LY / decoding_attention
Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA core for the decoding stage of LLM inference.
☆47 · Jun 11, 2025 · Updated last year
Phoenix8215 / learn-TensorRT-from-scratch
learn TensorRT from scratch🥰
☆18 · Sep 29, 2024 · Updated last year
raymond1123 / hgemm
☆30 · Nov 16, 2024 · Updated last year
triple-mu / HunyuanDiT-TensorRT-libtorch
HunyuanDiT with TensorRT and libtorch
☆18 · May 22, 2024 · Updated 2 years ago
DataXujing / DeepSeek-R1-Android
Deploying the DeepSeek-R1 distilled 1.5B model on Android phones
☆24 · Feb 4, 2025 · Updated last year
ZHEQIUSHUI / CLIP-ONNX-AX650-CPP
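CLIP inference implemented in C++. The model has a few minor modifications; the code for the modifications and for exporting the model can be found in the README, and the model files, including the AX650 models, are all in Releases. Support for ChineseCLIP has been added.
☆31 · Jun 19, 2025 · Updated last year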
kalfazed / multi-thread-programming
This is a repository to practice multi-thread programming in C++
☆31 · Feb 21, 2024 · Updated 2 years ago
Guanbin-Huang / camera_calibration_cpp
☆19 · Aug 23, 2022 · Updated 3 years ago
Bruce-Lee-LY / cutlass_gemm
Multiple GEMM operators are constructed with cutlass to support LLM inference.
☆20 · Aug 3, 2025 · Updated 11 months ago
morsoli / llmbenchmark
Comparison of LLM API performance metrics: an in-depth analysis of key metrics such as TTFT and TPS
☆20 · Sep 12, 2024 · Updated last year
leimao / CUTLASS-Examples
CUTLASS and CuTe Examples
☆136 · Nov 30, 2025 · Updated 7 months ago
DD-DuDa / TensorRT-in-Action
TensorRT-in-Action is a GitHub repository that provides code examples for using TensorRT, with accompanying Jupyter Notebooks.
☆15 · Jun 1, 2023 · Updated 3 years ago
caibucai22 / awesome-cuda
Awesome code, projects, books, etc. related to CUDA
☆38 · Jun 2, 2026 · Updated last month
Susan19900316 / yolov5_tensorrt_int8
A summary of YOLOv5 TensorRT INT8 quantization methods
☆84 · Dec 12, 2023 · Updated 2 years ago
DataXujing / YOLOv12-TensorRT
YOLOv12 TensorRT end-to-end accelerated model inference and INT8 quantization implementation
☆14 · Mar 5, 2025 · Updated last year
caiwanxianhust / FasterLLaMA
A LLaMA model inference framework implemented in CUDA C++
☆64 · Nov 8, 2024 · Updated last year
kalfazed / tensorrt_starter
This repository gives a guideline for learning CUDA and TensorRT from the beginning.
☆360 · Jun 14, 2026 · Updated last month
Bruce-Lee-LY / cuda_hgemv
Several optimization methods of half-precision general matrix vector multiplication (HGEMV) using CUDA core.
☆75 · Sep 8, 2024 · Updated last year
meta-pytorch / tokenizers
C++ implementations for various tokenizers (sentencepiece, tiktoken etc).
☆50 · Updated this week
Phoenix8215 / learn-ONNX-from-scratch
A large collection of examples for learning ONNX
☆27 · Sep 20, 2024 · Updated last year
ozanarmagan / clip_tokenizer_cpp
☆10 · Jul 18, 2024 · Updated 2 years ago
cqu20160901 / DETR_onnx_tensorRT_V2
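DETR on TensorRT: removing the auxiliary heads that are unused during inference, further acceleration through FP16 deployment, and a new method for fixing the all-zero output problem after conversion to TensorRT.
☆12 · Jan 9, 2024 · Updated 2 years ago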
JYS997760473 / CenterPoint-ROS-Detection-and-Tracking
Detection and Tracking ROS node based on CenterPoint and Kalman Filter
☆24 · Feb 24, 2024 · Updated 2 years ago
liwuhen / CVDeploy-2D
The repository supports TensorRT, QNN platform inference, 2D obstacle detection yolo series (yolov5, yolov8, yolo11, yolox), semantic seg…
☆20 · May 6, 2025 · Updated last year
NVIDIA-AI-IOT / tensorrt_plugin_generator
A simple tool that can generate TensorRT plugin code quickly.
☆243 · Jul 11, 2023 · Updated 3 years ago
inisis / OnnxLLM
Large Language Model ONNX Inference Framework
☆35 · Nov 25, 2025 · Updated 7 months ago
wangzhaode / onnx-llm
LLM deployment project based on ONNX.
☆49 · Oct 9, 2024 · Updated last year
huangleiBuaa / CenteredWN
This project is the Torch implementation of our ICCV 2017 paper: Centered Weight Normalization in Accelerating Training of Deep Neural…
☆21 · Dec 7, 2019 · Updated 6 years ago
richjjj / cuvid-tensorrt-multi
ffmpeg+cuvid+tensorrt+multicamera
☆12 · Dec 31, 2024 · Updated last year
zpye / SimpleInfer
A simple neural network inference framework
☆25 · Aug 1, 2023 · Updated 2 years ago
raymond1123 / Flash-Attention
☆26 · Nov 21, 2024 · Updated last year
leon0514 / trt-sahi-yolo
Accelerating SAHI-based inference on YOLO models using TensorRT.
☆103 · Jan 6, 2026 · Updated 6 months ago
autowarefoundation / spconv_cpp
☆21 · May 26, 2026 · Updated last month
cqu20160901 / FastSAM_rknn_Cplusplus
C++ code for deploying FastSAM with RKNN
☆13 · May 30, 2024 · Updated 2 years ago
Huntersdeng / CXX-DeepLearning-Inference
A unified and extensible pipeline for deep learning model inference with C++. Now supports yolov8, yolov9, clip, and nanosam. More models …
☆12 · Aug 3, 2025 · Updated 11 months ago
HeKun-NVIDIA / TensorRT-Developer_Guide_in_Chinese
☆321 · May 11, 2022 · Updated 4 years ago
levipereira / yolov9-qat
Implementation of YOLOv9 QAT optimized for deployment on TensorRT platforms.
☆139 · Apr 24, 2025 · Updated last year