TensorRT-in-Action is a GitHub repository that provides code examples for using TensorRT, with corresponding Jupyter Notebooks.
☆15 · Updated Jun 1, 2023
Alternatives and similar repositories for TensorRT-in-Action
Users interested in TensorRT-in-Action are comparing it to the libraries listed below.
- Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores (☆73 · Updated Sep 8, 2024)
- Flash Attention in ~100 lines of CUDA (forward pass only) (☆11 · Updated Jun 10, 2024)
- Benchmark tests supporting the TiledCUDA library (☆18 · Updated Nov 19, 2024)
- Tutorials on extending and importing TVM with a CMake include dependency (☆16 · Updated Oct 11, 2024)
- Learn TensorRT from scratch 🥰 (☆18 · Updated Sep 29, 2024)
- Multiple GEMM operators constructed with CUTLASS to support LLM inference (☆20 · Updated Aug 3, 2025)
- Awesome code, projects, books, etc. related to CUDA (☆31 · Updated Feb 3, 2026)
- [HPCA 2026] A GPU-optimized system for efficient long-context LLM decoding with a low-bit KV cache (☆81 · Updated Dec 18, 2025)
- Lane detection: LaneNet accelerated with TensorRT, implemented in C++ (☆23 · Updated Feb 24, 2022)
- ☆27 · Updated Aug 5, 2022
- LLM inference with the Microscaling format (☆34 · Updated Nov 12, 2024)
- C++ implementations of various tokenizers (SentencePiece, tiktoken, etc.) (☆49 · Updated Feb 23, 2026)
- Quick and self-contained TensorRT custom plugin implementation and integration (☆82 · Updated May 26, 2025)
- Live demo of hls4ml on embedded platforms such as the Pynq-Z2 (☆12 · Updated Aug 23, 2024)
- This project builds and deploys an SNPE model on Qualcomm devices that have unsupported layers which are not part of… (☆10 · Updated Oct 4, 2021)
- Based on the mHC architecture proposed by DeepSeek, the residual links of the existing iTransformer are replaced and updated to obtain a… (☆28 · Updated Feb 4, 2026)
- Examples of CUDA implementations using CUTLASS CuTe (☆269 · Updated Jul 1, 2025)
- Implements Flash Attention using CuTe (☆102 · Updated Dec 17, 2024)
- Flash-attention optimization log (☆26 · Updated Jun 4, 2025)
- Deploys LDC, a lightweight dense convolutional neural network for edge detection, with ONNX Runtime; includes both C++ and Python programs (☆11 · Updated Apr 24, 2023)
- ☆116 · Updated May 16, 2025
- Decoding Attention is specially optimized for MHA, MQA, GQA, and MLA using CUDA cores for the decoding stage of LLM inference (☆46 · Updated Jun 11, 2025)
- A simplified flash-attention implemented with CUTLASS, intended as a teaching example (☆58 · Updated Aug 12, 2024)
- List of papers on Vision Transformer quantization and hardware acceleration from recent AI conferences and journals (☆102 · Updated Jun 2, 2024)
- Refactored NeRF code that is easier to read (☆13 · Updated Mar 26, 2023)
- A parallel-processing scheme and optimizations (at the application level) for algorithms whose performance cannot keep up with real-time video streams (☆11 · Updated Jun 5, 2021)
- FastSAM RKNN deployment code in C++ (☆14 · Updated May 30, 2024)
- Deployment version of CenterNet3D, easy to port to different platforms (ONNX, TensorRT, RKNN, Horizon) (☆13 · Updated May 24, 2024)
- Persistent dense GEMM for Hopper in `CuTeDSL` (☆15 · Updated Aug 9, 2025)
- Perceptron-based branch predictor written in C++ (☆12 · Updated Dec 14, 2016)
- Sample that converts the MobileSAM encoder/decoder to ONNX and runs inference (☆12 · Updated Apr 11, 2024)
- ☆11 · Updated Mar 24, 2023
- ☆10 · Updated Jul 18, 2024
- A simple neural network in C++17 using the Eigen library, supporting both forward and backward propagation (☆10 · Updated Jul 27, 2024)
- Exports an ONNX QDQ model conforming to the AXERA NPU quantization specification; currently only w8a8 is supported (☆11 · Updated Sep 10, 2024)
- trt-hackathon-2022 third-prize solution (☆10 · Updated Mar 6, 2023)
- Applies graph neural networks to optimize factor feature extraction in FactorVAE (☆13 · Updated Jan 11, 2025)
- Inference deployment of Llama 3 (☆11 · Updated Apr 21, 2024)
- ☆10 · Updated Jun 28, 2019