wangzyon / trt_learn

TensorRT encapsulation, learn, rewrite, practice.

☆29

Alternatives and similar repositories for trt_learn

Users who are interested in trt_learn compare it to the libraries listed below.

caiwanxianhust / FasterLLaMA
A llama model inference framework implemented in CUDA C++
☆62 Updated last year
raymond1123 / hgemm
☆30 Updated last year
Tlntin / trt2023
☆26 Updated 2 years ago
HuPengsheet / EasyNN
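EasyNN is a neural network inference framework developed for teaching, designed so that even complete beginners can write an inference framework on their own!
☆34 Updated last year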
OpenPPL / ppl.kernel.cuda
☆38 Updated last year
leimao / TensorRT-Custom-Plugin-Example
Quick and Self-Contained TensorRT Custom Plugin Implementation and Integration
☆73 Updated 6 months ago
harleyszhang / lite_llama
A light llama-like llm inference framework based on the triton kernel.
☆166 Updated 2 months ago
caibucai22 / awesome-cuda
Awesome code, projects, books, etc. related to CUDA
☆26 Updated 3 months ago
weishengying / tiny-flash-attention
A simplified flash-attention implementation using cutlass, intended for teaching
☆51 Updated last year
zjhellofss / KuiperCourse
A course on Bilibili
☆79 Updated 2 years ago
torchpipe / torchpipe
Serving Inside Pytorch
☆165 Updated 2 weeks ago
mrzhuzhe / riven
CPU Memory Compiler and Parallel programming
☆26 Updated last year
OscarSavolainen / Quantization-Tutorials
A bunch of coding tutorials for my Youtube videos on Neural Network Quantization.
☆25 Updated last year
InfiniTensor / RefactorGraph
A layered, decoupled deep learning inference engine
☆76 Updated 9 months ago
shouxieai / tensorRT_quantization
This code accompanies the Bilibili video https://www.bilibili.com/video/BV18L41197Uz/?spm_id_from=333.788&vd_source=eefa4b6e337f16d87d87c2c357db8ca7.
☆70 Updated 2 years ago
wangzyon / pyInfer
async inference for machine learning model
☆26 Updated 3 years ago
CalvinXKY / BasicCUDA
A tutorial for CUDA&PyTorch
☆170 Updated 10 months ago
emptysoal / cuda-image-preprocess
Speed up image preprocessing with CUDA for image handling or TensorRT inference
☆79 Updated last month
tsingmicro-toolchain / OnnxSlim
A Toolkit to Help Optimize Large Onnx Model
☆162 Updated last month
BBuf / how-to-optimize-gemm
☆98 Updated 4 years ago
Syencil / Programming_Massively_Parallel_Processors
Code and notes for the 6 major CUDA parallel computing patterns
☆61 Updated 5 years ago
OpenPPL / ppl.pmx
☆60 Updated last year
AyakaGEMM / Hands-on-GEMM
☆144 Updated last year
zjhellofss / triton_course
☆39 Updated 6 months ago
wangzhaode / onnx-llm
LLM deployment project based on ONNX.
☆47 Updated last year
ozanarmagan / clip_tokenizer_cpp
☆10 Updated last year
inisis / OnnxLLM
Large Language Model Onnx Inference Framework
☆36 Updated last week
sesmfs / onnx_quant_tool
An ONNX-based quantization tool.
☆71 Updated last year
Bruce-Lee-LY / flash_attention_inference
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
☆43 Updated 9 months ago
sesmfs / onnx_matcher
Using pattern matcher in onnx model to match and replace subgraphs.
☆81 Updated last year