DataXujing / TensorRT-LLM-ChatGLM3

Large model deployment in practice: TensorRT-LLM, Triton Inference Server, vLLM

☆26

Alternatives and similar repositories for TensorRT-LLM-ChatGLM3

Users who are interested in TensorRT-LLM-ChatGLM3 are comparing it to the libraries listed below.


FeiGeChuanShu / trt2023
NVIDIA TensorRT Hackathon 2023 final-round project: building and optimizing Tongyi Qianwen Qwen-7B with TensorRT-LLM
☆43 · Updated 2 years ago
TRT2022 / ControlNet_TensorRT
Tianchi NVIDIA TensorRT Hackathon 2023: third-place solution in the preliminary round of the Generative AI Model Optimization Competition
☆50 · Updated 2 years ago
Oldpan / DeployIsAllYouNeed
☆120 · Updated 2 years ago
TRT2022 / trtllm-llama
☢️ TensorRT 2023 final round: accelerating and optimizing Llama model inference with TensorRT-LLM
☆51 · Updated 2 years ago
inisis / OnnxLLM
Large Language Model Onnx Inference Framework
☆36 · Updated last week
sophgo / ChatGLM2-TPU
Run ChatGLM2-6B on the BM1684X
☆50 · Updated last year
torchpipe / torchpipe
Serving Inside Pytorch
☆165 · Updated 2 weeks ago
richjjj / cuvid-tensorrt-multi
ffmpeg+cuvid+tensorrt+multicamera
☆12 · Updated 11 months ago
triple-Mu / HunyuanDiT-TensorRT-libtorch
HunyuanDiT with TensorRT and libtorch
☆18 · Updated last year
Tlntin / trt2023
☆26 · Updated 2 years ago
sesmfs / onnx_quant_tool
An ONNX-based quantization tool.
☆71 · Updated last year
wangzhaode / onnx-llm
LLM deployment project based on ONNX.
☆47 · Updated last year
luchangli03 / onnxsim_large_model
Simplify large ONNX models (>2 GB)
☆69 · Updated last year
triple-Mu / TensorRT2ONNX
A tool to convert a TensorRT engine/plan into a fake ONNX model
☆41 · Updated 3 years ago
TRT2022 / MST-plus-plus-TensorRT
TensorRT 2022 final-round solution: TensorRT inference optimization of MST++, the first Transformer-based image reconstruction model
☆143 · Updated 3 years ago
ozanarmagan / clip_tokenizer_cpp
☆10 · Updated last year
yuxiaoranyu / stable_diffusion_trt_triton
☆20 · Updated last year
tsingmicro-toolchain / OnnxSlim
A Toolkit to Help Optimize Large Onnx Model
☆162 · Updated last month
Tlntin / qwen-ascend-llm
☆52 · Updated last year
shouxieai / diffusion_from02hero
☆48 · Updated 2 years ago
caibucai22 / awesome-cuda
Awesome code, projects, books, etc. related to CUDA
☆26 · Updated 3 months ago
bug-developer021 / YOLOV5_optimization_on_triton
Compares multiple optimization methods on Triton to improve model-serving performance
☆52 · Updated last year
Tlntin / ChatGLM2-6B-TensorRT
☆90 · Updated 2 years ago
yvonwin / qwen2.cpp
C++ implementation of Qwen2 and Llama 3
☆48 · Updated last year
wangzhaode / mnn-stable-diffusion
Stable Diffusion using MNN
☆67 · Updated 2 years ago
Oneflow-Inc / oneflow-yolo-doc
https://start.oneflow.org/oneflow-yolo-doc
☆22 · Updated 2 years ago
Qingrenn / mmdeploy-summer-camp
🐱 ncnn INT8 model quantization evaluation
☆14 · Updated 3 years ago
MegEngine / examples
A set of examples around MegEngine
☆31 · Updated 2 years ago
MegEngine / mgeconvert
Converter from MegEngine to other frameworks
☆70 · Updated 2 years ago
ZHEQIUSHUI / CLIP-ONNX-AX650-CPP
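CLIP inference implemented in C++. The model has a few minor modifications; the modifications and the model export code are described in the README, and all model files, including the AX650 models, are in Releases. Added support for ChineseCLIP.
☆30 · Updated 5 months ago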