zzk0 / triton
Triton Inference Server Model Config and Client Scripts
☆31 · Updated 2 years ago
Related projects:
- Compare multiple optimization methods on Triton to improve model-serving performance ☆46 · Updated 8 months ago
- Convert a YOLOv3 model into a TensorRT model that supports dynamic-batch TensorRT inference and deployment on Triton Inference Server ☆27 · Updated 3 years ago
- ☆65 · Updated last year
- ☆32 · Updated 7 months ago
- ☆23 · Updated last year
- Async inference for machine learning models ☆27 · Updated 2 years ago
- ☆95 · Updated 3 years ago
- ☢️ TensorRT 2023 final round: Llama model inference acceleration and optimization based on TensorRT-LLM ☆40 · Updated 11 months ago
- ☆56 · Updated this week
- ☆90 · Updated last year
- Tianchi NVIDIA TensorRT Hackathon 2023, Generative AI Model Optimization Competition: third-place solution in the preliminary round ☆47 · Updated last year
- Serving inside PyTorch ☆141 · Updated last week
- Translate networks from different platforms into an Intermediate Representation (IR) ☆44 · Updated 6 years ago
- Möbius Transformation for Fast Inner Product Search on Graph ☆23 · Updated 3 years ago
- Efficient deployment: TensorRT (TRT) inference for YOLOX, V3, V4, V5, V6, V7, V8, and EdgeYOLO ™️, with pre- and post-processing implemented entirely in CUDA kernels. CPP/CUDA🚀 ☆46 · Updated last year
- Triton server ensemble model demo ☆30 · Updated 2 years ago
- Deploy ONNX models with TensorRT and LibTorch ☆16 · Updated 2 years ago
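- A demonstration of using autoTVM to search for optimized neural-network inference code: the open-source CenterFace model is compiled with TVM, autoTVM searches for the best inference code, and the result is finally compiled into C++ for deployment. The demo platform is CUDA, but other platforms also work, such as Raspberry Pi, Android phones, and iPhones. This is a demonstration of … ☆27 · Updated 3 years ago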
- Using TensorRT for Inference Model Deployment. ☆46 · Updated 8 months ago
- Transformer-related optimization, including BERT and GPT ☆17 · Updated last year
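- Fully understand BP (backpropagation) in 15 lines of code; the C++ implementation is simple too, reaching 98.29% accuracy on MNIST classification ☆33 · Updated 2 years ago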
- C++ project template based on Visual Studio, OpenCV, and CUDA, with gdb debugging and a Makefile ☆26 · Updated 2 years ago
- shouxie_RNN ☆34 · Updated 2 years ago
- NVIDIA TensorRT Hackathon 2023 final-round topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM ☆39 · Updated 11 months ago
- TensorRT encapsulation, learn, rewrite, practice. ☆22 · Updated last year
- The DeepSpark open platform selects hundreds of open source application algorithms and models that are deeply coupled with industrial app… ☆26 · Updated this week
- ☆16 · Updated this week
- ☆115 · Updated last year
- ☆26 · Updated last year
- Hands-on large-model deployment: TensorRT-LLM, Triton Inference Server, vLLM ☆25 · Updated 6 months ago