Phoenix8215 / A-White-Paper-on-Neural-Network-Deployment
A white paper on model deployment (CUDA | ONNX | TensorRT | C++) 🚀🚀🚀
☆185Updated last month
Related projects
Alternatives and complementary repositories for A-White-Paper-on-Neural-Network-Deployment
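- [grps with trtllm] A high-performance, pure C++ LLM service built by integrating TensorRT-LLM and Tokenizers.cpp; compatible with the OpenAI API protocol, with support for chat and function call modes, AI agents, distributed multi-GPU inference, multimodal input, and a gradio chat…☆87Updated last week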
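- AI edge toolbox: an AI model deployment toolchain for edge devices, especially embedded RTOS platforms, including a model inference engine and model compression tools☆167Updated 10 months ago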
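- [Deep learning model deployment framework] Supports tf/torch/trt/trtllm/vllm and other NN frameworks, dynamic batching and streaming modes, and both Python and C++; limitable, extensible, and high-performance. Helps users quickly deploy models online and, via HTTP/RPC interfaces…☆156Updated last week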
- A repo that uses TensorRT to deploy well-trained models. Supports RTDETR, YOLO-NAS, YOLOV5, YOLOV6, YOLOV7, YOLOV8, YOLOX.☆110Updated last year
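- A good project for campus recruiting (fall and spring hiring) and internships: build, hands-on and from scratch, an LLM inference framework that supports LLama2/3 and Qwen2.5.☆220Updated this week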
- Build CUDA Neural Network From Scratch☆17Updated 2 months ago
- ☆220Updated last month
- A friendlier Chinese-language tutorial for nanoGPT☆97Updated 5 months ago
- Courses on Bilibili☆70Updated last year
- High-performance computing course & CUDA programming examples & deep learning inference framework☆29Updated last year
- This repository gives a guideline for learning CUDA and TensorRT from the beginning.☆148Updated 4 months ago
- Code for the book 《CUDA编程基础与实践》 (CUDA Programming: Fundamentals and Practice)☆93Updated 2 years ago
- A beginner's introductory tutorial on model compression☆184Updated this week
- learning how CUDA works☆162Updated 2 months ago
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations☆224Updated last month
- A CUDA tutorial for learning CUDA programming from scratch☆195Updated 4 months ago
- 🚀 No libtorch needed: pure C++ TensorRT deployment of SOLOv2 and others, which can be quickly ported to NX/TX2.☆50Updated 2 years ago
- A large number of cuda/tensorrt cases for learning cuda/tensorrt.☆105Updated 2 years ago
- SegmentAnything-OnnxRunner is an example using Meta AI Research's SAM onnx model in C++. The encoder and decoder of SAM are decoupled in t…☆98Updated 11 months ago
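- Train an LLM from scratch with deepspeed, through the pretrain and SFT stages, to verify the LLM's ability to learn knowledge, understand language, and answer questions☆154Updated 3 months ago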
- LLM notes, including model inference, transformer model structure, and lightllm framework code analysis notes☆27Updated this week
- MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction☆76Updated last week
- QAT (quantization-aware training) for classification with MQBench☆35Updated 2 years ago
- An ultra-fast tiny model for lane detection, accelerated with onnx_parser, TensorRTAPI, and torch2trt. Our model supports int8, dynamic…☆144Updated 3 years ago
- YOLOv5 pruning on COCO Dataset☆89Updated last year
- This code accompanies the Bilibili video https://www.bilibili.com/video/BV18L41197Uz/?spm_id_from=333.788&vd_source=eefa4b6e337f16d87d87c2c357db8ca7.☆59Updated last year
- ☆225Updated 2 years ago
- Object detection using yolov8 as the baseline model and the VisDrone2019 dataset, with the author's own improvement strategies☆49Updated 3 months ago
- ☆52Updated last year