A handbook on building your own AI inference engine: everything you need to know, starting from zero
☆274Sep 8, 2022Updated 3 years ago
Alternatives and similar repositories for AI-Infer-Engine-From-Zero
Users that are interested in AI-Infer-Engine-From-Zero are comparing it to the libraries listed below
- MegCC is a deep learning model compiler with an ultra-lightweight runtime, high efficiency, and easy portability☆486Oct 23, 2024Updated last year
- Make a minimal OpenCV runnable anywhere, WIP☆87Jan 16, 2023Updated 3 years ago
- ffmpeg+cuvid+tensorrt+multicamera☆12Dec 31, 2024Updated last year
- SGEMM optimization with cuda step by step☆21Mar 23, 2024Updated last year
- 《Machine Learning Systems: Design and Implementation》- Chinese Version☆4,764Apr 13, 2024Updated last year
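- A layered, decoupled deep learning inference engine☆79Feb 17, 2025Updated last year
- An Easy-to-Use and High-Performance AI Deployment Framework☆1,751Feb 23, 2026Updated last week
- A great project for campus, autumn, and spring recruiting and for internships! Walks you through implementing a high-performance deep learning inference library from scratch, supporting inference for models such as llama2, Unet, Yolov5, and Resnet. Implement a high-performance deep learning inference library st…☆3,331Jun 22, 2025Updated 8 months ago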
- A simple neural network inference framework☆25Aug 1, 2023Updated 2 years ago
- How to optimize some algorithms in CUDA.☆2,825Feb 15, 2026Updated 2 weeks ago
- ⚡️ Using NNIE as simple as using ncnn ⚡️☆184Jan 26, 2022Updated 4 years ago
- Learn a range of topics by following Tensorrt_pro☆40Nov 25, 2022Updated 3 years ago
- ☆10Jul 18, 2024Updated last year
- Various test models in WNNX format. They can be viewed with `pip install wnetron && wnetron`☆12Jun 22, 2022Updated 3 years ago
- row-major matmul optimization☆703Feb 24, 2026Updated last week
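- Companion code for the bilibili video series "CUDA 12.x 并行编程入门(C++版)" (Introduction to CUDA 12.x Parallel Programming, C++ Edition)☆34Aug 12, 2024Updated last year
- Deployment version of FastSAM: easy to port to different platforms, simple to deploy, and fast at runtime.☆24May 30, 2024Updated last year
- An editor for the ncnn and pnnx formats☆137Oct 7, 2024Updated last year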
- PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.☆1,785Mar 28, 2024Updated last year
- A library for high performance deep learning inference on NVIDIA GPUs.☆555Jan 29, 2022Updated 4 years ago
- 🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉☆4,374Feb 25, 2026Updated last week
- Tutorials for writing high-performance GPU operators in AI frameworks.☆135Aug 12, 2023Updated 2 years ago
- ☆42Jun 25, 2020Updated 5 years ago
- FP8 flash attention implemented on the Ada architecture using the cutlass repository☆79Aug 12, 2024Updated last year
- A primitive library for neural network☆1,366Nov 24, 2024Updated last year
- YoloV8 segmentation NPU for the RK 3566/68/88☆18Apr 30, 2024Updated last year
- Study notes on ggml, a machine learning inference framework☆18Mar 24, 2024Updated last year
- Some example projects implemented with ncnn☆26Feb 17, 2023Updated 3 years ago
- ☆28Jun 30, 2025Updated 8 months ago
- Example of SenseCraft Model Assistant Model deployment related to ESP32☆32Apr 9, 2025Updated 10 months ago
- YOLOv3, YOLOv4, YOLOv5, YOLOv5-Lite, YOLOv6-v1, YOLOv6-v2, YOLOv7, YOLOX, YOLOX-Lite, PP-YOLOE, PP-PicoDet-Plus, YOLO-Fastest v2, FastestDet, YOLOv5-S…☆765Oct 25, 2022Updated 3 years ago
- Quantize yolov5 using pytorch_quantization.🚀🚀🚀☆14Oct 24, 2023Updated 2 years ago
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).☆251Mar 15, 2024Updated last year
- Awesome lists about all kinds of awesome skills to help you get out of the age-35 crisis and, most importantly, to tell you how to enjoy your life.☆18Jul 9, 2022Updated 3 years ago
- profile tools for pytorch nn models☆42Jan 11, 2021Updated 5 years ago
- NVIDIA TensorRT Hackathon 2023 semifinal topic: building and optimizing the Tongyi Qianwen Qwen-7B model with TensorRT-LLM☆43Oct 20, 2023Updated 2 years ago
- ☆67Oct 25, 2025Updated 4 months ago
- ☆26Apr 21, 2021Updated 4 years ago
- Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.☆44Feb 27, 2025Updated last year