NetEase-Media / grps
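[Deep learning model deployment framework] Supports tf/torch/trt/trtllm/vllm and more NN frameworks, supports dynamic batching and streaming modes, supports both Python and C++, and is limitable, extensible, and high-performance. Helps users quickly deploy models online and serve them through HTTP/RPC interfaces.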
☆165Updated last week
Related projects
Alternatives and complementary repositories for grps
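- [grps with trtllm] A pure C++, high-performance OpenAI-compatible LLM service built with GRPS + TensorRT-LLM + Tokenizers.cpp. Supports chat and function call modes, AI agents, distributed multi-GPU inference, multimodal input, and a Gradio chat UI.☆92Updated 2 weeks ago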
- TengineGst is a streaming media analytics framework, based on GStreamer multimedia framework, for creating varied complex media analytics…☆74Updated 3 years ago
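- Train an LLM from scratch with DeepSpeed, going through pretraining and SFT stages, to verify the LLM's ability to learn knowledge, understand language, and answer questions.☆155Updated 4 months ago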
- 🚀 No libtorch required: pure C++ TensorRT deployment of SOLOv2 and other models, quickly portable to NX/TX2.☆50Updated 2 years ago
- An acceleration library that supports arbitrary bit-width combinatorial quantization operations☆226Updated last month
- Algorithm acceleration and deployment framework that lets you complete algorithm development at low cost, e.g. Facedetect, FaceLandmark.☆90Updated 3 years ago
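- AI edge toolbox: an AI model deployment toolchain aimed at edge devices, especially embedded RTOS platforms, including a model inference engine and model compression tools.☆169Updated 11 months ago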
- Supports mixed-precision inference with vllm☆95Updated 2 weeks ago
- MIXQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction☆81Updated 3 weeks ago
- SegmentAnything-OnnxRunner is an example of using Meta AI Research's SAM ONNX model in C++. The encoder and decoder of SAM are decoupled in t…☆100Updated 11 months ago
- Model deployment white paper (CUDA | ONNX | TensorRT | C++) 🚀🚀🚀☆185Updated 2 months ago
- This tool (enhance_long) aims to enhance the long-context extrapolation capability of LLaMA 2 at the lowest possible cost, preferably without …☆47Updated 11 months ago
- Build CUDA Neural Network From Scratch☆19Updated 2 months ago
- Mixed-precision inference with TensorRT-LLM☆93Updated 3 weeks ago
- A repo for my NanoGPT implementation in PyTorch 2.0, written around the torch 2.0 release; faster and simpler, and a good tutorial for learning GPT.☆60Updated 9 months ago
- Inference of superpoint feature extraction with pure C/C++☆34Updated 8 months ago
- An ultra-fast tiny model for lane detection, accelerated with onnx_parser, the TensorRT API, and torch2trt. Our model supports int8, dynamic…☆143Updated 3 years ago
- This project uses the YOLOv4 model and applies OpenCV algorithms to recognize the digits on digital signal lights.☆124Updated last year
- Outbound-call robot for customer follow-up calls in the insurance industry☆74Updated last year
- An open source task scheduling library ASRT (Async Runtime) written in modern C++ tailored for embedded linux systems.☆123Updated 2 months ago
- aot compiler☆116Updated 7 months ago
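- This project builds on representative work by previous researchers to evaluate SFT data along multiple dimensions and to filter SFT data automatically.☆55Updated 8 months ago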
- Teaches you to implement a deep learning framework step by step using only the most basic Python syntax and NumPy☆120Updated 3 months ago
- High-performance rank executor for advertising and recommendation systems, implemented in C/C++, with support for being ensembled into Java/Scala ho…☆99Updated 8 months ago
- Code and data for crosstalk text generation tasks, exploring whether large models and pre-trained language models can understand humor. …☆166Updated 2 years ago
- GAL-DAWN: A Novel High-Performance Computing Library of Graph Algorithms based on DAWN, CUDA/C++☆115Updated 3 months ago
- Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models☆129Updated 2 weeks ago
- Aiming to build the most comprehensive machine learning blog.☆153Updated this week