morsoli / llmbenchmark

Comparison of LLM API performance metrics: an in-depth analysis of key metrics such as TTFT and TPS

☆18

Alternatives and similar repositories for llmbenchmark

Users who are interested in llmbenchmark are comparing it to the libraries listed below.

FeiGeChuanShu / trt2023
NVIDIA TensorRT Hackathon 2023 second-round topic: building and optimizing Tongyi Qianwen Qwen-7B with TensorRT-LLM
☆42 Updated last year
AXERA-TECH / CLIP-ONNX-AX650-CPP
☆27 Updated 2 weeks ago
EdVince / whisper-trtllm
Whisper in TensorRT-LLM
☆16 Updated last year
DataXujing / Bert_TensorRT
Accelerated deployment of BERT models with TensorRT
☆9 Updated 3 years ago
yvonwin / qwen2.cpp
C++ implementation of Qwen2 and Llama 3
☆45 Updated last year
DataXujing / DeepSeek-R1-Android
Deploying the DeepSeek-R1 distilled 1.5B model on Android phones
☆22 Updated 5 months ago
TRT2022 / ControlNet_TensorRT
Tianchi NVIDIA TensorRT Hackathon 2023: third-place solution in the preliminary round of the generative AI model optimization contest
☆49 Updated last year
sophgo / ChatGLM2-TPU
Run ChatGLM2-6B on BM1684X
☆49 Updated last year
lrw04 / tinyllamas-ncnn
Run inference with TinyLlama models on ncnn
☆24 Updated last year
DataXujing / YOLOv12-TensorRT
YOLOv12 with TensorRT: end-to-end accelerated model inference and INT8 quantization implementation
☆13 Updated 4 months ago
Xiaolong-RRL / qwen2_5_vllm_fastapi
Deploying Qwen2.5 with FastAPI + vLLM
☆21 Updated 9 months ago
isLinXu / paper-read-notes
paper-read-notes
☆12 Updated 9 months ago
TRT2022 / trtllm-llama
☢️ TensorRT 2023 second round: inference acceleration and optimization of Llama models with TensorRT-LLM
☆49 Updated last year
ibaiGorordo / ONNX-YOLO-World-Open-Vocabulary-Object-Detection
Python scripts performing Open Vocabulary Object Detection using the YOLO-World model in ONNX.
☆55 Updated last year
reilxlx / llava-Qwen2-7B-Instruct-Chinese-CLIP
The llava-Qwen2-7B-Instruct-Chinese-CLIP model improves recognition of Chinese text and of the meaning behind memes, approaching the recognition level of gpt4o and claude-3.5-sonnet!
☆23 Updated 11 months ago
guoguo1314 / llama3_learn.c
Inference deployment of Llama 3
☆11 Updated last year
triple-Mu / HunyuanDiT-TensorRT-libtorch
HunyuanDiT with TensorRT and libtorch
☆17 Updated last year
isLinXu / DatasetMarkerTool
🔨🔨🔨 Tool for creating model training datasets
☆19 Updated 8 months ago
lin-honghui / data-competition-calendar
A curated collection of news on data competitions in China and abroad
☆18 Updated 3 years ago
owenliang / DeepSeek-Distill-Qwen-For-Child
☆44 Updated 4 months ago
SkyworkAI / vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆16 Updated last year
lucasjinreal / wanwu_release
Wanwu models release; code will be released soon
☆24 Updated 2 years ago
sherlockchou86 / PyLangPipe
A simple, lightweight large language model pipeline framework.
☆25 Updated 2 months ago
hpc203 / Real-Time-Frame-Interpolation-onnxrun
Real-time video frame interpolation deployed with onnxruntime, with both C++ and Python versions of the program
☆25 Updated last year
lovemefan / ggml-learning-notes
ggml learning notes; ggml is an inference framework for machine learning
☆17 Updated last year
richjjj / cuvid-tensorrt-multi
ffmpeg+cuvid+tensorrt+multicamera
☆12 Updated 6 months ago
dusty-nv / NanoDB
Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP
☆59 Updated 2 months ago
isLinXu / vision-process-webui
💡💡💡 Awesome computer vision app in Gradio
☆53 Updated last year
DataXujing / Qwen1.5-0.5b-chat-android
Deploying a large language model on Android phones with MNN-llm: Qwen1.5-0.5B-Chat
☆80 Updated last year
1694439208 / GOT-OCR-Inference
Exploring production deployment and acceleration of the GOT-OCR project, in any language
☆60 Updated 8 months ago