Franc-Z / QWen1.5_TensorRT-LLM
Optimize QWen1.5 models with TensorRT-LLM
☆17 · Updated last year
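For context on what the repo's one-line description involves, below is a minimal sketch of running a Qwen1.5 model through TensorRT-LLM's high-level Python `LLM` API (present in recent TensorRT-LLM releases). The model ID and sampling settings are illustrative placeholders; this repo's own scripts may instead use the lower-level checkpoint-conversion and `trtllm-build` flow.

```python
# Minimal sketch, assuming a recent TensorRT-LLM release with the high-level LLM API.
# The model ID and sampling settings are illustrative, not taken from this repo.
from tensorrt_llm import LLM, SamplingParams

# Builds or loads a TensorRT engine for the given Hugging Face model.
llm = LLM(model="Qwen/Qwen1.5-7B-Chat")
params = SamplingParams(max_tokens=64, temperature=0.7)

for output in llm.generate(["What is TensorRT-LLM?"], params):
    print(output.outputs[0].text)
```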
Alternatives and similar repositories for QWen1.5_TensorRT-LLM
Users interested in QWen1.5_TensorRT-LLM are comparing it to the libraries listed below.
- ☆28 · Updated last year
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆274 · Updated 5 months ago
- ☆623 · Updated last year
- ☆90 · Updated 2 years ago
- ☆520 · Updated last week
- llm-export can export LLM models to ONNX (see the ONNX export sketch after this list). ☆340 · Updated 2 months ago
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆1,021 · Updated this week
- ☆181 · Updated this week
- Simple Dynamic Batching Inference (see the dynamic-batching sketch after this list) ☆145 · Updated 3 years ago
- Export LLaMA to ONNX ☆137 · Updated last year
- Transformer-related optimization, including BERT and GPT ☆59 · Updated 2 years ago
- Transformer-related optimization, including BERT and GPT ☆39 · Updated 2 years ago
- Best practices for training LLaMA models in Megatron-LM ☆664 · Updated 2 years ago
- The Triton TensorRT-LLM Backend ☆914 · Updated this week
- ☆141 · Updated last year
- ☆27 · Updated 2 years ago
- ☆130 · Updated last year
- Server-side deep learning deployment examples ☆455 · Updated 5 years ago
- LLaMA/RWKV ONNX models, quantization, and test cases ☆367 · Updated 2 years ago
- ☆269 · Updated last month
- Compare multiple optimization methods on Triton to improve model service performance ☆51 · Updated 2 years ago
- ☆36 · Updated 2 years ago
- Inference code for LLaMA models ☆128 · Updated 2 years ago
- TensorRT Plugin Autogen Tool ☆367 · Updated 2 years ago
- FlagScale is a large-model toolkit based on open-source projects. ☆466 · Updated this week
- Tongyi Qianwen (Qwen) vLLM inference deployment demo ☆638 · Updated last year
- Optimized BERT Transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052 ☆476 · Updated last year
- ☆72 · Updated last week
- Accelerate inference without tears ☆372 · Updated 2 months ago
- ☆77 · Updated last year
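Since two of the entries above (llm-export and the LLaMA exporter) revolve around ONNX export, here is a generic sketch of what such tools automate, using plain `torch.onnx.export` on a Hugging Face causal LM. The model ID is an illustrative placeholder, and neither project's actual CLI or API is shown.

```python
# Generic ONNX-export sketch with torch.onnx.export; not llm-export's own API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-0.5B"  # illustrative small model, not from the repos above
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()
model.config.use_cache = False    # drop past_key_values outputs for a clean graph
model.config.return_dict = False  # return tuples so the exporter can trace outputs

inputs = tok("hello", return_tensors="pt")
torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "logits": {0: "batch", 1: "seq"},
    },
    opset_version=17,
)
```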
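Likewise, for the Simple Dynamic Batching Inference entry, a minimal sketch of the general dynamic-batching idea: collect requests until the batch is full or a deadline passes, then run one batched call. All names and limits here are made up for illustration and are not that project's API.

```python
# Minimal dynamic-batching sketch; limits and names are illustrative assumptions.
import threading
import time
from queue import Empty, Queue

MAX_BATCH = 8     # illustrative batch-size cap
MAX_WAIT_S = 0.01 # illustrative deadline for filling a batch

def model_forward(batch):
    # Placeholder for a real batched model call.
    return [f"result({x})" for x in batch]

request_q: "Queue[tuple[str, Queue]]" = Queue()

def batching_worker():
    while True:
        first = request_q.get()  # block until the first request arrives
        batch = [first]
        deadline = time.monotonic() + MAX_WAIT_S
        # Keep collecting until the batch is full or the deadline passes.
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(request_q.get(timeout=remaining))
            except Empty:
                break
        prompts = [req for req, _ in batch]
        # One batched forward pass, then route each result back to its caller.
        for (_, reply_q), out in zip(batch, model_forward(prompts)):
            reply_q.put(out)

threading.Thread(target=batching_worker, daemon=True).start()

def infer(prompt: str) -> str:
    reply_q: Queue = Queue(maxsize=1)
    request_q.put((prompt, reply_q))
    return reply_q.get()

print(infer("hello"))
```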