Franc-Z / QWen1.5_TensorRT-LLM
Optimize QWen1.5 models with TensorRT-LLM
☆17 · Updated last year
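For context on what the repo's one-line description involves, below is a minimal sketch of running a Qwen1.5 model through TensorRT-LLM's high-level Python `LLM` API (present in recent TensorRT-LLM releases). The model ID and sampling settings are illustrative placeholders; this repo's own scripts may instead use the lower-level checkpoint-conversion and `trtllm-build` flow.

```python
# Minimal sketch, assuming a recent TensorRT-LLM release with the high-level LLM API.
# The model ID and sampling settings are illustrative, not taken from this repo.
from tensorrt_llm import LLM, SamplingParams

# Builds or loads a TensorRT engine for the given Hugging Face model.
llm = LLM(model="Qwen/Qwen1.5-7B-Chat")
params = SamplingParams(max_tokens=64, temperature=0.7)

for output in llm.generate(["What is TensorRT-LLM?"], params):
    print(output.outputs[0].text)
```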
Alternatives and similar repositories for QWen1.5_TensorRT-LLM
Users interested in QWen1.5_TensorRT-LLM are comparing it to the libraries listed below.
- ☆28 · Updated last year
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆274 · Updated 5 months ago
- ☆623 · Updated last year
- ☆90 · Updated 2 years ago
- ☆520 · Updated last week
- llm-export can export LLM models to ONNX (see the ONNX export sketch after this list). ☆340 · Updated 2 months ago
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆1,021 · Updated this week
- ☆181 · Updated this week
- Simple Dynamic Batching Inference (see the dynamic-batching sketch after this list) ☆145 · Updated 3 years ago
- Export LLaMA to ONNX ☆137 · Updated last year
- Transformer-related optimization, including BERT and GPT ☆59 · Updated 2 years ago
- Transformer-related optimization, including BERT and GPT ☆39 · Updated 2 years ago
- Best practices for training LLaMA models in Megatron-LM ☆664 · Updated 2 years ago
- The Triton TensorRT-LLM Backend ☆914 · Updated this week
- ☆141 · Updated last year
- ☆27 · Updated 2 years ago
- ☆130 · Updated last year
- Server-side deep learning deployment examples ☆455 · Updated 5 years ago
- LLaMA/RWKV ONNX models, quantization, and test cases ☆367 · Updated 2 years ago
- ☆269 · Updated last month
- Compare multiple optimization methods on Triton to improve model service performance ☆51 · Updated 2 years ago
- ☆36 · Updated 2 years ago
- Inference code for LLaMA models ☆128 · Updated 2 years ago
- TensorRT Plugin Autogen Tool ☆367 · Updated 2 years ago
- FlagScale is a large-model toolkit based on open-source projects. ☆466 · Updated this week
- Tongyi Qianwen (Qwen) vLLM inference deployment demo ☆638 · Updated last year
- Optimized BERT Transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052 ☆476 · Updated last year
- ☆72 · Updated last week
- Accelerate inference without tears ☆372 · Updated 2 months ago
- ☆77 · Updated last year
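Since two of the entries above (llm-export and the LLaMA exporter) revolve around ONNX export, here is a generic sketch of what such tools automate, using plain `torch.onnx.export` on a Hugging Face causal LM. The model ID is an illustrative placeholder, and neither project's actual CLI or API is shown.

```python
# Generic ONNX-export sketch with torch.onnx.export; not llm-export's own API.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-0.5B"  # illustrative small model, not from the repos above
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()
model.config.use_cache = False    # drop past_key_values outputs for a clean graph
model.config.return_dict = False  # return tuples so the exporter can trace outputs

inputs = tok("hello", return_tensors="pt")
torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "seq"},
        "attention_mask": {0: "batch", 1: "seq"},
        "logits": {0: "batch", 1: "seq"},
    },
    opset_version=17,
)
```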
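Likewise, for the Simple Dynamic Batching Inference entry, a minimal sketch of the general dynamic-batching idea: collect requests until the batch is full or a deadline passes, then run one batched call. All names and limits here are made up for illustration and are not that project's API.

```python
# Minimal dynamic-batching sketch; limits and names are illustrative assumptions.
import threading
import time
from queue import Empty, Queue

MAX_BATCH = 8     # illustrative batch-size cap
MAX_WAIT_S = 0.01 # illustrative deadline for filling a batch

def model_forward(batch):
    # Placeholder for a real batched model call.
    return [f"result({x})" for x in batch]

request_q: "Queue[tuple[str, Queue]]" = Queue()

def batching_worker():
    while True:
        first = request_q.get()  # block until the first request arrives
        batch = [first]
        deadline = time.monotonic() + MAX_WAIT_S
        # Keep collecting until the batch is full or the deadline passes.
        while len(batch) < MAX_BATCH:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                batch.append(request_q.get(timeout=remaining))
            except Empty:
                break
        prompts = [req for req, _ in batch]
        # One batched forward pass, then route each result back to its caller.
        for (_, reply_q), out in zip(batch, model_forward(prompts)):
            reply_q.put(out)

threading.Thread(target=batching_worker, daemon=True).start()

def infer(prompt: str) -> str:
    reply_q: Queue = Queue(maxsize=1)
    request_q.put((prompt, reply_q))
    return reply_q.get()

print(infer("hello"))
```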