Tlntin / Qwen-TensorRT-LLM
☆610 Updated 10 months ago
Alternatives and similar repositories for Qwen-TensorRT-LLM
Users who are interested in Qwen-TensorRT-LLM are comparing it to the libraries listed below.
- Accelerate inference without tears ☆315 Updated 2 months ago
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆777 Updated 2 weeks ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆253 Updated this week
- Tongyi Qianwen (Qwen) vLLM inference deployment demo ☆580 Updated last year
- LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA) ☆418 Updated last year
- ☆166 Updated this week
- Best practice for training LLaMA models in Megatron-LM ☆654 Updated last year
- ☆90 Updated last year
- llm-export can export LLM models to ONNX. ☆292 Updated 4 months ago
- ☆332 Updated 4 months ago
- ☆27 Updated 6 months ago
- Export LLaMA to ONNX ☆124 Updated 5 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆136 Updated 5 months ago
- This is a repository used by individuals to experiment with and reproduce the LLM pre-training process. ☆435 Updated last month
- LLM Inference benchmark ☆419 Updated 10 months ago
- C++ implementation of Qwen-LM ☆588 Updated 5 months ago
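- A large language model training and testing tool built on HuggingFace. Supports web UI and terminal inference for each model, parameter-efficient and full-parameter training (pre-training, SFT, RM, PPO, DPO), as well as merging and quantization. ☆217 Updated last year
- LLM deployment project based on MNN. This project has been merged into MNN. ☆1,584 Updated 4 months ago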
- Optimize QWen1.5 models with TensorRT-LLM ☆17 Updated last year
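- Firefly Chinese LLaMA-2 large model, supporting continued pre-training of Baichuan2, Llama2, Llama, Falcon, Qwen, Baichuan, InternLM, Bloom, and other large models ☆410 Updated last year
- Train a 1B LLM with 1T tokens from scratch as an individual ☆665 Updated last month
- Uses the peft library to perform efficient 4-bit QLoRA fine-tuning of chatGLM-6B/chatGLM2-6B, then merges the LoRA model with the base model and applies 4-bit quantization. ☆360 Updated last year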
- Community maintained hardware plugin for vLLM on Ascend ☆703 Updated this week
- [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V… ☆476 Updated this week
- ☆49 Updated this week
- FlagScale is a large model toolkit based on open-sourced projects. ☆280 Updated this week
- Inference code for LLaMA models ☆121 Updated last year
- A purer Tokenizer with a higher compression rate ☆481 Updated 6 months ago
- The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud. ☆1,113 Updated this week
- Flask and FastAPI implementations of the ChatGLM-6B HTTP streaming decoding API, plus a ready-to-use web page. A stream decoding demo of ChatGLM-6B using Flask or FastAPI, with web page out-of-th… ☆92 Updated last year