Tlntin / Qwen-TensorRT-LLM
☆602 · Updated 7 months ago
Alternatives and similar repositories for Qwen-TensorRT-LLM:
Users interested in Qwen-TensorRT-LLM are comparing it to the libraries listed below.
- Accelerate inference without tears ☆304 · Updated this week
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆237 · Updated this week
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆656 · Updated last month
- Export LLaMA to ONNX ☆115 · Updated 2 months ago
- ☆156 · Updated this week
- llm-export can export LLM models to ONNX. ☆270 · Updated last month
- Tongyi Qianwen (Qwen) vLLM inference deployment demo ☆543 · Updated 11 months ago
- ☆90 · Updated last year
- Best practices for training LLaMA models in Megatron-LM ☆644 · Updated last year
- ☆27 · Updated 4 months ago
- ☆319 · Updated last month
- The official repo of Pai-Megatron-Patch for large-scale LLM & VLM training, developed by Alibaba Cloud. ☆925 · Updated this week
- A streamlined and customizable framework for efficient large model evaluation and performance benchmarking ☆574 · Updated this week
- C++ implementation of Qwen-LM ☆581 · Updated 3 months ago
- Firefly Chinese LLaMA-2 large model; supports continued pre-training of Baichuan2, Llama2, Llama, Falcon, Qwen, Baichuan, InternLM, Bloom, and other large models ☆407 · Updated last year
- LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA) ☆397 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆132 · Updated 3 months ago
- A line-by-line annotated walkthrough of the Baichuan2 code, aimed at beginners ☆212 · Updated last year
- Welcome to the "LLM-travel" repository! Explore the inner workings of large language models (LLMs) 🚀, dedicated to deeply understanding, discussing, and implementing techniques, principles, and applications related to large models ☆301 · Updated 7 months ago
- LLM Inference benchmark ☆401 · Updated 7 months ago
- A purer tokenizer with a higher compression ratio ☆470 · Updated 3 months ago
- InternEvo is an open-source lightweight training framework that aims to support model pre-training without the need for extensive dependencies… ☆363 · Updated last week
- ☆310 · Updated 9 months ago
- Optimize Qwen1.5 models with TensorRT-LLM ☆17 · Updated 9 months ago
- Use the peft library for efficient 4-bit QLoRA fine-tuning of chatGLM-6B/chatGLM2-6B, then merge the LoRA model into the base model and quantize to 4-bit (see the sketch after this list). ☆358 · Updated last year
- Text embedding ☆145 · Updated last year
- Multi-GPU ChatGLM with DeepSpeed and … ☆405 · Updated 8 months ago
- ☆305 · Updated last year
- Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052 ☆469 · Updated 11 months ago
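
The chatGLM QLoRA entry above follows the general peft/bitsandbytes QLoRA recipe: load the base model in 4-bit, attach LoRA adapters, train, and later merge the adapter back into the base weights. Below is a minimal sketch of that flow, assuming the Hugging Face transformers + peft + bitsandbytes stack; the model id, LoRA hyperparameters, and `target_modules` are illustrative assumptions, not taken from that repository.

```python
# Minimal QLoRA setup sketch (assumptions: chatglm2-6b model id, query_key_value LoRA targets)
import torch
from transformers import AutoModel, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "THUDM/chatglm2-6b"  # illustrative model id

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    base_id, quantization_config=bnb_config, trust_remote_code=True
)
model = prepare_model_for_kbit_training(model)

# Attach trainable low-rank adapters; target_modules vary by architecture (assumption here)
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# After SFT, the adapter can be merged into a full-precision copy of the base model
# via PeftModel.from_pretrained(...).merge_and_unload(), then re-quantized for serving.
```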