Tlntin / Qwen-TensorRT-LLM
☆598 · Updated 6 months ago
Alternatives and similar repositories for Qwen-TensorRT-LLM:
Users interested in Qwen-TensorRT-LLM are comparing it to the libraries listed below.
- ☆306 · Updated 7 months ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆231 · Updated last week
- Export LLaMA models to ONNX ☆112 · Updated last month
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆629 · Updated last month
- llm-export can export LLM models to ONNX. ☆263 · Updated last month
- ☆314 · Updated last month
- Tongyi Qianwen (Qwen) vLLM inference deployment demo ☆521 · Updated 10 months ago
- ☆90 · Updated last year
- ☆153 · Updated this week
- FlagScale is a large model toolkit based on open-sourced projects. ☆223 · Updated this week
- ☆27 · Updated 3 months ago
- Best practice for training LLaMA models in Megatron-LM ☆644 · Updated last year
- A streamlined and customizable framework for efficient large model evaluation and performance benchmarking ☆442 · Updated this week
- LLM Tuning with PEFT (SFT+RM+PPO+DPO with LoRA) ☆390 · Updated last year
- The official repo of Pai-Megatron-Patch for large-scale LLM & VLM training, developed by Alibaba Cloud. ☆856 · Updated last week
- A line-by-line annotated version of the Baichuan2 code, suitable for beginners ☆212 · Updated last year
- ☆127 · Updated last month
- Welcome to the "LLM-travel" repository! Explore the mysteries of large language models (LLMs) 🚀. Dedicated to deeply understanding, discussing, and implementing technologies, principles, and applications related to large models. ☆298 · Updated 7 months ago
- A purer tokenizer with a higher compression ratio ☆471 · Updated 2 months ago
- CIKM 2023 Best Demo Paper Award. HugNLP is a unified and comprehensive NLP library based on HuggingFace Transformers. Please hugging for NL… ☆386 · Updated last year
- Inference code for LLaMA models ☆113 · Updated last year
- Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052 ☆469 · Updated 11 months ago
- [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V… ☆407 · Updated this week
- Firefly: Chinese LLaMA-2 large models, supporting incremental pre-training of Baichuan2, Llama2, Llama, Falcon, Qwen, Baichuan, InternLM, Bloom, and other large models ☆406 · Updated last year
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆129 · Updated 2 months ago
- LLM inference benchmark ☆394 · Updated 6 months ago
- TrustRAG: a RAG framework with reliable input and trusted output ☆651 · Updated this week
- ChatGLM multi-GPU training with DeepSpeed and … ☆405 · Updated 7 months ago
- LLaMA inference for TencentPretrain ☆97 · Updated last year