QwenLM/vllm-gptq


A high-throughput and memory-efficient inference and serving engine for LLMs

☆141

Alternatives and similar repositories for vllm-gptq

Users who are interested in vllm-gptq are comparing it to the libraries listed below.

casper-hansen / AutoAWQ
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference.
☆2,351 · May 11, 2025 · Updated last year
AutoGPTQ / AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
☆5,076 · Apr 11, 2025 · Updated last year
kleinlee / MiniQwen
☆14 · Dec 6, 2023 · Updated 2 years ago
QwenLM / qwen.cpp
C++ implementation of Qwen-LM
☆627 · Dec 6, 2024 · Updated last year
QwenLM / ConsisEval
☆14 · Jul 5, 2024 · Updated 2 years ago
t6am3 / law_glm_baseline
☆15 · Jun 20, 2024 · Updated 2 years ago
ssbuild / moss_finetuning
moss chat finetuning
☆51 · Apr 23, 2024 · Updated 2 years ago
QwenLM / Qwen
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
☆21,500 · Mar 5, 2026 · Updated 4 months ago
CLUEbenchmark / SuperCLUE-RAG
A Chinese-native benchmark for evaluating retrieval-augmented generation
☆132 · Apr 18, 2024 · Updated 2 years ago
THUDM / LongAlign
[EMNLP 2024] LongAlign: A Recipe for Long Context Alignment of LLMs
☆262 · Dec 16, 2024 · Updated last year
FlagOpen / FlagEmbedding
Retrieval and Retrieval-augmented LLMs
☆11,997 · Apr 22, 2026 · Updated 3 months ago
dlzhengming / order_collection_flink
☆10 · Aug 2, 2021 · Updated 4 years ago
zhangnick01 / ELandingTime
Estimate Landing Time
☆13 · Aug 2, 2021 · Updated 4 years ago
gameofdimension / vllm-cn
Demonstrates the remarkable effect of vllm on Chinese large language models
☆31 · Nov 4, 2023 · Updated 2 years ago
xusenlinzy / api-for-open-llm
OpenAI-style API for open large language models, using LLMs just like ChatGPT! Support for LLaMA, LLaMA-2, BLOOM, Falcon, Baichuan, Qwen, X…
☆2,460 · Sep 26, 2024 · Updated last year
ArtificialZeng / llama3_explained
The newest version of llama3, with source code explained line by line in Chinese
☆22 · Apr 19, 2024 · Updated 2 years ago
yongzhuo / MacroGPT-Pretrain
Full pretraining of the macrogpt large model (1b3, 32 layers), with multi-GPU deepspeed or single-GPU adafactor
☆15 · Nov 30, 2023 · Updated 2 years ago
LianjiaTech / BELLE
BELLE: Be Everyone's Large Language model Engine (an open-source Chinese conversational large language model)
☆8,279 · Oct 16, 2024 · Updated last year
owenliang / qwen-vllm
Demo of Qwen (通义千问) inference deployment with vLLM
☆643 · Mar 28, 2024 · Updated 2 years ago
hkust-nlp / ceval
Official GitHub repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]
☆1,862 · Jul 27, 2025 · Updated last year
susirial / Mojuan
Mojuan: Write your own AI application.
☆16 · Jul 12, 2024 · Updated 2 years ago
matthewchung74 / qwen_2_5_3B_GRPO_medical_thinking
☆50 · Apr 21, 2025 · Updated last year
FlagAI-Open / Aquila2
The official repo of Aquila2 series proposed by BAAI, including pretrained & chat large language models.
☆446 · Oct 11, 2024 · Updated last year
InternLM / lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
☆7,981 · Updated this week
QwenLM / Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
☆6,715 · Aug 7, 2024 · Updated last year
QwenLM / online_merging_optimizers
Implementations of the online merging optimizers proposed in "Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment"
☆82 · Jun 19, 2024 · Updated 2 years ago
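ztxz16 / fastllm
fastllm is a high-performance large-model inference library with no backend dependencies. It supports tensor-parallel inference for dense models and mixed-mode inference for MoE models; any GPU with 10 GB or more of memory can run the full DeepSeek model. A dual-socket 9004/9005 server with a single GPU can deploy the original full-size, full-precision DeepSeek model at 20 tps for a single concurrent request; the INT4-quantized model reaches 30tp…
☆4,873 · Updated this week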
NJUDeepEngine / CAEF
Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines"
☆11 · Oct 11, 2024 · Updated last year
QwenLM / Qwen-Agent
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
☆16,869 · Mar 4, 2026 · Updated 4 months ago
chu-tianxiang / vllm-gptq
A high-throughput and memory-efficient inference and serving engine for LLMs
☆131 · Jun 25, 2024 · Updated 2 years ago
liteli1987gmail / autogen
Chinese documentation for autogen
☆10 · Nov 7, 2023 · Updated 2 years ago
WangRongsheng / Aurora
The official code for "Aurora: Activating Chinese chat capability for Mixtral-8x7B sparse Mixture-of-Experts through Instruction-Tuning"
☆261 · May 9, 2024 · Updated 2 years ago
huggingface / text-generation-inference
Large Language Model Text Generation Inference
☆10,884 · Mar 21, 2026 · Updated 4 months ago
mzbac / qlora-inference-multi-gpu
☆14 · May 25, 2023 · Updated 3 years ago
baichuan-inc / Baichuan-13B
A 13B large language model developed by Baichuan Intelligent Technology
☆2,930 · Sep 6, 2023 · Updated 2 years ago
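AdugiBeyond / blockchain_source
This project collects high-quality blockchain resources from around the web (mainly Ethereum and fabric), including manuals, tools, tutorials, and source code analysis, and will be updated continuously.
☆10 · Feb 26, 2019 · Updated 7 years ago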
01-ai / Yi
A series of large language models trained from scratch by developers @01-ai
☆7,822 · Nov 27, 2024 · Updated last year
TigerResearch / TigerBot
TigerBot: A multi-language multi-task LLM
☆2,259 · Dec 28, 2024 · Updated last year
yangjianxin1 / Firefly
Firefly: a training tool for large models, supporting training of Qwen2.5, Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, …
☆6,649 · Oct 24, 2024 · Updated last year