Tlntin / qwen-ascend-llm

☆52

Alternatives and similar repositories for qwen-ascend-llm

Users who are interested in qwen-ascend-llm compare it to the libraries listed below.

wangzhaode / llm-export
llm-export can export LLM models to ONNX.
☆328 Updated 3 weeks ago
sophgo / ChatGLM2-TPU
Run ChatGLM2-6B on BM1684X
☆50 Updated last year
luchangli03 / onnxsim_large_model
simplify >2GB large onnx model
☆66 Updated 11 months ago
modelscope / dash-infer
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …
☆268 Updated 3 months ago
Tlntin / ChatGLM2-6B-TensorRT
☆90 Updated 2 years ago
DeepLink-org / dlinfer
☆65 Updated last week
luchangli03 / export_llama_to_onnx
export llama to onnx
☆136 Updated 10 months ago
SmartFlowAI / LLM101n-CN
LLM101n: Let's build a Storyteller (Chinese edition)
☆135 Updated last year
inisis / OnnxLLM
Large Language Model Onnx Inference Framework
☆36 Updated 3 weeks ago
zhaohb / fastapi_tritonserver
☆27 Updated last year
TRT2022 / trtllm-llama
☢️ TensorRT 2023 second round: accelerating and optimizing Llama model inference with TensorRT-LLM
☆50 Updated 2 years ago
BestAnHongjun / LMDeploy-Jetson
Deploying LLMs offline on the NVIDIA Jetson platform marks the dawn of a new era in embodied intelligence, where devices can function ind…
☆102 Updated last year
mindspore-lab / mindformers
☆177 Updated this week
DataXujing / Qwen1.5-0.5b-chat-android
Deploying a large language model on Android phones with MNN-llm: Qwen1.5-0.5B-Chat
☆86 Updated last year
zai-org / GLM-Edge
GLM Series Edge Models
☆154 Updated 5 months ago
FeiGeChuanShu / trt2023
NVIDIA TensorRT Hackathon 2023 second-round topic: building and optimizing Tongyi Qianwen Qwen-7B with TensorRT-LLM
☆43 Updated 2 years ago
LDLINGLINGLING / adan_application
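An ecosystem of applications for large language models and multimodal models, mainly covering cross-modal search, speculative decoding, QAT quantization, multimodal quantization, ChatBot, and OCR
☆193 Updated 3 months ago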
tpoisonooo / llama.onnx
LLaMa/RWKV onnx models, quantization and testcase
☆367 Updated 2 years ago
bug-developer021 / YOLOV5_optimization_on_triton
Compare multiple optimization methods on Triton to improve model service performance
☆52 Updated last year
pandada8 / llm-inference-benchmark
LLM inference service performance benchmark
☆44 Updated last year
BaofengZan / GOT-OCRv2-onnx
For learning GOT/Qwen/OnnxLLm
☆53 Updated last year
hyperai / vllm-cn
vLLM Documentation in Simplified Chinese
☆124 Updated last month
MooreThreads / vllm_musa
A high-throughput and memory-efficient inference and serving engine for LLMs
☆68 Updated last year
AI-Study-Han / Zero-Qwen-VL
Train a LLaVA model with better Chinese support, with open-sourced training code and data.
☆76 Updated last year
DataXujing / TensorRT-LLM-ChatGLM3
Large model deployment in practice: TensorRT-LLM, Triton Inference Server, vLLM
☆26 Updated last year
Tencent / AngelSlim
Model compression toolkit engineered for enhanced usability, comprehensiveness, and efficiency.
☆201 Updated last week
Rayrtfr / FasterTransformer
Transformer related optimization, including BERT, GPT
☆17 Updated 2 years ago
liunian-Jay / MU-GOT
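PDF parsing tool: a vLLM-accelerated implementation of GOT, with MinerU handling layout detection and cropping and GOT parsing tables and formulas, for PDF parsing in RAG pipelines
☆66 Updated last year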
IEIT-Yuan / Yuan2.0-M32
Mixture-of-Experts (MoE) Language Model
☆192 Updated last year
yinghuo302 / ascend-llm
Large language model deployment on the Ascend 310 chip
☆25 Updated last year