MegEngine / InferLLM
A lightweight LLM inference framework
☆728 · Updated last year
Alternatives and similar repositories for InferLLM
Users interested in InferLLM are comparing it to the libraries listed below:
- LLM deployment project based on MNN; this project has been merged into MNN. ☆1,587 · Updated 4 months ago
- C++ implementation of Qwen-LM ☆588 · Updated 6 months ago
- fastllm is a high-performance LLM inference library with no backend dependencies. It supports tensor-parallel inference for dense models and mixed-mode inference for MoE models; any GPU with 10 GB+ of memory can run the full DeepSeek model. A dual-socket 9004/9005 server plus a single GPU can serve the original full-precision DeepSeek model at 20 tps single-concurrency; the INT4-quantized model reaches 30 tp… ☆3,628 · Updated this week
- llama2.c-zh: a small language model supporting Chinese scenarios ☆147 · Updated last year
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆782 · Updated this week
- llm-export can export LLM models to ONNX. ☆293 · Updated 4 months ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆254 · Updated last week
- C++ implementation of ChatGLM-6B, ChatGLM2-6B, ChatGLM3 & GLM4(V) ☆2,975 · Updated 10 months ago
- Chinese Mixtral mixture-of-experts large language models (Chinese Mixtral MoE LLMs) ☆606 · Updated last year
- LLaMA/RWKV ONNX models, quantization, and test cases ☆363 · Updated last year
- The official repo of the Aquila2 series proposed by BAAI, including pretrained & chat large language models. ☆441 · Updated 7 months ago
- Accelerate inference without tears ☆315 · Updated 2 months ago
- INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model ☆1,525 · Updated 2 months ago
- XVERSE-13B: A multilingual large language model developed by XVERSE Technology Inc. ☆645 · Updated last year
- Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052 ☆473 · Updated last year
- ☆332 · Updated 4 months ago
- Jittor large-model inference library: high performance, low hardware requirements, good Chinese support, and portability ☆2,426 · Updated 3 months ago
- Efficient AI Inference & Serving ☆469 · Updated last year
- Chinese Mixtral-8x7B (Chinese-Mixtral-8x7B) ☆650 · Updated 9 months ago
- Yuan 2.0 Large Language Model ☆685 · Updated 10 months ago
- Export LLaMA to ONNX ☆124 · Updated 5 months ago
- ☆127 · Updated 5 months ago
- [EMNLP 2024 Industry Track] This is the official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V…" ☆481 · Updated this week
- ☆427 · Updated this week
- Repo for adapting Meta Llama 2 to Chinese! A Chinese adaptation of Meta's newly released Llama 2 (fully open source and commercially usable) ☆742 · Updated last year
- LLM Inference benchmark ☆419 · Updated 10 months ago
- Phi2-Chinese-0.2B: train your own small Chinese Phi2 model from scratch; supports LangChain integration to load a local knowledge base for retrieval-augmented generation (RAG). ☆552 · Updated 10 months ago
- ☆166 · Updated this week
- ☆609 · Updated 10 months ago
- Low-bit LLM inference on CPU with lookup table ☆793 · Updated last week