MegEngine / InferLLM
A lightweight LLM inference framework
☆727Updated last year
Alternatives and similar repositories for InferLLM
Users who are interested in InferLLM are comparing it to the libraries listed below.
- LLM deployment project based on MNN. This project has been merged into MNN.☆1,578Updated 3 months ago
- C++ implementation of Qwen-LM☆587Updated 5 months ago
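- fastllm is a high-performance LLM inference library implemented in C++ with no backend dependencies (it depends only on CUDA and does not require PyTorch). It can run the DeepSeek R1 671B INT4 model on a single 4090, reaching 20+ tps on a single stream.☆3,561Updated this week
- llm-export can export LLM models to ONNX.☆289Updated 4 months ago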
- llama2.c-zh: a small language model that supports Chinese-language scenarios☆147Updated last year
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …☆251Updated this week
- ☆162Updated last month
- LLaMa/RWKV onnx models, quantization and testcase☆363Updated last year
- Chinese Mixtral mixture-of-experts (MoE) LLMs☆604Updated last year
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.☆749Updated this week
- XVERSE-13B: A multilingual large language model developed by XVERSE Technology Inc.☆645Updated last year
- Easy and efficient fine-tuning of LLMs (supports LLaMA, LLaMA2, LLaMA3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.☆602Updated 3 months ago
- C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)☆2,971Updated 9 months ago
- Efficient AI Inference & Serving☆469Updated last year
- LLM Inference benchmark☆417Updated 9 months ago
- INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model☆1,516Updated last month
- Export LLaMA to ONNX☆124Updated 4 months ago
- ☆330Updated 3 months ago
- The official repo of Aquila2 series proposed by BAAI, including pretrained & chat large language models.☆441Updated 7 months ago
- ☆608Updated 9 months ago
- Phi2-Chinese-0.2B: train your own small Chinese Phi2 chat model from scratch, with support for connecting to langchain to load a local knowledge base for retrieval-augmented generation (RAG).☆551Updated 10 months ago
- Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052☆473Updated last year
- Luotuo (骆驼): a Chinese instruction-finetuned LLaMA. Developed by Chen Qiyuan (陈启源) @ Central China Normal University, Li Lulu (李鲁鲁) @ SenseTime, and Leng Ziang (冷子昂) @ SenseTime☆720Updated last year
- TigerBot: A multi-language multi-task LLM☆2,258Updated 4 months ago
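- Jittor (计图) large-model inference library, featuring high performance, low hardware requirements, good Chinese support, and portability☆2,421Updated 2 months ago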
- Yuan 2.0 Large Language Model☆683Updated 10 months ago
- Efficient 4-bit QLoRA fine-tuning of chatGLM-6B/chatGLM2-6B using the peft library, plus merging of the LoRA model with the base model and 4-bit quantization.☆361Updated last year
- chatglm 6b finetuning and alpaca finetuning☆1,543Updated 2 months ago
- Code for fine-tuning ChatGLM-6b using low-rank adaptation (LoRA)☆720Updated last year
- ☆425Updated this week