MegEngine / InferLLM
a lightweight LLM model inference framework
☆728 · Updated last year
Alternatives and similar repositories for InferLLM:
Users interested in InferLLM often compare it with the libraries listed below.
- fastllm: a high-performance LLM inference library implemented in C++ with no backend dependencies (only CUDA is required; no PyTorch). It can run the DeepSeek R1 671B INT4 model on a single RTX 4090 at 20+ tokens/s per stream. ☆3,514 · Updated this week
- C++ implementation of Qwen-LM ☆585 · Updated 4 months ago
- An LLM deployment project based on MNN; it has since been merged into MNN. ☆1,574 · Updated 3 months ago
- llama2.c-zh: small language models supporting Chinese-language scenarios ☆144 · Updated last year
- llm-export: exports LLM models to ONNX. ☆282 · Updated 3 months ago
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆706 · Updated 3 months ago
- LLaMA/RWKV ONNX models, quantization, and test cases ☆361 · Updated last year
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆243 · Updated last week
- Yuan 2.0 Large Language Model ☆683 · Updated 9 months ago
- Efficient AI Inference & Serving ☆471 · Updated last year
- C++ implementation of ChatGLM-6B, ChatGLM2-6B, ChatGLM3, and GLM4(V) ☆2,971 · Updated 8 months ago
- Easy and efficient fine-tuning of LLMs (supports LLaMA, LLaMA2, LLaMA3, Qwen, Baichuan, GLM, Falcon); efficient quantized training and deployment of large models. ☆600 · Updated 3 months ago
- ☆326 · Updated 3 months ago
- Phi2-Chinese-0.2B: train your own small Chinese Phi2 chat model from scratch, with optional langchain integration for retrieval-augmented generation (RAG) over a local knowledge base. ☆548 · Updated 9 months ago
- Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052 ☆472 · Updated last year
- Exports LLaMA to ONNX ☆121 · Updated 4 months ago
- 骆驼 (Luotuo): a Chinese instruction-fine-tuned LLaMA. Developed by 陈启源 (Central China Normal University), 李鲁鲁 (SenseTime), and 冷子昂 (SenseTime) ☆718 · Updated last year
- Chinese Mixtral Mixture-of-Experts (MoE) LLMs ☆603 · Updated 11 months ago
- LLM inference benchmark ☆413 · Updated 9 months ago
- Play with LLaMA2 (official / Chinese / INT4 / llama2.cpp) in ONLY 3 STEPS! (no GPU / 5 GB VRAM / 8–14 GB VRAM) ☆540 · Updated last year
- ☆127 · Updated 4 months ago
- Jittor large-model inference library, featuring high performance, low hardware requirements, good Chinese support, and portability ☆2,420 · Updated 2 months ago
- Uses the peft library to apply efficient 4-bit QLoRA fine-tuning to ChatGLM-6B/ChatGLM2-6B, then merges the LoRA model into the base model and quantizes to 4-bit. ☆361 · Updated last year
- Large-scale model inference. ☆629 · Updated last year
- Code for fine-tuning ChatGLM-6B using low-rank adaptation (LoRA) ☆721 · Updated last year
- TigerBot: a multi-language, multi-task LLM ☆2,259 · Updated 4 months ago
- XVERSE-13B: a multilingual large language model developed by XVERSE Technology Inc. ☆645 · Updated last year
- INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model ☆1,509 · Updated last month
- [EMNLP 2024 Industry Track] The official PyTorch implementation of "LLMC: Benchmarking Large Language Model Quantization with a V… ☆461 · Updated this week
- ☆161 · Updated 2 weeks ago
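Many of the repositories above (fastllm, the RWKV CPU port, the QLoRA and LLMC projects) center on low-bit weight quantization such as INT4. As a rough, self-contained illustration of the general idea, and not code taken from any of these projects, here is a minimal group-wise symmetric INT4 quantize/dequantize sketch in Python; the group size and rounding scheme are arbitrary choices for demonstration:

```python
# Didactic sketch of group-wise symmetric INT4 quantization:
# each group of weights shares one float scale, and each weight
# is stored as a signed 4-bit integer in [-8, 7].

def quantize_int4(weights, group_size=32):
    """Quantize floats to INT4 values with one scale per group."""
    qvals, scales = [], []
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        # Map the largest magnitude in the group to 7 (all-zero group -> scale 1.0).
        scale = max(abs(w) for w in group) / 7 or 1.0
        scales.append(scale)
        qvals.extend(max(-8, min(7, round(w / scale))) for w in group)
    return qvals, scales

def dequantize_int4(qvals, scales, group_size=32):
    """Reconstruct approximate floats from INT4 values and group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(qvals)]
```

Real implementations additionally pack two INT4 values per byte and tune the group size (commonly 32 or 128) to trade reconstruction accuracy against per-group scale overhead.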