MegEngine / InferLLM
A lightweight LLM inference framework
☆712Updated 9 months ago
Alternatives and similar repositories for InferLLM:
Users who are interested in InferLLM are comparing it to the libraries listed below:
- LLM deployment project based on MNN.☆1,517Updated 3 weeks ago
- C++ implementation of Qwen-LM☆569Updated last month
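- llm-export can export LLM models to ONNX.☆255Updated last week
- A pure C++ cross-platform LLM acceleration library, callable from Python; ChatGLM-6B-class models can reach 10,000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models; runs smoothly on mobile devices☆3,367Updated this week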
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.☆582Updated 3 months ago
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including …☆208Updated this week
- Efficient AI Inference & Serving☆462Updated last year
- A small language model supporting Chinese-language scenarios: llama2.c-zh☆145Updated 10 months ago
- ☆302Updated 3 weeks ago
- The official repo of Aquila2 series proposed by BAAI, including pretrained & chat large language models.☆440Updated 3 months ago
- LLaMA/RWKV ONNX models, quantization, and test cases☆356Updated last year
- Play LLaMA2 (official / Chinese version / INT4 / llama2.cpp) Together! ONLY 3 STEPS! ( non GPU / 5GB vRAM / 8~14GB vRAM)☆540Updated last year
- ☆151Updated last month
- ☆591Updated 5 months ago
- Chinese Mixtral mixture-of-experts (MoE) LLMs☆591Updated 8 months ago
- ☆298Updated 5 months ago
- Easy and efficient fine-tuning of LLMs (supports LLama, LLama2, LLama3, Qwen, Baichuan, GLM, Falcon). Efficient quantized training and deployment of large models.☆585Updated this week
- CMMLU: Measuring massive multitask language understanding in Chinese☆716Updated last month
- Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052☆467Updated 10 months ago
- LLM Inference benchmark☆377Updated 5 months ago
- C++ implementation of ChatGLM-6B & ChatGLM2-6B & ChatGLM3 & GLM4(V)☆2,959Updated 5 months ago
- ☆444Updated last year
- XVERSE-13B: A multilingual large language model developed by XVERSE Technology Inc.☆648Updated 9 months ago
- Export LLaMA to ONNX☆111Updated 3 weeks ago
- Efficient Training (including pre-training and fine-tuning) for Big Models☆574Updated 5 months ago
- Yuan 2.0 Large Language Model☆683Updated 6 months ago
- TigerBot: A multi-language multi-task LLM☆2,253Updated 3 weeks ago
- torch_musa is an open source repository based on PyTorch, which can make full use of the super computing power of MooreThreads graphics c…☆346Updated 2 months ago
- Chinese Mixtral-8x7B (Chinese-Mixtral-8x7B)☆645Updated 5 months ago
- ☆127Updated 3 weeks ago