hipudding / llama.cpp
LLM inference in C/C++
☆11 Updated this week
Alternatives and similar repositories for llama.cpp:
Users interested in llama.cpp are comparing it to the libraries listed below.
- DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including … ☆240 Updated 3 weeks ago
- Inference code for LLaMA models ☆118 Updated last year
- Community maintained hardware plugin for vLLM on Ascend ☆393 Updated this week
- ☆604 Updated 8 months ago
- ☆90 Updated last year
- PaddlePaddle custom device implementation (custom hardware integration for PaddlePaddle). ☆82 Updated this week
- FlagScale is a large model toolkit based on open-source projects. ☆257 Updated this week
- RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications. ☆678 Updated 2 months ago
- ☆324 Updated 2 months ago
- ☆158 Updated this week
- export llama to onnx ☆118 Updated 3 months ago
- Demo for the AIOPS24 challenge ☆61 Updated 9 months ago
- LLM101n: Let's build a Storyteller (Chinese version) ☆130 Updated 7 months ago
- ☆139 Updated 11 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆132 Updated 3 months ago
- ☆409 Updated this week
- ☆127 Updated 3 months ago
- ☆45 Updated last year
- FlagPerf is an open-source software platform for benchmarking AI chips. ☆325 Updated last month
- Optimize QWen1.5 models with TensorRT-LLM ☆17 Updated 10 months ago
- llm-export can export LLM models to ONNX. ☆274 Updated 2 months ago
- High-performance text tokenizer library ☆28 Updated last year
- A MoE impl for PyTorch, [ATC'23] SmartMoE ☆61 Updated last year
- ☆145 Updated 2 months ago
- ☆125 Updated 3 weeks ago
- C++ implementation of Qwen-LM ☆582 Updated 3 months ago
- Transformer related optimization, including BERT, GPT ☆59 Updated last year
- GLake: optimizing GPU memory management and IO transmission. ☆449 Updated last week
- Welcome to the "LLM-travel" repository! Explore the mysteries of large language models (LLMs) 🚀. Dedicated to in-depth understanding, discussion, and implementation of techniques, principles, and applications related to large models. ☆305 Updated 8 months ago
- ☆46 Updated this week