chraac / llama.cpp

LLM inference in C/C++

☆21

Alternatives and similar repositories for llama.cpp:

Users who are interested in llama.cpp are comparing it to the libraries listed below.

MollySophia / rwkv-qualcomm
Inference RWKV v5, v6 and (WIP) v7 with Qualcomm AI Engine Direct SDK
☆49 · Updated last week
XiaoMi / StableDiffusionOnDevice
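A text-to-image project that builds on the open-source Stable Diffusion V1.5 model to produce models that can run on a phone's CPU and NPU, along with the accompanying model runtime framework.
☆135 · Updated 10 months ago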
quic / ai-hub-apps
The Qualcomm® AI Hub apps are a collection of state-of-the-art machine learning models optimized for performance (latency, memory etc.) a…
☆113 · Updated last week
lrw04 / llama2.c-to-ncnn
A converter for llama2.c legacy models to ncnn models.
☆86 · Updated last year
wangzhaode / llm-export
llm-export can export LLM models to ONNX.
☆257 · Updated last week
TroyTzou / mlc-llm-android
Based on mlc-llm; a personal attempt to deploy and run large models on Android phones.
☆71 · Updated 5 months ago
nihui / ruapu
Detect CPU features with a single file
☆350 · Updated 3 weeks ago
zhouwg / kantv
Workbench for learning and practicing AI tech in real scenarios on Android devices, powered by GGML (Georgi Gerganov Machine Learning) and NCNN (T…
☆136 · Updated this week
DakeQQ / Native-LLM-for-Android
Demonstration of running a native LLM on Android device.
☆106 · Updated this week
DataXujing / Qwen1.5-0.5b-chat-android
Deploying a large language model on Android phones with MNN-llm: Qwen1.5-0.5B-Chat
☆63 · Updated 9 months ago
tpoisonooo / llama.onnx
LLaMa/RWKV ONNX models, quantization and test cases
☆356 · Updated last year
wangzhaode / mnn-stable-diffusion
Stable Diffusion using MNN
☆65 · Updated last year
MollySophia / rwkv-ncnn
Infer RWKV on NCNN
☆48 · Updated 4 months ago
EdVince / diffusers-ncnn
☆84 · Updated last year
MegEngine / InferLLM
A lightweight LLM inference framework
☆713 · Updated 9 months ago
daquexian / faster-rwkv
☆124 · Updated last year
EdVince / llm-cpp
☆32 · Updated 6 months ago
nihui / vkpeak
A tool which profiles Vulkan devices to find their peak capacities
☆106 · Updated 4 months ago
XiaoMi / nnlib
Fork of https://source.codeaurora.org/quic/hexagon_nn/nnlib
☆57 · Updated last year
UbiquitousLearning / mllm
Fast Multimodal LLM on Mobile Devices
☆671 · Updated this week
fuyufjh / GraphicBuffer
Use GraphicBuffer class from Android native code
☆200 · Updated 3 years ago
yuunnn-w / RWKV_Pytorch
This is an inference framework for the RWKV large language model implemented purely in native PyTorch. The official native implementation…
☆125 · Updated 6 months ago
MegEngine / mperf
mperf is an operator performance-tuning toolkit for mobile/embedded platforms
☆175 · Updated last year
mobile-algorithm-optimization / guide
Guide examples for optimizing mobile CV algorithms.
☆28 · Updated 2 years ago
microsoft / T-MAC
Low-bit LLM inference on CPU with lookup table
☆655 · Updated 3 weeks ago
OpenGVLab / OmniQuant
[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
☆757 · Updated 3 months ago
pnnx / pnnx
PyTorch Neural Network eXchange
☆554 · Updated this week
gesanqiu / Chinese_MobileBert_on_SNPE
Run Chinese MobileBert model on SNPE.
☆14 · Updated last year
usefulsensors / useful-transformers
Efficient Inference of Transformer models
☆414 · Updated 5 months ago
microsoft / BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
☆503 · Updated this week