CodeLinaro / llama.cpp

LLM inference in C/C++

☆20

Alternatives and similar repositories for llama.cpp

Users who are interested in llama.cpp are comparing it to the repositories listed below.

chraac / llama.cpp
LLM inference in C/C++
☆48Updated this week
MegEngine / mperf
mperf is an operator performance tuning toolbox for mobile and embedded platforms
☆193Updated 2 years ago
nihui / ruapu
Detect CPU features with a single file
☆442Updated last month
jeffzhou2000 / ggml-hexagon
the original reference implementation of a specified llama.cpp backend for Qualcomm Hexagon NPU on Android phone, https://github.com/ggml…
☆35Updated 6 months ago
nihui / vkpeak
A tool which profiles Vulkan devices to find their peak capacities
☆159Updated 3 weeks ago
willhua / QualcommOpenCLSDKNote
Notes on the Qualcomm OpenCL SDK
☆37Updated 7 years ago
quic / ai-engine-direct-helper
QAI AppBuilder is designed to help developers easily execute models on WoS and Linux platforms. It encapsulates the Qualcomm® AI Runtime …
☆111Updated this week
futz12 / ncnn_llm
A repo for LLMs on ncnn
☆189Updated last month
pigirons / conv3x3_m1
A demo of how to write a high-performance convolution that runs on Apple silicon
☆57Updated 3 years ago
Repeerc / flash-attention-v2-RDNA3-minimal
A simple Flash Attention v2 implementation with ROCm (RDNA3 GPU, roc wmma), mainly used for Stable Diffusion (ComfyUI) in Windows ZLUDA en…
☆51Updated last year
haozixu / htp-ops-lib
Self-implemented NN operators for Qualcomm's Hexagon NPU
☆46Updated 4 months ago
MollySophia / rwkv-qualcomm
Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK
☆90Updated this week
intel / intel-xpu-backend-for-triton
OpenAI Triton backend for Intel® GPUs
☆226Updated this week
mlc-ai / relax
☆172Updated this week
QianyanTech / NBAssembler
Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.
☆95Updated 2 years ago
nihui / valgrind-android
☆63Updated 4 years ago
XiaoMi / StableDiffusionOnDevice
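This project generates images from text. Based on the open-source Stable Diffusion V1.5 model, it produces models that can run on a phone's CPU and NPU, along with the accompanying model runtime framework.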
☆231Updated last year
daquexian / faster-rwkv
☆125Updated 2 years ago
lx200916 / ChatBotApp
☆42Updated 10 months ago
tfruan2000 / mlsys-study-note
My study notes for MLSys
☆15Updated last year
haozixu / llama.cpp-npu
☆56Updated last month
Cambricon / mlu-ops
Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).
☆150Updated 2 weeks ago
MARD1NO / CUDA-PPT
☆118Updated 10 months ago
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆192Updated last year
EdVince / diffusers-ncnn
☆85Updated 2 years ago
flagos-ai / FlagTree
FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang…
☆200Updated this week
waau / qualcomm-nnlib
Qualcomm Hexagon NN Offload Framework
☆45Updated 5 years ago
microsoft / ArchProbe
A profiler to disclose and quantify hardware features on GPUs.
☆175Updated 3 years ago
OpenPPL / CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
☆84Updated 2 years ago
lrw04 / llama2.c-to-ncnn
A converter for llama2.c legacy models to ncnn models.
☆79Updated 2 years ago