jeffzhou2000 / ggml-hexagon

The original reference implementation of a llama.cpp backend for the Qualcomm Hexagon NPU on Android phones (https://github.com/ggml-org/llama.cpp/pull/12326). Not maintained since Jul 15, 2025.

☆35

Alternatives and similar repositories for ggml-hexagon

Users who are interested in ggml-hexagon are comparing it to the libraries listed below.

chraac / llama.cpp
LLM inference in C/C++
☆48 Updated this week
MollySophia / rwkv-qualcomm
Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK
☆89 Updated 3 weeks ago
quic / ai-engine-direct-helper
QAI AppBuilder is designed to help developers easily execute models on WoS and Linux platforms. It encapsulates the Qualcomm® AI Runtime …
☆98 Updated this week
MegEngine / mperf
mperf is an operator performance-tuning toolbox for mobile and embedded platforms
☆192 Updated 2 years ago
wangzhaode / onnx-llm
LLM deployment project based on ONNX.
☆48 Updated last year
inisis / OnnxLLM
Large Language Model Onnx Inference Framework
☆36 Updated last month
HuPengsheet / EasyNN
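EasyNN is a neural network inference framework developed for teaching, designed so that anyone, even with no prior background, can write an inference framework on their own!
☆34 Updated last year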
lx200916 / ChatBotApp
☆41 Updated 8 months ago
ARM-software / kleidiai
This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai
☆109 Updated this week
sophgo / LLM-TPU
Run generative AI models on Sophgo BM1684X/BM1688
☆257 Updated this week
caiwanxianhust / FasterLLaMA
A llama model inference framework implemented in CUDA C++
☆63 Updated last year
daquexian / faster-rwkv
☆125 Updated 2 years ago
wangzhaode / llm-export
llm-export can export LLM models to ONNX.
☆337 Updated 2 months ago
weishengying / tiny-flash-attention
A simplified flash-attention implementation using cutlass, intended for teaching purposes
☆52 Updated last year
flagos-ai / flagtree
FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang…
☆149 Updated this week
weishengying / cute_gemm
☆20 Updated last year
xxxxyu / FlexNN
Code for ACM MobiCom 2024 paper "FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices"
☆58 Updated 11 months ago
sunkx109 / llama.cpp
llama 2 Inference
☆43 Updated 2 years ago
haozixu / htp-ops-lib
Self-implemented NN operators for Qualcomm's Hexagon NPU
☆34 Updated 2 months ago
BBuf / tensorrt-llm-moe
☆33 Updated 10 months ago
wangzhaode / mnn-stable-diffusion
Stable Diffusion using MNN
☆67 Updated 2 years ago
lrw04 / llama2.c-to-ncnn
A converter for llama2.c legacy models to ncnn models.
☆79 Updated 2 years ago
BBuf / how-to-optimize-gemm
☆98 Updated 4 years ago
wangzyon / trt_learn
TensorRT encapsulation, learn, rewrite, practice.
☆29 Updated 3 years ago
OpenPPL / ppl.kernel.cuda
☆38 Updated last year
mlc-ai / relax
☆171 Updated 2 weeks ago
sophgo / libsophon
Sophgo AI chips driver and runtime library.
☆24 Updated last week
StudyingLover / ggml-tutorial
☆34 Updated last year
iclementine / optimize_softmax
Optimize softmax in triton in many cases
☆21 Updated last year
JieRen98 / SGEMM-SASS-Annotation
☆21 Updated 4 years ago