zhouwg / ggml-hexagon
An open-source reference implementation of the ggml-hexagon backend for llama.cpp on Android phones equipped with Qualcomm's Hexagon NPU; details at https://github.com/zhouwg/ggml-hexagon/discussions/18
☆22 · Updated this week
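For context, a ggml backend such as ggml-hexagon is driven through ggml's generic backend interface rather than a bespoke API. The sketch below shows that flow in plain C. The entry point ggml_backend_hexagon_init() is an assumption, named by analogy with ggml_backend_cpu_init(); check the repository's headers for the actual initialization call. Everything else uses the standard ggml/ggml-backend API.

```c
// Minimal sketch of ggml's generic backend flow. The hexagon init
// function is HYPOTHETICAL (commented out); the CPU backend stands in
// so the example runs as-is against stock ggml.
#include "ggml.h"
#include "ggml-alloc.h"
#include "ggml-backend.h"

int main(void) {
    // ggml_backend_t backend = ggml_backend_hexagon_init(); // hypothetical
    ggml_backend_t backend = ggml_backend_cpu_init();

    // Build a tiny graph c = a * b; no_alloc defers tensor data to the backend.
    struct ggml_init_params params = {
        .mem_size   = 1 << 20,  // metadata only, so 1 MiB is plenty
        .mem_buffer = NULL,
        .no_alloc   = true,
    };
    struct ggml_context * ctx = ggml_init(params);
    struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 4);
    struct ggml_tensor * c = ggml_mul(ctx, a, b);

    // Place all tensors in a buffer owned by the chosen backend.
    ggml_backend_buffer_t buf = ggml_backend_alloc_ctx_tensors(ctx, backend);

    const float av[4] = {1, 2, 3, 4};
    const float bv[4] = {5, 6, 7, 8};
    ggml_backend_tensor_set(a, av, 0, sizeof(av));
    ggml_backend_tensor_set(b, bv, 0, sizeof(bv));

    // Run the graph on the backend and read the result back.
    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, c);
    ggml_backend_graph_compute(backend, gf);

    float cv[4];
    ggml_backend_tensor_get(c, cv, 0, sizeof(cv));  // {5, 12, 21, 32}

    ggml_backend_buffer_free(buf);
    ggml_free(ctx);
    ggml_backend_free(backend);
    return 0;
}
```

Swapping the CPU init call for the backend-specific one is, in principle, the only change needed to move such a graph onto the NPU; that device-agnostic dispatch is what a ggml backend implementation provides.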
Alternatives and similar repositories for ggml-hexagon
Users interested in ggml-hexagon are comparing it to the libraries listed below.
- LLM inference in C/C++ ☆41 · Updated last week
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK ☆68 · Updated last week
- LLM deployment project based on ONNX. ☆37 · Updated 7 months ago
- Llama 2 inference ☆41 · Updated last year
- ☆123 · Updated last year
- Sophgo AI chip driver and runtime library. ☆21 · Updated last week
- ☆10 · Updated 10 months ago
- mperf is an operator performance tuning toolbox for mobile/embedded platforms ☆187 · Updated last year
- A converter for llama2.c legacy models to ncnn models. ☆87 · Updated last year
- Large Language Model ONNX Inference Framework ☆35 · Updated 4 months ago
- EasyNN is a neural network inference framework developed for teaching, designed so that anyone can write an inference framework on their own, even with zero prior experience! ☆28 · Updated 9 months ago
- TensorRT encapsulation: learn, rewrite, practice. ☆28 · Updated 2 years ago
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆43 · Updated this week
- Snapdragon Neural Processing Engine (SNPE) SDK: a Qualcomm Snapdragon software accelerate… ☆35 · Updated 3 years ago
- Stable Diffusion using MNN ☆68 · Updated last year
- ☆32 · Updated 10 months ago
- ☆34 · Updated 2 months ago
- QAI AppBuilder is designed for developers to use the Qualcomm® AI Runtime SDK to execute models on Windows on Snapdragon (WoS) and Linux platf… ☆41 · Updated this week
- A llama model inference framework implemented in CUDA C++ ☆57 · Updated 6 months ago
- ☆21 · Updated 4 years ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆109 · Updated 8 months ago
- ggml study notes; ggml is an inference framework for machine learning ☆15 · Updated last year
- ☆33 · Updated last year
- Code for ACM MobiCom 2024 paper "FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices" ☆53 · Updated 4 months ago
- 📚 FFPA (Split-D): extend FlashAttention with Split-D for large headdim; O(1) GPU SRAM complexity; 1.8x~3x↑🎉 faster than SDPA EA. ☆184 · Updated 3 weeks ago
- ☆148 · Updated 4 months ago
- Inference deployment of Llama 3 ☆11 · Updated last year
- ☆36 · Updated 7 months ago
- Header-only safetensors loader and saver in C++ ☆62 · Updated 3 weeks ago
- A simple neural network inference framework ☆25 · Updated last year