zhouwg / ggml-hexagon
Focused on the implementation of a ggml-hexagon backend for Qualcomm's Hexagon NPU; details can be found at https://github.com/zhouwg/ggml-hexagon/discussions/18
☆13 · Updated this week
Alternatives and similar repositories for ggml-hexagon:
Users interested in ggml-hexagon are comparing it to the libraries listed below.
- LLM inference in C/C++ ☆34 · Updated last week
- ☆32 · Updated 3 weeks ago
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK ☆60 · Updated this week
- Large Language Model ONNX Inference Framework ☆32 · Updated 2 months ago
- ☆124 · Updated last year
- SNPE tutorial ☆10 · Updated last year
- LLM deployment project based on ONNX ☆35 · Updated 5 months ago
- ☆36 · Updated 5 months ago
- mperf is an operator performance tuning toolbox for mobile/embedded platforms ☆179 · Updated last year
- ☆10 · Updated 8 months ago
- ☆32 · Updated 8 months ago
- Run Chinese MobileBert model on SNPE ☆14 · Updated last year
- ☆29 · Updated 11 months ago
- DDK for Rockchip NPU ☆61 · Updated 4 years ago
- Flash Attention in raw CUDA C beating PyTorch ☆20 · Updated 10 months ago
- Explore LLM model deployment based on AXera's AI chips ☆87 · Updated 2 weeks ago
- Study notes on ggml, a machine learning inference framework ☆14 · Updated last year
- ☆157 · Updated this week
- A converter from llama2.c legacy models to ncnn models ☆87 · Updated last year
- A walkthrough of the cuDNN documentation and how to use it ☆16 · Updated 11 months ago
- Common libraries for PPL projects ☆29 · Updated 3 weeks ago
- Standalone Flash Attention v2 kernel without libtorch dependency ☆108 · Updated 6 months ago
- ☆30 · Updated 6 months ago
- ☆16 · Updated last year
- EasyNN is a neural network inference framework built for teaching, aiming to let anyone write an inference framework on their own, even with zero prior experience ☆26 · Updated 7 months ago
- Linux BSP app & sample for axpi (ax620a) ☆34 · Updated last year
- Header-only safetensors loader and saver in C++ ☆56 · Updated 3 weeks ago
- ☆10 · Updated 3 weeks ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer ☆90 · Updated last month
- A llama model inference framework implemented in CUDA C++ ☆48 · Updated 4 months ago