haozixu / llama.cpp-npu
☆43 · Updated 3 weeks ago
Alternatives and similar repositories for llama.cpp-npu
Users that are interested in llama.cpp-npu are comparing it to the libraries listed below
- Self-implemented NN operators for Qualcomm's Hexagon NPU ☆37 · Updated 3 months ago
- Flexible DNN inference under changing memory budgets ☆58 · Updated 11 months ago
- Inference for RWKV v5, v6, and v7 with the Qualcomm AI Engine Direct SDK ☆90 · Updated last month
- Run a Chinese MobileBERT model on SNPE. ☆15 · Updated 2 years ago
- A simplified flash-attention implementation using CUTLASS, written for teaching purposes ☆52 · Updated last year
- mperf is an operator performance tuning toolbox for mobile/embedded platforms ☆192 · Updated 2 years ago
- How to design a CPU GEMM on x86 with AVX-256 that can beat OpenBLAS. ☆73 · Updated 6 years ago
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆111 · Updated 3 weeks ago
- The original reference implementation of a llama.cpp backend for the Qualcomm Hexagon NPU on Android phones, https://github.com/ggml… ☆35 · Updated 6 months ago
- ☆22 · Updated 4 years ago
- Symmetric int8 GEMM ☆67 · Updated 5 years ago
- ☆125 · Updated 2 years ago
- ☆171 · Updated 2 weeks ago
- A demo of how to write a high-performance convolution for Apple silicon ☆57 · Updated 3 years ago
- ☆176 · Updated 2 years ago
- ☆60 · Updated last year
- A quantization algorithm for LLMs ☆147 · Updated last year
- ☆33 · Updated last year
- ☆98 · Updated 4 years ago
- ☆18 · Updated last month
- ☆169 · Updated 2 years ago
- Benchmark code for the "Online normalizer calculation for softmax" paper ☆104 · Updated 7 years ago
- Hands-on model tuning with TVM, profiled on a Mac M1, an x86 CPU, and a GTX-1080 GPU. ☆50 · Updated 2 years ago
- ☆34 · Updated last year
- High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline. ☆123 · Updated last year
- ☆126 · Updated 4 months ago
- FlagTree is a unified compiler supporting multiple AI chip backends for custom deep learning operations, forked from triton-lang… ☆155 · Updated last week
- ☆208 · Updated 4 years ago
- ☆85 · Updated 11 months ago
- ☆107 · Updated 2 weeks ago
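One of the repositories above benchmarks the "Online normalizer calculation for softmax" paper. The core idea — computing the softmax normalizer in a single pass by tracking a running maximum and rescaling the running sum whenever the maximum changes — can be sketched in a few lines of Python (a minimal illustration of the technique, not code taken from any of the listed repositories):

```python
import math

def online_softmax(xs):
    # Single-pass normalizer: maintain the running max m and the running
    # sum d of exp(x - m). When a new maximum appears, rescale d by
    # exp(old_max - new_max) so all terms stay relative to the same max.
    m = float("-inf")
    d = 0.0
    for x in xs:
        m_new = max(m, x)
        d = d * math.exp(m - m_new) + math.exp(x - m_new)
        m = m_new
    # Final pass produces the numerically stable softmax values.
    return [math.exp(x - m) / d for x in xs]
```

Compared with the naive two-pass approach (one pass for the max, another for the sum), this fuses the max and sum computation into one pass over the data, which is what makes it attractive for memory-bound GPU kernels.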