haozixu / llama.cpp-npu
☆56Updated last month
Alternatives and similar repositories for llama.cpp-npu
Users that are interested in llama.cpp-npu are comparing it to the libraries listed below
- [MobiCom 24] Efficient and Adaptive DNN inference under changeable memory budgets☆58Updated last year
- Self-implemented NN operators for Qualcomm's Hexagon NPU☆46Updated 4 months ago
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK☆90Updated last week
- This is the open-source version of TinyTS. The code is dirty so far. We may clean the code in the future.☆19Updated 6 months ago
- 🎓Automatically updates circult-eda-mlsys-tinyml papers daily using GitHub Actions (updates every 8 hours)☆10Updated this week
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai☆113Updated this week
- ☆18Updated 3 weeks ago
- ☆169Updated 2 years ago
- mperf is an operator performance tuning toolbox for mobile/embedded platforms☆192Updated 2 years ago
- A simplified flash-attention implementation using cutlass, intended for teaching purposes☆56Updated last year
- ☆177Updated 2 years ago
- play gemm with tvm☆92Updated 2 years ago
- symmetric int8 gemm☆66Updated 5 years ago
- ☆208Updated 4 years ago
- [ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.☆134Updated last year
- Hands-on model tuning with TVM, profiled on a Mac M1, an x86 CPU, and a GTX-1080 GPU.☆49Updated 2 years ago
- ☆21Updated 4 years ago
- A quantization algorithm for LLM☆148Updated last year
- Benchmark code for the "Online normalizer calculation for softmax" paper☆105Updated 7 years ago
- ☆19Updated last month
- How to design a CPU GEMM on x86 with avx256 that can beat OpenBLAS.☆73Updated 6 years ago
- The original reference implementation of a specified llama.cpp backend for the Qualcomm Hexagon NPU on Android phones, https://github.com/ggml…☆35Updated 6 months ago
- ☆85Updated last year
- High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline.☆127Updated last year
- Penn CIS 5650 (GPU Programming and Architecture) Final Project☆44Updated 2 years ago
- ☆22Updated 4 years ago
- A demo of how to write a high-performance convolution that runs on Apple silicon☆57Updated 4 years ago
- ☆83Updated last year
- ☆164Updated last year
- Run Chinese MobileBert model on SNPE.☆15Updated 2 years ago