haozixu / htp-ops-lib
Self-implemented NN operators for Qualcomm's Hexagon NPU
☆34 · Updated 2 months ago
Alternatives and similar repositories for htp-ops-lib
Users interested in htp-ops-lib are comparing it to the libraries listed below.
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading" [MobiCom 2022] ☆19 · Updated 3 years ago
- A demo of how to write a high-performance convolution that runs on Apple Silicon ☆57 · Updated 3 years ago
- Benchmark code for the "Online normalizer calculation for softmax" paper (a one-pass sketch of the trick follows this list) ☆103 · Updated 7 years ago
- A standalone GEMM kernel for fp16 activations and quantized weights, extracted from FasterTransformer (the dequantize-then-multiply idea is sketched after this list) ☆96 · Updated 3 months ago
- Hands-on model tuning with TVM, profiled on an Apple M1, an x86 CPU, and a GTX 1080 GPU ☆49 · Updated 2 years ago
- FP8 flash attention for the Ada architecture, implemented with the CUTLASS library ☆78 · Updated last year
- This repository is a read-only mirror of https://gitlab.arm.com/kleidi/kleidiai ☆106 · Updated last week
- Playing with GEMM in TVM ☆92 · Updated 2 years ago
- Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5) ☆276 · Updated 5 months ago
- A simplified, educational flash-attention implementation using CUTLASS (see the flash-attention sketch after this list) ☆51 · Updated last year
- An easy-to-use package for implementing SmoothQuant for LLMs (the scaling rule is sketched after this list) ☆110 · Updated 8 months ago
- LLaMA INT4 CUDA inference with AWQ ☆55 · Updated 11 months ago
- How to design CPU GEMM on x86 with 256-bit AVX that can beat OpenBLAS ☆73 · Updated 6 years ago
- Code-reading notes for TVM ☆76 · Updated 3 years ago
- Triton adapter for Ascend; mirror of https://gitee.com/ascend/triton-ascend ☆93 · Updated this week
- LLM inference in C/C++ ☆48 · Updated this week
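
For reference, here is a minimal NumPy sketch of the trick from the "Online normalizer calculation for softmax" entry above: the maximum and the normalizer are computed in a single pass, rescaling the running sum whenever the maximum grows. The function name and the 1-D input are illustrative choices, not that repository's API.

```python
import numpy as np

def online_softmax(x):
    """One-pass softmax: update the running max m and normalizer d,
    rescaling d by exp(m_old - m_new) whenever the max changes."""
    m, d = -np.inf, 0.0
    for v in x:
        m_new = max(m, v)
        d = d * np.exp(m - m_new) + np.exp(v - m_new)
        m = m_new
    return np.exp(np.asarray(x, dtype=np.float64) - m) / d

# print(online_softmax([1.0, 2.0, 3.0]))  # ~[0.090, 0.245, 0.665]
```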
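The fp16-activation / quantized-weight GEMM entry follows a common weight-only quantization pattern; below is a rough NumPy sketch of the idea (symmetric per-row quantization, then dequantize-then-multiply). All names here are hypothetical, and a production kernel fuses the dequantization into the GEMM main loop instead of materializing the full-precision weights.

```python
import numpy as np

def quantize_rows(w, bits=4):
    """Symmetric per-output-row quantization of an (out, in) weight matrix."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.maximum(scale, 1e-8)                 # guard all-zero rows
    q = np.round(w / scale).clip(-qmax - 1, qmax).astype(np.int8)
    return q, scale

def weight_quant_matmul(x, q, scale):
    """Dequantize-then-multiply reference for fp16 activations;
    a fused kernel would do this inside the GEMM inner loop."""
    w = q.astype(np.float32) * scale                # (out, in)
    return x.astype(np.float32) @ w.T
```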
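The two CUTLASS flash-attention entries revolve around the same core loop; a NumPy sketch of that loop (single head, no masking; the block size and names are illustrative, not taken from either repository) is below. Keys and values are processed in tiles while a running max, normalizer, and unnormalized output are rescaled online, which is what lets the full score matrix stay un-materialized.

```python
import numpy as np

def flash_attention(q, k, v, block=64):
    """Educational single-head flash attention: iterate over K/V tiles,
    keeping a running row max (m), normalizer (l), and unnormalized
    output (o) that are rescaled whenever the max grows."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    m = np.full(n, -np.inf)
    l = np.zeros(n)
    o = np.zeros((n, d))
    for s in range(0, k.shape[0], block):
        kb, vb = k[s:s + block], v[s:s + block]
        scores = (q @ kb.T) * scale                 # (n, tile)
        m_new = np.maximum(m, scores.max(axis=1))
        p = np.exp(scores - m_new[:, None])         # tile of softmax numerators
        alpha = np.exp(m - m_new)                   # rescale factor for old stats
        l = l * alpha + p.sum(axis=1)
        o = o * alpha[:, None] + p @ vb
        m = m_new
    return o / l[:, None]

# Matches softmax(q @ k.T / sqrt(d)) @ v without materializing the n x n scores.
```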
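For the SmoothQuant entry, the central rule is a per-input-channel scale that migrates quantization difficulty from activations to weights. A sketch under the paper's formulation follows (alpha = 0.5 is the paper's default; the helper name and epsilon guards are mine).

```python
import numpy as np

def smooth_scales(act_absmax, weight, alpha=0.5):
    """SmoothQuant-style per-input-channel smoothing factors:
    s_j = max|X_j|**alpha / max|W_j|**(1 - alpha),
    with act_absmax calibrated offline and weight shaped (out, in)."""
    w_absmax = np.abs(weight).max(axis=0)           # per input channel
    s = act_absmax ** alpha / np.maximum(w_absmax, 1e-8) ** (1 - alpha)
    return np.maximum(s, 1e-5)

# Difficulty migration: x @ w.T == (x / s) @ (w * s).T, so the scaled
# activations become easier to quantize at the cost of the weights.
```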