CodeLinaro / llama.cpp
LLM inference in C/C++
☆16Updated this week
Alternatives and similar repositories for llama.cpp:
Users who are interested in llama.cpp are comparing it to the libraries listed below
- mperf is an operator performance tuning toolbox for mobile/embedded platforms☆179Updated last year
- A tool which profiles Vulkan devices to find their peak capacities☆114Updated 6 months ago
- Detect CPU features with a single file☆385Updated this week
- ☆40Updated 2 years ago
- ☆87Updated last year
- Qualcomm Hexagon NN Offload Framework☆42Updated 4 years ago
- ☆124Updated last year
- Infer RWKV on NCNN☆48Updated 6 months ago
- Code for ACM MobiCom 2024 paper "FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices"☆51Updated 2 months ago
- Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.☆77Updated 2 years ago
- Inference RWKV v5, v6 and v7 with Qualcomm AI Engine Direct SDK☆60Updated this week
- ☆28Updated 4 months ago
- The note of Qualcomm OpenCL SDK☆32Updated 6 years ago
- ☆18Updated 4 years ago
- A converter for llama2.c legacy models to ncnn models.☆87Updated last year
- Triton Compiler related materials.☆28Updated 2 months ago
- NeRF in NCNN with c++ & vulkan☆67Updated last year
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading"[MobiCom'2022]☆18Updated 2 years ago
- LLM inference in C/C++☆34Updated last week
- A demo of how to write a high-performance convolution that runs on Apple silicon☆54Updated 3 years ago
- Focuses on the implementation of the ggml-hexagon backend for Qualcomm's Hexagon NPU; details can be seen at https://github.com/zhouwg/ggml-hexagon…☆13Updated this week
- We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …☆179Updated 2 months ago
- study of cutlass☆21Updated 4 months ago
- ☆84Updated 2 years ago
- My study note for mlsys☆14Updated 4 months ago
- ☆95Updated 3 years ago
- ⚡️Write HGEMM from scratch using Tensor Cores with WMMA, MMA and CuTe API, Achieve Peak⚡️ Performance.☆63Updated this week
- Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).☆110Updated this week
- ☆19Updated 2 years ago
- An unofficial cuda assembler, for all generations of SASS, hopefully :)☆82Updated 2 years ago