marty1885 / llama.cpp
My development fork of llama.cpp. Currently working on the RK3588 NPU and Tenstorrent backends.
☆114 · Updated 2 weeks ago
Alternatives and similar repositories for llama.cpp
Users interested in llama.cpp are comparing it to the repositories listed below.
- Run Large Language Models on RK3588 with GPU acceleration ☆123 · Updated 2 years ago
- Reverse engineering the RK3588 NPU ☆110 · Updated last year
- Efficient inference of Transformer models ☆478 · Updated last year
- Streaming TTS based on Piper, with optional RK3588 NPU support ☆121 · Updated 9 months ago
- top-like script for Rockchip NPUs on Linux ☆64 · Updated 3 months ago
- Inference of RWKV v5, v6, and v7 with the Qualcomm AI Engine Direct SDK ☆90 · Updated this week
- Easy installation and usage of the Rockchip NPUs found in the RK3588 and similar SoCs ☆225 · Updated 6 months ago
- Easier usage of LLMs on Rockchip NPUs, on SBCs like the Orange Pi 5 and Radxa Rock 5 series ☆169 · Updated 6 months ago
- Inference of Vision Transformer (ViT) in plain C/C++ with ggml ☆306 · Updated last year
- Inference of RWKV with multiple supported backends ☆77 · Updated this week
- AMD-related optimizations for transformer models ☆97 · Updated 3 months ago
- Explore LLM deployment on AXera's AI chips ☆139 · Updated this week
- The original reference implementation of a specialized llama.cpp backend for the Qualcomm Hexagon NPU on Android phones, https://github.com/ggml… ☆35 · Updated 6 months ago
- ☆172 · Updated this week
- ncnn benchmark on various single-board computers ☆163 · Updated 2 years ago
- Inference of RWKV on ncnn ☆49 · Updated last year
- ☆125 · Updated 2 years ago
- [EMNLP Findings 2024] MobileQuant: Mobile-friendly Quantization for On-device Language Models ☆68 · Updated last year
- Because the RKNPU only knows 4D ☆41 · Updated last year
- A converter and basic tester for RWKV ONNX models ☆43 · Updated 2 years ago
- High-speed, easy-to-use LLM serving framework for local deployment ☆145 · Updated 6 months ago
- Prepare for DeepSeek R1 inference: benchmark CPU, DRAM, SSD, iGPU, GPU, etc. with efficient code ☆74 · Updated last year
- An open-source, lightweight, high-performance inference framework for Hailo devices ☆162 · Updated this week
- Reference setup scripts for developer kits across various Intel platforms and GPUs ☆42 · Updated this week
- No-code CLI designed to accelerate ONNX workflows ☆227 · Updated 7 months ago
- MLC Stable Diffusion for the RK3588's Mali GPU ☆41 · Updated last year
- A converter from llama2.c legacy models to ncnn models ☆79 · Updated 2 years ago
- Allows access over HTTP to an LLM running on the RK3588 NPU; returns a JSON response ☆28 · Updated last year
- Use safetensors with ONNX 🤗 ☆84 · Updated 3 weeks ago
- Repository of model demos using TT-Buda ☆63 · Updated 10 months ago