marty1885 / llama.cpp

My development fork of llama.cpp. For now, working on the RK3588 NPU and Tenstorrent backends.

☆65

Related projects

Alternatives and complementary repositories for llama.cpp

Chrisz236 / llm-rk3588
Run Large Language Models on RK3588 with GPU-acceleration
☆85 Updated last year
mtx512 / rk3588-npu
Reverse engineering the rk3588 npu
☆63 Updated 5 months ago
Pelochus / ezrknpu
Easy usage of Rockchip's NPUs found in RK3588 and similar chips
☆93 Updated 4 months ago
airockchip / rknn-llm
☆402 Updated this week
marty1885 / paroli
Streaming TTS based on Piper with optional RK3588 NPU support
☆43 Updated last month
MollySophia / rwkv-qualcomm
Inference rwkv5 or rwkv6 with Qualcomm AI Engine Direct SDK
☆36 Updated this week
usefulsensors / useful-transformers
Efficient Inference of Transformer models
☆386 Updated 3 months ago
Pelochus / ezrknn-llm
Easier usage of LLMs in Rockchip's NPU on SBCs like Orange Pi 5 and Radxa Rock 5 series
☆64 Updated this week
MollySophia / rwkv-ncnn
Infer RWKV on NCNN
☆47 Updated 2 months ago
ramonbroox / rknputop
top-like script for Rockchip NPUs on Linux
☆24 Updated this week
lrw04 / tinyllamas-ncnn
Inference TinyLlama models on ncnn
☆25 Updated last year
RWKV / rwkv-onnx
A converter and basic tester for rwkv onnx
☆41 Updated 9 months ago
daquexian / faster-rwkv
☆123 Updated 10 months ago
ikawrakow / ik_llama.cpp
llama.cpp fork with additional SOTA quants and improved performance
☆86 Updated this week
happyme531 / RK3588-stable-diffusion-GPU
MLC Stable Diffusion for RK3588's Mali GPU
☆37 Updated last month
phhusson / rknpu-reverse-engineering
Because RKNPU only knows 4D
☆30 Updated 7 months ago
lrw04 / llama2.c-to-ncnn
A converter for llama2.c legacy models to ncnn models.
☆82 Updated 10 months ago
abetlen / ggml-python
Python bindings for ggml
☆132 Updated 2 months ago
nihui / ncnn-small-board
ncnn benchmark on various single board computers
☆158 Updated last year
tpoisonooo / llama.onnx
LLaMa/RWKV onnx models, quantization and testcase
☆350 Updated last year
OpenGVLab / EfficientQAT
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
☆222 Updated last month
staghado / vit.cpp
Inference Vision Transformer (ViT) in plain C/C++ with ggml
☆229 Updated 6 months ago
intel / neural-speed
An innovative library for efficient LLM inference via low-bit quantization
☆348 Updated 2 months ago
EdVince / diffusers-ncnn
☆82 Updated last year
PINTO0309 / whisper-onnx-cpu
ONNX implementation of Whisper. PyTorch free.
☆84 Updated 2 months ago
sophgo / LLM-TPU
Run generative AI models on Sophgo BM1684X
☆120 Updated this week
monatis / lmm.cpp
Inference of Large Multimodal Models in C/C++. LLaVA and others
☆46 Updated last year
mlc-ai / llm-perf-bench
☆114 Updated 6 months ago
dusty-nv / NanoDB
Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP
☆36 Updated 5 months ago