marty1885 / llama.cpp
My development fork of llama.cpp. Currently working on the RK3588 NPU and Tenstorrent backends.
☆65 Updated last week
Related projects
Alternatives and complementary repositories for llama.cpp
- Run Large Language Models on RK3588 with GPU-acceleration ☆85 Updated last year
- Reverse engineering the RK3588 NPU ☆63 Updated 5 months ago
- Easy usage of Rockchip's NPUs found in RK3588 and similar chips ☆93 Updated 4 months ago
- ☆402 Updated this week
- Streaming TTS based on Piper with optional RK3588 NPU support ☆43 Updated last month
- Inference of rwkv5 or rwkv6 with Qualcomm AI Engine Direct SDK ☆36 Updated this week
- Efficient Inference of Transformer models ☆386 Updated 3 months ago
- Easier usage of LLMs in Rockchip's NPU on SBCs like Orange Pi 5 and Radxa Rock 5 series ☆64 Updated this week
- Infer RWKV on NCNN ☆47 Updated 2 months ago
- top-like script for Rockchip NPUs on Linux ☆24 Updated this week
- Inference of TinyLlama models on ncnn ☆25 Updated last year
- A converter and basic tester for rwkv onnx ☆41 Updated 9 months ago
- ☆123 Updated 10 months ago
- llama.cpp fork with additional SOTA quants and improved performance ☆86 Updated this week
- MLC Stable Diffusion for RK3588's Mali GPU ☆37 Updated last month
- Because RKNPU only knows 4D ☆30 Updated 7 months ago
- A converter for llama2.c legacy models to ncnn models. ☆82 Updated 10 months ago
- Python bindings for ggml ☆132 Updated 2 months ago
- ncnn benchmark on various single board computers ☆158 Updated last year
- LLaMa/RWKV onnx models, quantization and testcase ☆350 Updated last year
- EfficientQAT: Efficient Quantization-Aware Training for Large Language Models ☆222 Updated last month
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆229 Updated 6 months ago
- An innovative library for efficient LLM inference via low-bit quantization ☆348 Updated 2 months ago
- ☆82 Updated last year
- ONNX implementation of Whisper. PyTorch free. ☆84 Updated 2 months ago
- Run generative AI models in Sophgo BM1684X ☆120 Updated this week
- Inference of Large Multimodal Models in C/C++. LLaVA and others ☆46 Updated last year
- ☆114 Updated 6 months ago
- Zero-copy multimodal vector DB with CUDA and CLIP/SigLIP ☆36 Updated 5 months ago