Maknee / minigpt4.cpp
Port of MiniGPT4 in C++ (4-bit, 5-bit, 6-bit, 8-bit, and 16-bit CPU inference with GGML)
☆557 · Updated last year
Related projects
Alternatives and complementary repositories for minigpt4.cpp
- CLIP inference in plain C/C++ with no extra dependencies ☆456 · Updated 2 months ago
- C++ implementation for BLOOM ☆811 · Updated last year
- ggml implementation of BERT ☆464 · Updated 8 months ago
- LLaVA server (llama.cpp) ☆177 · Updated last year
- Tiny Dream - An embedded, Header Only, Stable Diffusion C++ implementation ☆251 · Updated last year
- Suno AI's Bark model in C/C++ for fast text-to-speech ☆719 · Updated this week
- SoTA Transformers with C-backend for fast inference on your CPU ☆312 · Updated 11 months ago
- GGUF implementation in C as a library and a tools CLI program ☆242 · Updated 4 months ago
- Llama 2 Everywhere (L2E) ☆1,511 · Updated 2 weeks ago
- Python bindings for llama.cpp ☆199 · Updated last year
- LLM-based code completion engine ☆173 · Updated last year
- Falcon LLM ggml framework with CPU and GPU support ☆244 · Updated 9 months ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml ☆229 · Updated 6 months ago
- Python bindings for ggml ☆132 · Updated 2 months ago
- Stateful load balancer custom-tailored for llama.cpp ☆556 · Updated this week
- C++ implementation for 💫StarCoder ☆445 · Updated last year
- MiniLLM is a minimal system for running modern LLMs on consumer-grade GPUs ☆866 · Updated last year
- Wang Yi's GPT solution ☆142 · Updated 10 months ago
- Pure C++ implementation of several models for real-time chatting on your computer (CPU) ☆374 · Updated this week
- throwaway GPT inference ☆139 · Updated 5 months ago
- fastLLaMa: An experimental high-performance framework for running decoder-only LLMs with 4-bit quantization in Python using a C/C++ backend ☆410 · Updated last year
- Absolute minimalistic implementation of a GPT-like transformer using only numpy (<650 lines) ☆249 · Updated 11 months ago
- WebGPU LLM inference tuned by hand ☆146 · Updated last year
- A torchless C++ RWKV implementation using 8-bit quantization, written in CUDA/HIP/Vulkan for maximum compatibility and minimum dependencies ☆307 · Updated 9 months ago
- A simple "Be My Eyes" web app with a llama.cpp/llava backend ☆484 · Updated 11 months ago
- INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model ☆1,420 · Updated 3 months ago
- This repository contains a pure C++ ONNX implementation of multiple offline AI models, such as StableDiffusion (1.5 and XL), ControlNet, … ☆608 · Updated 6 months ago
- An implementation of bucketMul LLM inference ☆214 · Updated 4 months ago