Maknee / minigpt4.cpp

Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML)

☆557

Related projects ⓘ

Alternatives and complementary repositories for minigpt4.cpp

monatis / clip.cpp
CLIP inference in plain C/C++ with no extra dependencies
☆456Updated 2 months ago
NouamaneTazi / bloomz.cpp
C++ implementation for BLOOM
☆811Updated last year
skeskinen / bert.cpp
ggml implementation of BERT
☆464Updated 8 months ago
YavorGIvanov / sam.cpp
☆1,258Updated last year
trzy / llava-cpp-server
LLaVA server (llama.cpp).
☆177Updated last year
symisc / tiny-dream
Tiny Dream - An embedded, Header Only, Stable Diffusion C++ implementation
☆251Updated last year
PABannier / bark.cpp
Suno AI's Bark model in C/C++ for fast text-to-speech
☆719Updated this week
NolanoOrg / cformers
SoTA Transformers with C-backend for fast inference on your CPU.
☆312Updated 11 months ago
antirez / gguf-tools
GGUF implementation in C as a library and a tools CLI program
☆242Updated 4 months ago
trholding / llama2.c
Llama 2 Everywhere (L2E)
☆1,511Updated 2 weeks ago
thomasantony / llamacpp-python
Python bindings for llama.cpp
☆199Updated last year
ggml-org / p1
LLM-based code completion engine
☆173Updated last year
cmp-nct / ggllm.cpp
Falcon LLM ggml framework with CPU and GPU support
☆244Updated 9 months ago
staghado / vit.cpp
Inference Vision Transformer (ViT) in plain C/C++ with ggml
☆229Updated 6 months ago
abetlen / ggml-python
Python bindings for ggml
☆132Updated 2 months ago
distantmagic / paddler
Stateful load balancer custom-tailored for llama.cpp
☆556Updated this week
bigcode-project / starcoder.cpp
C++ implementation for 💫StarCoder
☆445Updated last year
kuleshov / minillm
MiniLLM is a minimal system for running modern LLMs on consumer-grade GPUs
☆866Updated last year
wangyi-fudan / wyGPT
Wang Yi's GPT solution
☆142Updated 10 months ago
foldl / chatllm.cpp
Pure C++ implementation of several models for real-time chatting on your computer (CPU)
☆374Updated this week
a1k0n / a1gpt
throwaway GPT inference
☆139Updated 5 months ago
PotatoSpudowski / fastLLaMa
fastLLaMa: An experimental high-performance framework for running Decoder-only LLMs with 4-bit quantization in Python using a C/C++ backe…
☆410Updated last year
joennlae / tensorli
Absolute minimalistic implementation of a GPT-like transformer using only numpy (<650 lines).
☆249Updated 11 months ago
kayvr / token-hawk
WebGPU LLM inference tuned by hand
☆146Updated last year
harrisonvanderbyl / rwkv-cpp-accelerated
A torchless, c++ rwkv implementation using 8bit quantization, written in cuda/hip/vulkan for maximum compatibility and minimum dependenci…
☆307Updated 9 months ago
lxe / llavavision
A simple "Be My Eyes" web app with a llama.cpp/llava backend
☆484Updated 11 months ago
RWKV / rwkv.cpp
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
☆1,420Updated 3 months ago
Cornell-RelaxML / quip-sharp
☆501Updated last week
axodox / axodox-machinelearning
This repository contains a pure C++ ONNX implementation of multiple offline AI models, such as StableDiffusion (1.5 and XL), ControlNet, …
☆608Updated 6 months ago
kolinko / effort
An implementation of bucketMul LLM inference
☆214Updated 4 months ago