LLaVA server (llama.cpp).
☆184Oct 20, 2023Updated 2 years ago
Alternatives and similar repositories for llava-cpp-server
Users who are interested in llava-cpp-server are comparing it to the libraries listed below.
- Generate training datasets for text detection☆12Jul 1, 2020Updated 5 years ago
- iterate quickly with llama.cpp hot reloading. use the llama.cpp bindings with bun.sh☆51Oct 30, 2023Updated 2 years ago
- A simple "Be My Eyes" web app with a llama.cpp/llava backend☆495Nov 28, 2023Updated 2 years ago
- ☆1,275Oct 24, 2023Updated 2 years ago
- CLIP inference in plain C/C++ with no extra dependencies☆560Jun 19, 2025Updated 11 months ago
- Suno AI's Bark model in C/C++ for fast text-to-speech generation☆864Nov 16, 2024Updated last year
- Semantic emoji finder. Python/dash UI. Uses sentence transformer embeddings and duckdb☆20Sep 15, 2025Updated 8 months ago
- Inference of Large Multimodal Models in C/C++. LLaVA and others☆48Oct 1, 2023Updated 2 years ago
- Inference Vision Transformer (ViT) in plain C/C++ with ggml☆314Apr 11, 2024Updated 2 years ago
- This repository is a voice search demo using OpenAI Whisper, DuckDB, and the Metaphone algorithm. The associated blog post is here: https:…☆13May 15, 2024Updated 2 years ago
- The Codec 2 speech codec, compiled to WASM using Emscripten.☆13Apr 27, 2023Updated 3 years ago
- Fine-tuning, DPO, RLHF, RLAIF on LLMs - Qwen3, Zephyr 7B GPTQ with 4-Bit Quantization, Mistral-7B-GPTQ☆15Jul 5, 2025Updated 11 months ago
- Diffusion model(SD,Flux,Wan,Qwen Image,Z-Image,...) inference in pure C/C++☆6,197Jun 7, 2026Updated last week
- Implementation of "LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models"☆40Nov 11, 2024Updated last year
- GPT-2 small trained on phi-like data☆68Feb 18, 2024Updated 2 years ago
- Demo python script app to interact with llama.cpp server using whisper API, microphone and webcam devices.☆47Nov 6, 2023Updated 2 years ago
- Friendly Terminal Assistant for Developers☆17Mar 23, 2024Updated 2 years ago
- Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML)☆573Aug 8, 2023Updated 2 years ago
- Fine-tune mistral-7B on 3090s, a100s, h100s☆731Oct 11, 2023Updated 2 years ago
- A Javascript library (with Typescript types) to parse metadata of GGML based GGUF files.☆52Jul 30, 2024Updated last year
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM …☆643Mar 9, 2026Updated 3 months ago
- Tensor library for machine learning☆273Apr 23, 2023Updated 3 years ago
- Python bindings for llama.cpp☆10,388Updated this week
- transformer tokenizers (e.g. BERT tokenizer) in C++ (WIP)☆18Apr 7, 2022Updated 4 years ago
- LLM-based code completion engine☆194Jan 23, 2025Updated last year
- ☆62Jun 13, 2024Updated 2 years ago
- Apache Lucene/Solr Guide☆13Oct 14, 2021Updated 4 years ago
- ☆134Nov 24, 2023Updated 2 years ago
- Extracts structured data from unstructured input. Programming language agnostic. Uses llama.cpp☆45May 16, 2024Updated 2 years ago
- Web App to transcribe memos using Whisper AI.☆18Oct 23, 2022Updated 3 years ago
- Unified realtime agent trace database & search MCP☆41Updated this week
- An implementation of Compositional Attention: Disentangling Search and Retrieval by MILA☆14Jun 1, 2022Updated 4 years ago
- Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.☆2,951Apr 14, 2026Updated 2 months ago
- This repo is for handling Question Answering, especially for Multi-hop Question Answering☆69Dec 20, 2023Updated 2 years ago
- GGML implementation of BERT model with Python bindings and quantization.☆57Feb 19, 2024Updated 2 years ago
- Python bindings for the Transformer models implemented in C/C++ using GGML library.☆1,886Jan 28, 2024Updated 2 years ago
- ☆15Sep 8, 2023Updated 2 years ago
- Visual Studio Code extension for WizardCoder☆148Aug 1, 2023Updated 2 years ago
- Make any person bald!! Component of the paper: Learning to regulate 3D head shape by removing occluding hair from in-the-wild images.☆12Jun 6, 2022Updated 4 years ago