trzy / llava-cpp-server
LLaVA server (llama.cpp).
☆177 · Updated last year
Related projects
Alternatives and complementary repositories for llava-cpp-server
- Inference of Large Multimodal Models in C/C++. LLaVA and others ☆46 · Updated last year
- Local ML voice chat using high-end models. ☆145 · Updated this week
- Extend the original llama.cpp repo to support the RedPajama model. ☆117 · Updated 2 months ago
- Python bindings for ggml ☆132 · Updated 2 months ago
- WebGPU LLM inference tuned by hand ☆147 · Updated last year
- Demo Python script app to interact with the llama.cpp server using the Whisper API, microphone, and webcam devices. ☆45 · Updated last year
- Full finetuning of large language models without large memory requirements ☆93 · Updated 10 months ago
- Run PaliGemma in real time ☆122 · Updated 6 months ago
- Automatically quantize GGUF models ☆140 · Updated this week
- A fast batching API to serve LLM models ☆172 · Updated 6 months ago
- ☆39 · Updated 9 months ago
- Landmark Attention: Random-Access Infinite Context Length for Transformers QLoRA ☆124 · Updated last year
- Video+code lecture on building nanoGPT from scratch ☆64 · Updated 5 months ago
- This is our own implementation of 'Layer Selective Rank Reduction' ☆232 · Updated 5 months ago
- Scripts to create your own MoE models using MLX ☆86 · Updated 8 months ago
- ☆104 · Updated 8 months ago
- GRDN.AI app for garden optimization ☆69 · Updated 9 months ago
- Port of Suno AI's Bark in C/C++ for fast inference ☆54 · Updated 7 months ago
- ☆149 · Updated 4 months ago
- Mistral7B playing DOOM ☆122 · Updated 4 months ago
- Command-line script for inference from models such as MPT-7B-Chat ☆102 · Updated last year
- ☆136 · Updated 11 months ago
- Run GGML models with Kubernetes. ☆173 · Updated 11 months ago
- A ggml (C++) re-implementation of tortoise-tts ☆159 · Updated 3 months ago
- Port of Microsoft's BioGPT in C/C++ using ggml ☆87 · Updated 9 months ago
- CLIP inference in plain C/C++ with no extra dependencies ☆459 · Updated 3 months ago
- Inference code for mixtral-8x7b-32kseqlen ☆98 · Updated 11 months ago
- Fast parallel LLM inference for MLX ☆149 · Updated 4 months ago
- Low-rank adapter extraction for fine-tuned transformer models ☆162 · Updated 6 months ago
- ☆38 · Updated 8 months ago