Open-source LLM/VLM load balancer and serving platform for self-hosting LLMs (and VLMs) at scale. An alternative to projects like llm-d, Docker Model Runner, etc., but with fewer moving parts and simple deployments, built around the ggml ecosystem. Runs on CPU and GPU.
⭐ 1,567 · May 19, 2026 · Updated this week
Alternatives and similar repositories for paddler
Users that are interested in paddler are comparing it to the libraries listed below.
- Extracts structured data from unstructured input. Programming language agnostic. Uses llama.cpp · ⭐ 45 · May 16, 2024 · Updated 2 years ago
- Fast, flexible LLM inference · ⭐ 7,130 · Apr 15, 2026 · Updated last month
- Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference. · ⭐ 2,931 · Apr 14, 2026 · Updated last month
- Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vllm, etc · ⭐ 4,107 · Updated this week
- A fast inference library for running LLMs locally on modern consumer-class GPUs · ⭐ 4,521 · Mar 4, 2026 · Updated 2 months ago
- llama.cpp fork with additional SOTA quants and improved performance · ⭐ 2,448 · Updated this week
- Distribute and run LLMs with a single file. · ⭐ 24,451 · May 14, 2026 · Updated last week
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … · ⭐ 632 · Mar 9, 2026 · Updated 2 months ago
- Large-scale LLM inference engine · ⭐ 1,736 · May 8, 2026 · Updated last week
- Minimalist ML framework for Rust · ⭐ 20,261 · Updated this week
- Go ahead and axolotl questions · ⭐ 11,938 · Updated this week
- Plano is an AI-native proxy and data plane for agentic apps, with built-in orchestration, safety, observability, and smart LLM routing s… · ⭐ 6,483 · Updated this week
- A vector search SQLite extension that runs anywhere! · ⭐ 7,586 · Apr 8, 2026 · Updated last month
- [Unmaintained, see README] An ecosystem of Rust libraries for working with large language models · ⭐ 6,148 · Jun 24, 2024 · Updated last year
- All-in-one AI framework for semantic search, LLM orchestration and language model workflows · ⭐ 12,577 · May 12, 2026 · Updated last week
- Structured Outputs · ⭐ 13,846 · May 13, 2026 · Updated last week
- ⭐ 567 · Updated this week
- ⭐ 136 · May 3, 2026 · Updated 2 weeks ago
- Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++ · ⭐ 6,013 · May 14, 2026 · Updated last week
- Local AI API Platform · ⭐ 2,755 · Jul 4, 2025 · Updated 10 months ago
- An async actor framework for Rust · ⭐ 65 · Apr 23, 2026 · Updated 3 weeks ago
- LLM inference in C/C++ · ⭐ 110,506 · Updated this week
- Minimal LLM inference in Rust · ⭐ 1,034 · Oct 24, 2024 · Updated last year
- Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability. · ⭐ 15,136 · Updated this week
- The official API server for Exllama. OAI compatible, lightweight, and fast. · ⭐ 1,219 · May 14, 2026 · Updated last week
- Chat language model that can use tools and interpret the results · ⭐ 1,594 · Dec 3, 2025 · Updated 5 months ago
- Tools for merging pretrained large language models. · ⭐ 7,083 · May 6, 2026 · Updated 2 weeks ago
- Tensor library for machine learning · ⭐ 14,645 · May 14, 2026 · Updated last week
- Python bindings for llama.cpp · ⭐ 10,312 · Updated this week
- Optimizing inference proxy for LLMs · ⭐ 3,856 · May 7, 2026 · Updated 2 weeks ago
- High-level, optionally asynchronous Rust bindings to llama.cpp · ⭐ 246 · Jun 5, 2024 · Updated last year
- Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Manageme… · ⭐ 2,439 · May 13, 2026 · Updated last week
- ⭐ 15 · Apr 26, 2025 · Updated last year
- WebAssembly binding for llama.cpp - Enabling on-browser LLM inference · ⭐ 1,072 · Updated this week
- Cleanai (https://github.com/willmil11/cleanai) except I'm making it in C now. Fast and clean from the start this time :) · ⭐ 16 · Mar 6, 2026 · Updated 2 months ago
- Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data ve… · ⭐ 6,431 · Updated this week
- Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cl… · ⭐ 31,401 · Updated this week
- Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild · ⭐ 3,532 · Updated this week
- Distributed inference for mobile, desktop and server. · ⭐ 3,032 · Apr 24, 2026 · Updated 3 weeks ago