Open-source LLM/VLM load balancer and serving platform for self-hosting LLMs (and VLMs) at scale. An alternative to projects like llm-d and Docker Model Runner, but with fewer moving parts and simpler deployments, built around the ggml ecosystem. Runs on CPU and GPU.
★1,588 · Jun 4, 2026 · Updated this week
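To illustrate what an LLM load balancer does, here is a minimal sketch of one common strategy: route each request to the inference server with the most idle slots, and free the slot when the request finishes. This is not paddler's implementation. The `Backend` and `LeastBusyBalancer` names, the slot counts, and the node names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    """Hypothetical view of one inference server and its request slots."""
    name: str
    total_slots: int
    busy_slots: int = 0

    @property
    def free_slots(self) -> int:
        return self.total_slots - self.busy_slots


class LeastBusyBalancer:
    """Hypothetical balancer: send each request to the backend with the most free slots."""

    def __init__(self, backends: list) -> None:
        self.backends = backends

    def acquire(self):
        # Only backends with spare capacity are candidates; None means every backend is saturated
        candidates = [b for b in self.backends if b.free_slots > 0]
        if not candidates:
            return None
        # On ties, max() keeps the first candidate in list order
        best = max(candidates, key=lambda b: b.free_slots)
        best.busy_slots += 1
        return best

    def release(self, backend: Backend) -> None:
        # Called when a request completes so its slot can be reused
        backend.busy_slots = max(0, backend.busy_slots - 1)


if __name__ == "__main__":
    pool = [Backend("cpu-node", total_slots=2), Backend("gpu-node", total_slots=4)]
    lb = LeastBusyBalancer(pool)
    picks = [lb.acquire() for _ in range(7)]
    print([p.name if p else None for p in picks])
```

With two CPU slots and four GPU slots, the GPU node absorbs the first requests until both nodes have equal free capacity, after which requests alternate; once all six slots are busy, `acquire()` returns `None`, which a real server might turn into queueing or a 503 response.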
Alternatives and similar repositories for paddler
Users that are interested in paddler are comparing it to the libraries listed below.
- Extracts structured data from unstructured input. Programming language agnostic. Uses llama.cpp. ★45 · May 16, 2024 · Updated 2 years ago
- Fast, flexible LLM inference. ★7,255 · Updated this week
- Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference. ★2,948 · Apr 14, 2026 · Updated last month
- Reliable model swapping for any local OpenAI/Anthropic compatible server: llama.cpp, vllm, etc. ★4,441 · Updated this week
- A fast inference library for running LLMs locally on modern consumer-class GPUs. ★4,542 · Mar 4, 2026 · Updated 3 months ago
- Distribute and run LLMs with a single file. ★24,700 · Updated this week
- llama.cpp fork with additional SOTA quants and improved performance. ★2,704 · Updated this week
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … ★641 · Mar 9, 2026 · Updated 3 months ago
- Large-scale LLM inference engine. ★1,762 · May 8, 2026 · Updated last month
- Minimalist ML framework for Rust. ★20,426 · Updated this week
- Go ahead and axolotl questions. ★12,001 · Updated this week
- Plano is an AI-native proxy and data plane for agentic apps, with built-in orchestration, safety, observability, and smart LLM routing s… ★6,579 · Updated this week
- A vector search SQLite extension that runs anywhere! ★7,702 · May 18, 2026 · Updated 3 weeks ago
- [Unmaintained, see README] An ecosystem of Rust libraries for working with large language models. ★6,149 · Jun 24, 2024 · Updated last year
- All-in-one AI framework for semantic search, LLM orchestration and language model workflows. ★12,642 · Updated this week
- Structured Outputs. ★13,947 · May 18, 2026 · Updated 3 weeks ago
- ★136 · May 26, 2026 · Updated 2 weeks ago
- ★577 · Jun 4, 2026 · Updated last week
- Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++. ★6,197 · Updated this week
- Local AI API Platform. ★2,757 · Jul 4, 2025 · Updated 11 months ago
- An async actor framework for Rust. ★65 · Apr 23, 2026 · Updated last month
- LLM inference in C/C++. ★115,667 · Updated this week
- Minimal LLM inference in Rust. ★1,034 · Oct 24, 2024 · Updated last year
- Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability. ★15,369 · Updated this week
- Tensor library for machine learning. ★14,770 · May 29, 2026 · Updated last week
- The official API server for Exllama. OAI compatible, lightweight, and fast. ★1,242 · Updated this week
- Chat language model that can use tools and interpret the results. ★1,595 · Dec 3, 2025 · Updated 6 months ago
- Tools for merging pretrained large language models. ★7,126 · May 6, 2026 · Updated last month
- Python bindings for llama.cpp. ★10,388 · Updated this week
- Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Manageme… ★2,499 · Updated this week
- ★15 · Apr 26, 2025 · Updated last year
- Optimizing inference proxy for LLMs. ★4,135 · May 7, 2026 · Updated last month
- High-level, optionally asynchronous Rust bindings to llama.cpp. ★245 · Jun 5, 2024 · Updated 2 years ago
- Cleanai (https://github.com/willmil11/cleanai) except I'm making it in C now. Fast and clean from the start this time :) ★15 · May 29, 2026 · Updated last week
- WebAssembly binding for llama.cpp, enabling on-browser LLM inference. ★1,101 · Jun 1, 2026 · Updated last week
- Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cl… ★32,001 · Updated this week
- Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data ve… ★6,609 · Updated this week
- Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild. ★3,582 · Updated this week
- Distributed inference for mobile, desktop and server. ★3,076 · Apr 24, 2026 · Updated last month