Open-source LLM load balancer and serving platform for self-hosting LLMs at scale. An alternative to projects like llm-d and Docker Model Runner, but with fewer moving parts and simple deployments built around the ggml ecosystem. Runs on CPU and GPU.
★1,514 · Apr 3, 2026 · Updated last week
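The core idea behind a balancer like paddler is to spread incoming requests across a pool of independent LLM server instances. As a minimal illustrative sketch (not paddler's actual code or CLI; the endpoint URLs are hypothetical), a round-robin scheduler over a fixed set of llama.cpp-style server endpoints could look like this:

```python
# Minimal round-robin load-balancing sketch (illustrative only;
# paddler's real implementation also tracks per-agent slot usage).
from itertools import cycle


class RoundRobinBalancer:
    """Rotates through a fixed pool of upstream LLM server endpoints."""

    def __init__(self, endpoints):
        if not endpoints:
            raise ValueError("need at least one endpoint")
        # cycle() yields endpoints in order, wrapping around forever.
        self._ring = cycle(endpoints)

    def next_endpoint(self):
        # Each incoming request is assigned the next endpoint in rotation.
        return next(self._ring)


balancer = RoundRobinBalancer([
    "http://10.0.0.1:8080",  # hypothetical llama.cpp server instances
    "http://10.0.0.2:8080",
])
```

Production balancers layer health checks and capacity awareness on top of a scheme like this, so that saturated or unreachable instances are skipped rather than blindly rotated to.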
Alternatives and similar repositories for paddler
Users interested in paddler are comparing it to the libraries listed below.
- Extracts structured data from unstructured input. Programming-language agnostic. Uses llama.cpp. ★45 · May 16, 2024 · Updated last year
- Fast, flexible LLM inference. ★6,928 · Updated this week
- Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices mean faster inference. ★2,884 · Feb 10, 2026 · Updated 2 months ago
- Reliable model swapping for any local OpenAI/Anthropic-compatible server (llama.cpp, vLLM, etc.). ★3,094 · Updated this week
- A fast inference library for running LLMs locally on modern consumer-class GPUs. ★4,493 · Mar 4, 2026 · Updated last month
- llama.cpp fork with additional SOTA quants and improved performance. ★1,961 · Apr 4, 2026 · Updated last week
- Distribute and run LLMs with a single file. ★24,000 · Apr 2, 2026 · Updated last week
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM… ★626 · Mar 9, 2026 · Updated last month
- Large-scale LLM inference engine. ★1,686 · Mar 12, 2026 · Updated 3 weeks ago
- Minimalist ML framework for Rust. ★19,884 · Apr 3, 2026 · Updated last week
- Go ahead and axolotl questions. ★11,608 · Updated this week
- Plano is an AI-native proxy and data plane for agentic apps, with built-in orchestration, safety, observability, and smart LLM routing s… ★6,209 · Updated this week
- A vector search SQLite extension that runs anywhere! ★7,348 · Apr 1, 2026 · Updated last week
- [Unmaintained, see README] An ecosystem of Rust libraries for working with large language models. ★6,152 · Jun 24, 2024 · Updated last year
- All-in-one AI framework for semantic search, LLM orchestration and language model workflows. ★12,368 · Updated this week
- An async actor framework for Rust. ★62 · Apr 2, 2026 · Updated last week
- Structured Outputs. ★13,631 · Mar 26, 2026 · Updated 2 weeks ago
- ★528 · Updated this week
- ★135 · Updated this week
- Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, …) inference in pure C/C++. ★5,682 · Apr 1, 2026 · Updated last week
- Local AI API Platform. ★2,763 · Jul 4, 2025 · Updated 9 months ago
- LLM inference in C/C++. ★101,475 · Updated this week
- Burn is a next-generation tensor library and deep learning framework that doesn't compromise on flexibility, efficiency, or portability. ★14,780 · Apr 3, 2026 · Updated last week
- Minimal LLM inference in Rust. ★1,035 · Oct 24, 2024 · Updated last year
- The official API server for Exllama. OAI-compatible, lightweight, and fast. ★1,166 · Apr 1, 2026 · Updated last week
- Tools for merging pretrained large language models. ★6,945 · Mar 15, 2026 · Updated 3 weeks ago
- Chat language model that can use tools and interpret the results. ★1,595 · Dec 3, 2025 · Updated 4 months ago
- Tensor library for machine learning. ★14,394 · Updated this week
- Open source platform for AI engineering: OpenTelemetry-native LLM observability, GPU monitoring, guardrails, evaluations, prompt manageme… ★2,355 · Updated this week
- Optimizing inference proxy for LLMs. ★3,411 · Mar 19, 2026 · Updated 3 weeks ago
- High-level, optionally asynchronous Rust bindings to llama.cpp. ★245 · Jun 5, 2024 · Updated last year
- ★15 · Apr 26, 2025 · Updated 11 months ago
- Python bindings for llama.cpp. ★10,147 · Updated this week
- Qdrant: high-performance, massive-scale vector database and vector search engine for the next generation of AI. Also available in the cl… ★30,085 · Updated this week
- Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data ve… ★6,273 · Apr 4, 2026 · Updated last week
- Cleanai (https://github.com/willmil11/cleanai), except I'm making it in C now. Fast and clean from the start this time :) ★17 · Mar 6, 2026 · Updated last month
- WebAssembly binding for llama.cpp, enabling on-browser LLM inference. ★1,029 · Dec 17, 2025 · Updated 3 months ago
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU). ★855 · Apr 3, 2026 · Updated last week
- Distributed inference for mobile, desktop and server. ★3,010 · Updated this week