Open-source LLM/VLM load balancer and serving platform for self-hosting LLMs (and VLMs) at scale. An alternative to projects like llm-d and Docker Model Runner, but with fewer moving parts and simpler deployments, built around the ggml ecosystem. Runs on CPU and GPU.
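The core idea behind a slots-aware load balancer of this kind can be sketched as follows. This is a hedged illustration, not paddler's actual API: the `Upstream` fields and `pick_upstream` helper are hypothetical names, and the sketch only shows the least-busy selection strategy, assuming each inference instance reports how many request slots it has free.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Upstream:
    """One inference instance, e.g. a llama.cpp server (illustrative type)."""
    name: str
    slots_idle: int   # free request slots reported by the instance
    slots_total: int

def pick_upstream(upstreams: list[Upstream]) -> Optional[Upstream]:
    """Pick the upstream with the most idle slots (least-busy strategy)."""
    candidates = [u for u in upstreams if u.slots_idle > 0]
    if not candidates:
        return None  # fleet saturated; caller may queue or reject the request
    return max(candidates, key=lambda u: u.slots_idle)

fleet = [
    Upstream("gpu-0", slots_idle=1, slots_total=4),
    Upstream("gpu-1", slots_idle=3, slots_total=4),
    Upstream("cpu-0", slots_idle=0, slots_total=2),
]
print(pick_upstream(fleet).name)  # -> gpu-1 (most idle slots)
```

Balancing on idle slots rather than plain round-robin matters for LLM serving because requests have wildly different durations; an instance mid-way through a long generation should receive less new work.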
★ 1,540 · Apr 28, 2026 · Updated this week
Alternatives and similar repositories for paddler
Users that are interested in paddler are comparing it to the libraries listed below.
- Extracts structured data from unstructured input. Programming language agnostic. Uses llama.cpp ★ 45 · May 16, 2024 · Updated last year
- Fast, flexible LLM inference ★ 7,074 · Apr 15, 2026 · Updated 2 weeks ago
- Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference. ★ 2,908 · Apr 14, 2026 · Updated 2 weeks ago
- Reliable model swapping for any local OpenAI/Anthropic-compatible server - llama.cpp, vllm, etc. ★ 3,641 · Updated this week
- A fast inference library for running LLMs locally on modern consumer-class GPUs ★ 4,511 · Mar 4, 2026 · Updated last month
- llama.cpp fork with additional SOTA quants and improved performance ★ 2,194 · Apr 24, 2026 · Updated last week
- Distribute and run LLMs with a single file. ★ 24,274 · Apr 23, 2026 · Updated last week
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM… ★ 630 · Mar 9, 2026 · Updated last month
- Large-scale LLM inference engine ★ 1,714 · Updated this week
- Minimalist ML framework for Rust ★ 20,082 · Apr 23, 2026 · Updated last week
- Go ahead and axolotl questions ★ 11,779 · Updated this week
- Plano is an AI-native proxy and data plane for agentic apps, with built-in orchestration, safety, observability, and smart LLM routing s… ★ 6,390 · Updated this week
- A vector search SQLite extension that runs anywhere! ★ 7,483 · Apr 8, 2026 · Updated 3 weeks ago
- [Unmaintained, see README] An ecosystem of Rust libraries for working with large language models ★ 6,152 · Jun 24, 2024 · Updated last year
- 💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows ★ 12,424 · Apr 21, 2026 · Updated last week
- Structured Outputs ★ 13,741 · Apr 16, 2026 · Updated 2 weeks ago
- ★ 550 · Updated this week
- ★ 135 · Apr 8, 2026 · Updated 3 weeks ago
- Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++ ★ 5,833 · Apr 23, 2026 · Updated last week
- Local AI API Platform ★ 2,761 · Jul 4, 2025 · Updated 9 months ago
- An async actor framework for Rust ★ 63 · Apr 23, 2026 · Updated last week
- LLM inference in C/C++ ★ 106,639 · Updated this week
- Burn is a next-generation tensor library and deep learning framework that doesn't compromise on flexibility, efficiency, and portability. ★ 14,938 · Updated this week
- Minimal LLM inference in Rust ★ 1,036 · Oct 24, 2024 · Updated last year
- The official API server for Exllama. OAI-compatible, lightweight, and fast. ★ 1,197 · Updated this week
- Chat language model that can use tools and interpret the results ★ 1,594 · Dec 3, 2025 · Updated 4 months ago
- Tools for merging pretrained large language models. ★ 7,023 · Mar 15, 2026 · Updated last month
- Tensor library for machine learning ★ 14,560 · Updated this week
- Optimizing inference proxy for LLMs ★ 3,440 · Mar 19, 2026 · Updated last month
- Open-source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Manageme… ★ 2,394 · Apr 23, 2026 · Updated last week
- High-level, optionally asynchronous Rust bindings to llama.cpp ★ 246 · Jun 5, 2024 · Updated last year
- ★ 15 · Apr 26, 2025 · Updated last year
- Python bindings for llama.cpp ★ 10,240 · Updated this week
- Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cl… ★ 30,799 · Updated this week
- Cleanai (https://github.com/willmil11/cleanai) except I'm making it in C now. Fast and clean from the start this time :) ★ 16 · Mar 6, 2026 · Updated last month
- Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data ve… ★ 6,363 · Updated this week
- WebAssembly binding for llama.cpp - Enabling on-browser LLM inference ★ 1,048 · Updated this week
- Official Rust Implementation of Model2Vec ★ 176 · Apr 10, 2026 · Updated 3 weeks ago
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ★ 868 · Apr 3, 2026 · Updated 3 weeks ago