Open-source LLM load balancer and serving platform for self-hosting LLMs at scale. Alternative to projects like llm-d, Docker Model Runner, etc., but with fewer moving parts and simpler deployments built around the ggml ecosystem. Runs on CPU and GPU.
☆1,467 · Updated this week
Alternatives and similar repositories for paddler
Users who are interested in paddler are comparing it to the libraries listed below.
- Fast, flexible LLM inference · ☆6,623 · Updated this week
- Extracts structured data from unstructured input. Programming language agnostic. Uses llama.cpp · ☆45 · May 16, 2024 · Updated last year
- Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference. · ☆2,842 · Feb 10, 2026 · Updated 2 weeks ago
- Delivery infrastructure for agentic apps - Plano is an AI-native proxy and data plane that offloads plumbing work, so you stay focused on… · ☆5,344 · Updated this week
- A fast inference library for running LLMs locally on modern consumer-class GPUs · ☆4,444 · Dec 9, 2025 · Updated 2 months ago
- Distribute and run LLMs with a single file. · ☆23,742 · Updated this week
- Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vllm, etc · ☆2,445 · Updated this week
- Minimalist ML framework for Rust · ☆19,509 · Updated this week
- Large-scale LLM inference engine · ☆1,658 · Feb 17, 2026 · Updated last week
- [Unmaintained, see README] An ecosystem of Rust libraries for working with large language models · ☆6,150 · Jun 24, 2024 · Updated last year
- llama.cpp fork with additional SOTA quants and improved performance · ☆1,696 · Updated this week
- Go ahead and axolotl questions · ☆11,335 · Updated this week
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … · ☆615 · Feb 17, 2025 · Updated last year
- A vector search SQLite extension that runs anywhere! · ☆7,041 · Feb 13, 2026 · Updated 2 weeks ago
- All-in-one AI framework for semantic search, LLM orchestration and language model workflows · ☆12,210 · Feb 22, 2026 · Updated last week
- Structured Outputs · ☆13,456 · Feb 13, 2026 · Updated 2 weeks ago
- Minimal LLM inference in Rust · ☆1,032 · Oct 24, 2024 · Updated last year
- Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Manageme… · ☆2,251 · Updated this week
- Local AI API Platform · ☆2,758 · Jul 4, 2025 · Updated 7 months ago
- Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data ve… · ☆6,074 · Updated this week
- The official API server for Exllama. OAI compatible, lightweight, and fast. · ☆1,134 · Feb 9, 2026 · Updated 2 weeks ago
- ☆468 · Updated this week
- Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability. · ☆14,419 · Feb 23, 2026 · Updated last week
- Tensor library for machine learning · ☆14,152 · Updated this week
- Developer-friendly OSS embedded retrieval library for multimodal AI. Search More; Manage Less. · ☆9,141 · Updated this week
- Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cl… · ☆29,102 · Updated this week
- Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++ · ☆5,442 · Feb 19, 2026 · Updated last week
- Tools for merging pretrained large language models. · ☆6,814 · Jan 26, 2026 · Updated last month
- Chat language model that can use tools and interpret the results · ☆1,591 · Dec 3, 2025 · Updated 2 months ago
- LLM inference in C/C++ · ☆95,726 · Updated this week
- Optimizing inference proxy for LLMs · ☆3,342 · Jan 28, 2026 · Updated last month
- Distributed LLM and StableDiffusion inference for mobile, desktop and server. · ☆2,905 · Oct 23, 2024 · Updated last year
- A realtime serving engine for Data-Intensive Generative AI Applications · ☆1,335 · Updated this week
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs · ☆3,728 · May 21, 2025 · Updated 9 months ago
- Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild · ☆3,189 · Updated this week
- ☆134 · Dec 11, 2025 · Updated 2 months ago
- Interact with your SQL database, Natural Language to SQL using LLMs · ☆3,619 · Jul 24, 2024 · Updated last year
- Postgres with GPUs for ML/AI apps. · ☆6,720 · Jul 1, 2025 · Updated 8 months ago
- Python bindings for llama.cpp · ☆10,003 · Aug 15, 2025 · Updated 6 months ago