Open-source LLM load balancer and serving platform for self-hosting LLMs at scale. An alternative to projects like llm-d, Docker Model Runner, etc., but with fewer moving parts and simple deployments built around the ggml ecosystem. Runs on CPU and GPU.
★1,483 · Mar 19, 2026 · Updated this week
Alternatives and similar repositories for paddler
Users who are interested in paddler are comparing it to the libraries listed below.
- Extracts structured data from unstructured input. Programming language agnostic. Uses llama.cpp. ★45 · May 16, 2024 · Updated last year
- Fast, flexible LLM inference. ★6,713 · Updated this week
- Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference. ★2,865 · Feb 10, 2026 · Updated last month
- Reliable model swapping for any local OpenAI/Anthropic compatible server - llama.cpp, vllm, etc. ★2,807 · Updated this week
- A fast inference library for running LLMs locally on modern consumer-class GPUs. ★4,468 · Mar 4, 2026 · Updated 2 weeks ago
- llama.cpp fork with additional SOTA quants and improved performance. ★1,846 · Updated this week
- Distribute and run LLMs with a single file. ★23,794 · Mar 14, 2026 · Updated last week
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … ★622 · Mar 9, 2026 · Updated last week
- Large-scale LLM inference engine. ★1,677 · Mar 12, 2026 · Updated last week
- Minimalist ML framework for Rust. ★19,735 · Updated this week
- Plano is an AI-native proxy and data plane for agentic apps, with built-in orchestration, safety, observability, and smart LLM routing s… ★5,971 · Updated this week
- Go ahead and axolotl questions. ★11,460 · Updated this week
- A vector search SQLite extension that runs anywhere! ★7,239 · Updated this week
- [Unmaintained, see README] An ecosystem of Rust libraries for working with large language models. ★6,152 · Jun 24, 2024 · Updated last year
- All-in-one AI framework for semantic search, LLM orchestration and language model workflows. ★12,291 · Mar 14, 2026 · Updated last week
- An async actor framework for Rust. ★61 · Updated this week
- ★497 · Updated this week
- Structured Outputs. ★13,564 · Mar 9, 2026 · Updated last week
- ★134 · Mar 14, 2026 · Updated last week
- Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++. ★5,562 · Updated this week
- Local AI API Platform. ★2,762 · Jul 4, 2025 · Updated 8 months ago
- LLM inference in C/C++. ★98,098 · Updated this week
- Minimal LLM inference in Rust. ★1,033 · Oct 24, 2024 · Updated last year
- Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability. ★14,679 · Updated this week
- The official API server for Exllama. OAI compatible, lightweight, and fast. ★1,154 · Mar 13, 2026 · Updated last week
- Chat language model that can use tools and interpret the results. ★1,594 · Dec 3, 2025 · Updated 3 months ago
- Tensor library for machine learning. ★14,252 · Updated this week
- Open source platform for AI Engineering: OpenTelemetry-native LLM Observability, GPU Monitoring, Guardrails, Evaluations, Prompt Manageme… ★2,296 · Updated this week
- Optimizing inference proxy for LLMs. ★3,381 · Jan 28, 2026 · Updated last month
- High-level, optionally asynchronous Rust bindings to llama.cpp. ★243 · Jun 5, 2024 · Updated last year
- Tools for merging pretrained large language models. ★6,867 · Updated this week
- ★15 · Apr 26, 2025 · Updated 10 months ago
- Python bindings for llama.cpp. ★10,058 · Aug 15, 2025 · Updated 7 months ago
- Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data ve… ★6,174 · Updated this week
- Qdrant - High-performance, massive-scale Vector Database and Vector Search Engine for the next generation of AI. Also available in the cl… ★29,611 · Updated this week
- Cleanai (https://github.com/willmil11/cleanai) except I'm making it in C now. Fast and clean from the start this time :) ★17 · Mar 6, 2026 · Updated 2 weeks ago
- WebAssembly binding for llama.cpp - Enabling on-browser LLM inference. ★1,016 · Dec 17, 2025 · Updated 3 months ago
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU). ★839 · Updated this week
- Any model. Any hardware. Zero compromise. Built with @ziglang / @openxla / MLIR / @bazelbuild. ★3,258 · Updated this week