intentee / paddler
Open-source LLM load balancer and serving platform for self-hosting LLMs at scale
★1,337 · Updated 2 weeks ago
Alternatives and similar repositories for paddler
Users interested in paddler are comparing it to the libraries listed below.
- Minimal LLM inference in Rust · ★1,013 · Updated last year
- Like grep but for natural language questions. Based on Mistral 7B or Mixtral 8x7B. · ★386 · Updated last year
- A cross-platform browser ML framework. · ★718 · Updated 11 months ago
- A high-performance inference engine for AI models · ★1,343 · Updated this week
- VS Code extension for LLM-assisted code/text completion · ★1,001 · Updated last week
- Scalable, fast, and disk-friendly vector search in Postgres, the successor of pgvecto.rs. · ★1,184 · Updated last week
- SeekStorm - sub-millisecond full-text search library & multi-tenancy server in Rust · ★1,746 · Updated last week
- Efficient platform for inference and serving local LLMs, including an OpenAI-compatible API server. · ★496 · Updated this week
- Super-fast Structured Outputs · ★561 · Updated last week
- Reliable model swapping for any local OpenAI-compatible server (llama.cpp, vLLM, etc.) · ★1,730 · Updated this week
- WebAssembly binding for llama.cpp - Enabling on-browser LLM inference · ★912 · Updated 2 weeks ago
- Fully neural approach for text chunking · ★378 · Updated this week
- Felafax is building AI infra for non-NVIDIA GPUs · ★568 · Updated 9 months ago
- Big & Small LLMs working together · ★1,184 · Updated this week
- Large-scale LLM inference engine · ★1,567 · Updated 2 weeks ago
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. · ★1,748 · Updated this week
- The Easiest Rust Interface for Local LLMs and an Interface for Deterministic Signals from Probabilistic LLM Vibes · ★236 · Updated 2 months ago
- A realtime serving engine for Data-Intensive Generative AI Applications · ★1,060 · Updated last week
- Git-like RAG pipeline · ★246 · Updated 2 weeks ago
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. · ★617 · Updated 11 months ago
- A multi-platform desktop application to evaluate and compare LLM models, written in Rust and React. · ★847 · Updated 6 months ago
- Highly Performant, Modular, Memory Safe and Production-ready Inference, Ingestion and Indexing built in Rust · ★742 · Updated last week
- Korvus is a search SDK that unifies the entire RAG pipeline in a single database query. Built on top of Postgres with bindings for Python… · ★1,452 · Updated 8 months ago
- Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from R… · ★508 · Updated this week
- Helix is a private GenAI stack for building AI agents with declarative pipelines, knowledge (RAG), API bindings, and first-class testi… · ★521 · Updated this week
- Rust library for generating vector embeddings and reranking; a rewrite of qdrant/fastembed. · ★634 · Updated last month
- InferX: Inference as a Service Platform · ★136 · Updated this week
- MLX Omni Server is a local inference server powered by Apple's MLX framework, specifically designed for Apple Silicon (M-series) chips. I… · ★583 · Updated last week
- High-Performance Implementation of OpenAI's TikToken. · ★458 · Updated 3 months ago
- Blazingly fast LLM inference. · ★6,149 · Updated this week