distantmagic / paddler
Stateful load balancer custom-tailored for llama.cpp 🏓🦙
☆747 · Updated this week
Alternatives and similar repositories for paddler:
Users who are interested in paddler are comparing it to the libraries listed below.
- Like grep but for natural language questions. Based on Mistral 7B or Mixtral 8x7B. ☆376 · Updated last year
- Fully neural approach for text chunking ☆341 · Updated last week
- WebAssembly binding for llama.cpp - Enabling on-browser LLM inference ☆684 · Updated last week
- Replace OpenAI with Llama.cpp Automagically. ☆318 · Updated 10 months ago
- Scalable, fast, and disk-friendly vector search in Postgres, the successor of pgvecto.rs. ☆655 · Updated last week
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … ☆556 · Updated 2 months ago
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ☆582 · Updated 6 months ago
- Model swapping for llama.cpp (or any local OpenAPI compatible server) ☆662 · Updated this week
- An implementation of bucketMul LLM inference ☆217 · Updated 10 months ago
- A SQLite extension for generating text embeddings from GGUF models using llama.cpp ☆185 · Updated 5 months ago
- GGUF implementation in C as a library and a tools CLI program ☆269 · Updated 3 months ago
- A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for vario… ☆1,014 · Updated 2 months ago
- Minimal LLM inference in Rust ☆983 · Updated 6 months ago
- VS Code extension for LLM-assisted code/text completion ☆692 · Updated 2 weeks ago
- Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and Coqui TTS Toolkit ☆763 · Updated 8 months ago
- Felafax is building AI infra for non-NVIDIA GPUs ☆559 · Updated 3 months ago
- ☆713 · Updated last month
- Fast, SQL powered, in-process vector search for any language with an SQLite driver ☆299 · Updated 6 months ago
- Radient turns many data types (not just text) into vectors for similarity search, RAG, regression analysis, and more. ☆275 · Updated last month
- Large-scale LLM inference engine ☆1,405 · Updated this week
- Guaranteed Structured Output from any Language Model via Hierarchical State Machines ☆125 · Updated this week
- Korvus is a search SDK that unifies the entire RAG pipeline in a single database query. Built on top of Postgres with bindings for Python… ☆1,359 · Updated 3 months ago
- Live-bending a foundation model's output at neural network level. ☆247 · Updated 3 weeks ago
- FastMLX is a high performance production ready API to host MLX models. ☆295 · Updated last month
- A cross-platform browser ML framework. ☆689 · Updated 5 months ago
- ♾️ Helix is a private GenAI stack for building AI applications with declarative pipelines, knowledge (RAG), API bindings, and first-class… ☆491 · Updated this week
- See Through Your Models ☆389 · Updated last month
- Browser-LLM Auto-Scaling Technology ☆428 · Updated this week
- A fast batching API to serve LLM models ☆182 · Updated last year
- Securely run AI-generated code in stateful sandboxes that run forever. ☆183 · Updated 2 weeks ago