distantmagic / paddler
Stateful load balancer custom-tailored for llama.cpp 🏓🦙
★790 · Updated this week
Alternatives and similar repositories for paddler
Users who are interested in paddler are comparing it to the libraries listed below.
- Like grep but for natural language questions. Based on Mistral 7B or Mixtral 8x7B. ★381 · Updated last year
- WebAssembly binding for llama.cpp - Enabling on-browser LLM inference ★762 · Updated last month
- Felafax is building AI infra for non-NVIDIA GPUs ★566 · Updated 5 months ago
- Minimal LLM inference in Rust ★1,002 · Updated 8 months ago
- Fully neural approach for text chunking ★363 · Updated 2 months ago
- An implementation of bucketMul LLM inference ★220 · Updated last year
- ♾️ Helix is a private GenAI stack for building AI agents with declarative pipelines, knowledge (RAG), API bindings, and first-class testi… ★507 · Updated this week
- Finetune llama2-70b and codellama on MacBook Air without quantization ★446 · Updated last year
- A realtime serving engine for Data-Intensive Generative AI Applications ★1,028 · Updated this week
- GGUF implementation in C as a library and a tools CLI program ★274 · Updated 6 months ago
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … ★575 · Updated 4 months ago
- VS Code extension for LLM-assisted code/text completion ★835 · Updated last week
- LLM-powered lossless compression tool ★283 · Updated 10 months ago
- Replace OpenAI with Llama.cpp Automagically. ★320 · Updated last year
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ★600 · Updated 8 months ago
- Super-fast Structured Outputs ★330 · Updated last week
- Large-scale LLM inference engine ★1,471 · Updated this week
- See Through Your Models ★398 · Updated this week
- Fast, SQL powered, in-process vector search for any language with an SQLite driver ★314 · Updated 8 months ago
- A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for vario… ★1,018 · Updated 4 months ago
- Official inference library for pre-processing of Mistral models ★755 · Updated this week
- ★363 · Updated this week
- A fast batching API to serve LLM models ★183 · Updated last year
- Enterprise-grade API gateway that helps you monitor and impose cost or rate limits per API key. Get fine-grained access control and mo… ★1,064 · Updated 6 months ago
- High-Performance Implementation of OpenAI's TikToken. ★416 · Updated last week
- llama.cpp fork with additional SOTA quants and improved performance ★652 · Updated this week
- Model swapping for llama.cpp (or any local OpenAPI compatible server) ★1,010 · Updated last week
- Korvus is a search SDK that unifies the entire RAG pipeline in a single database query. Built on top of Postgres with bindings for Python… ★1,376 · Updated 5 months ago
- A cross-platform browser ML framework. ★708 · Updated 7 months ago
- Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale. ★859 · Updated last year