distantmagic / paddler
Stateful load balancer custom-tailored for llama.cpp 🏓🦙
⭐ 779 · Updated this week
Alternatives and similar repositories for paddler
Users interested in paddler are comparing it to the libraries listed below.
- Felafax is building AI infra for non-NVIDIA GPUs · ⭐ 560 · Updated 4 months ago
- Like grep, but for natural language questions. Based on Mistral 7B or Mixtral 8x7B. · ⭐ 380 · Updated last year
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM… · ⭐ 573 · Updated 4 months ago
- Large-scale LLM inference engine · ⭐ 1,453 · Updated this week
- Minimal LLM inference in Rust · ⭐ 994 · Updated 7 months ago
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. · ⭐ 597 · Updated 7 months ago
- WebAssembly binding for llama.cpp, enabling in-browser LLM inference · ⭐ 750 · Updated 2 weeks ago
- FastMLX is a high-performance, production-ready API for hosting MLX models. · ⭐ 308 · Updated 3 months ago
- Korvus is a search SDK that unifies the entire RAG pipeline in a single database query. Built on top of Postgres with bindings for Python… · ⭐ 1,372 · Updated 4 months ago
- llama.cpp fork with additional SOTA quants and improved performance · ⭐ 584 · Updated this week
- Official inference library for pre-processing of Mistral models · ⭐ 742 · Updated this week
- A fast batching API for serving LLMs · ⭐ 183 · Updated last year
- GGUF implementation in C, as a library and a CLI tool · ⭐ 272 · Updated 5 months ago
- Model swapping for llama.cpp (or any local OpenAI-compatible server) · ⭐ 961 · Updated this week
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. · ⭐ 1,347 · Updated 2 weeks ago
- Fine-tune llama2-70b and codellama on a MacBook Air without quantization · ⭐ 447 · Updated last year
- Scalable, fast, and disk-friendly vector search in Postgres, the successor to pgvecto.rs. · ⭐ 848 · Updated 2 weeks ago
- An implementation of bucketMul LLM inference · ⭐ 217 · Updated 11 months ago
- A fully neural approach to text chunking · ⭐ 357 · Updated last month
- Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and the Coqui TTS Toolkit · ⭐ 773 · Updated 10 months ago
- Large Language Model (LLM) applications and tools running on Apple Silicon in real time with Apple MLX. · ⭐ 446 · Updated 4 months ago
- Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale. · ⭐ 857 · Updated last year
- The official API server for Exllama. OpenAI-compatible, lightweight, and fast. · ⭐ 987 · Updated this week
- ♾️ Helix is a private GenAI stack for building AI applications with declarative pipelines, knowledge (RAG), API bindings, and first-class… · ⭐ 505 · Updated this week
- An extremely fast implementation of Whisper optimized for Apple Silicon using MLX. · ⭐ 724 · Updated last year
- ⭐ 745 · Updated last year
- The simplest way to build AI workloads on Postgres · ⭐ 794 · Updated this week
- A cross-platform browser ML framework. · ⭐ 702 · Updated 6 months ago
- Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit… · ⭐ 351 · Updated last month
- Fast parallel LLM inference for MLX · ⭐ 192 · Updated 11 months ago
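
A common thread in this list is local LLM serving behind an OpenAI-compatible HTTP API, which llama.cpp's server and several of the gateways above expose, so one client snippet can be pointed at any of them. Below is a minimal sketch, not tied to any specific project: the `localhost:8080` address and the `local-model` name are placeholder assumptions, and you would substitute the address of whatever server (or load balancer such as paddler fronting several llama.cpp instances) you are running.

```python
# Minimal sketch: query an OpenAI-style /v1/chat/completions endpoint.
# Host, port, and model name are placeholder assumptions, not values taken
# from any of the projects listed above.
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed address of the server or balancer

payload = {
    "model": "local-model",  # many llama.cpp-based servers ignore or remap this field
    "messages": [
        {"role": "user", "content": "Say hello in one short sentence."}
    ],
    "temperature": 0.2,
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# OpenAI-style responses place the generated text under choices[0].message.content.
print(body["choices"][0]["message"]["content"])
```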