distantmagic / paddler
Stateful load balancer custom-tailored for llama.cpp 🏓🦙
⭐ 779 · Updated this week
Alternatives and similar repositories for paddler
Users interested in paddler are comparing it to the libraries listed below.
- Felafax is building AI infra for non-NVIDIA GPUs · ⭐ 560 · Updated 4 months ago
- Like grep, but for natural language questions. Based on Mistral 7B or Mixtral 8x7B. · ⭐ 380 · Updated last year
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM… · ⭐ 573 · Updated 4 months ago
- Large-scale LLM inference engine · ⭐ 1,453 · Updated this week
- Minimal LLM inference in Rust · ⭐ 994 · Updated 7 months ago
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. · ⭐ 597 · Updated 7 months ago
- WebAssembly binding for llama.cpp, enabling in-browser LLM inference · ⭐ 750 · Updated 2 weeks ago
- FastMLX is a high-performance, production-ready API for hosting MLX models. · ⭐ 308 · Updated 3 months ago
- Korvus is a search SDK that unifies the entire RAG pipeline in a single database query. Built on top of Postgres with bindings for Python… · ⭐ 1,372 · Updated 4 months ago
- llama.cpp fork with additional SOTA quants and improved performance · ⭐ 584 · Updated this week
- Official inference library for pre-processing of Mistral models · ⭐ 742 · Updated this week
- A fast batching API for serving LLMs · ⭐ 183 · Updated last year
- GGUF implementation in C, as a library and a CLI tool · ⭐ 272 · Updated 5 months ago
- Model swapping for llama.cpp (or any local OpenAI-compatible server) · ⭐ 961 · Updated this week
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. · ⭐ 1,347 · Updated 2 weeks ago
- Fine-tune llama2-70b and codellama on a MacBook Air without quantization · ⭐ 447 · Updated last year
- Scalable, fast, and disk-friendly vector search in Postgres, the successor to pgvecto.rs. · ⭐ 848 · Updated 2 weeks ago
- An implementation of bucketMul LLM inference · ⭐ 217 · Updated 11 months ago
- A fully neural approach to text chunking · ⭐ 357 · Updated last month
- Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and the Coqui TTS Toolkit · ⭐ 773 · Updated 10 months ago
- Large Language Model (LLM) applications and tools running on Apple Silicon in real time with Apple MLX. · ⭐ 446 · Updated 4 months ago
- Neum AI is a best-in-class framework to manage the creation and synchronization of vector embeddings at large scale. · ⭐ 857 · Updated last year
- The official API server for Exllama. OpenAI-compatible, lightweight, and fast. · ⭐ 987 · Updated this week
- ♾️ Helix is a private GenAI stack for building AI applications with declarative pipelines, knowledge (RAG), API bindings, and first-class… · ⭐ 505 · Updated this week
- An extremely fast implementation of Whisper optimized for Apple Silicon using MLX. · ⭐ 724 · Updated last year
- ⭐ 745 · Updated last year
- The simplest way to build AI workloads on Postgres · ⭐ 794 · Updated this week
- A cross-platform browser ML framework. · ⭐ 702 · Updated 6 months ago
- Run larger LLMs with longer contexts on Apple Silicon by using differentiated precision for KV cache quantization. KVSplit enables 8-bit… · ⭐ 351 · Updated last month
- Fast parallel LLM inference for MLX · ⭐ 192 · Updated 11 months ago
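
A common thread in this list is local LLM serving behind an OpenAI-compatible HTTP API, which llama.cpp's server and several of the gateways above expose, so one client snippet can be pointed at any of them. Below is a minimal sketch, not tied to any specific project: the `localhost:8080` address and the `local-model` name are placeholder assumptions, and you would substitute the address of whatever server (or load balancer such as paddler fronting several llama.cpp instances) you are running.

```python
# Minimal sketch: query an OpenAI-style /v1/chat/completions endpoint.
# Host, port, and model name are placeholder assumptions, not values taken
# from any of the projects listed above.
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed address of the server or balancer

payload = {
    "model": "local-model",  # many llama.cpp-based servers ignore or remap this field
    "messages": [
        {"role": "user", "content": "Say hello in one short sentence."}
    ],
    "temperature": 0.2,
}

req = urllib.request.Request(
    f"{BASE_URL}/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# OpenAI-style responses place the generated text under choices[0].message.content.
print(body["choices"][0]["message"]["content"])
```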