distantmagic / paddler
Stateful load balancer custom-tailored for llama.cpp
☆737 · Updated last week
Alternatives and similar repositories for paddler:
Users interested in paddler are comparing it to the libraries listed below.
- Like grep but for natural language questions. Based on Mistral 7B or Mixtral 8x7B. ☆379 · Updated last year
- Scalable, fast, and disk-friendly vector search in Postgres, the successor of pgvecto.rs. ☆572 · Updated this week
- Model swapping for llama.cpp (or any local OpenAI-compatible server). ☆506 · Updated this week
- An implementation of bucketMul LLM inference. ☆216 · Updated 9 months ago
- Large-scale LLM inference engine. ☆1,379 · Updated this week
- FastMLX is a high-performance, production-ready API to host MLX models. ☆288 · Updated 3 weeks ago
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ☆577 · Updated 5 months ago
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM… ☆551 · Updated last month
- GGUF implementation in C as a library and a tools CLI program. ☆265 · Updated 3 months ago
- Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and the Coqui TTS Toolkit. ☆758 · Updated 8 months ago
- Felafax is building AI infra for non-NVIDIA GPUs. ☆558 · Updated 2 months ago
- WebAssembly binding for llama.cpp, enabling in-browser LLM inference. ☆653 · Updated last month
- Fast, SQL-powered, in-process vector search for any language with an SQLite driver. ☆296 · Updated 5 months ago
- A fast batching API to serve LLM models. ☆183 · Updated 11 months ago
- 🧬 Helix is a private GenAI stack for building AI applications with declarative pipelines, knowledge (RAG), API bindings, and first-class… ☆486 · Updated this week
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆1,155 · Updated this week
- Replace OpenAI with Llama.cpp Automagically. ☆312 · Updated 10 months ago
- An extremely fast implementation of whisper optimized for Apple Silicon using MLX. ☆685 · Updated 11 months ago
- An OAI-compatible exllamav2 API that's both lightweight and fast. ☆901 · Updated 3 weeks ago
- ☆711 · Updated 3 weeks ago
- A cross-platform browser ML framework. ☆683 · Updated 4 months ago
- ☆163 · Updated 10 months ago
- A SQLite extension for generating text embeddings from GGUF models using llama.cpp. ☆178 · Updated 4 months ago
- Finetune llama2-70b and codellama on MacBook Air without quantization. ☆448 · Updated last year
- Minimal LLM inference in Rust. ☆982 · Updated 5 months ago
- Things you can do with the token embeddings of an LLM. ☆1,435 · Updated 2 weeks ago
- Live-bending a foundation model's output at the neural network level. ☆188 · Updated this week
- Large Language Model (LLM) applications and tools running on Apple Silicon in real time with Apple MLX. ☆435 · Updated 2 months ago
- Docker-based inference engine for AMD GPUs. ☆230 · Updated 6 months ago
- See Through Your Models. ☆374 · Updated last month