b4rtaz / distributed-llama
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices mean faster inference.
☆2,822 · Updated this week
Alternatives and similar repositories for distributed-llama
Users interested in distributed-llama are comparing it to the libraries listed below.
- Distributed LLM and Stable Diffusion inference for mobile, desktop, and server. ☆2,901 · Updated last year
- Reliable model swapping for any local OpenAI/Anthropic-compatible server (llama.cpp, vLLM, etc.). ☆2,311 · Updated last week
- Local AI API Platform. ☆2,762 · Updated 7 months ago
- Open-source LLM load balancer and serving platform for self-hosting LLMs at scale 🏓🦙 Alternative to projects like llm-d, Docker Model R… ☆1,447 · Updated this week
- A fast inference library for running LLMs locally on modern consumer-class GPUs. ☆4,440 · Updated 2 months ago
- llama.cpp fork with additional SOTA quants and improved performance. ☆1,605 · Updated this week
- SCUDA is a GPU-over-IP bridge allowing GPUs on remote machines to be attached to CPU-only machines. ☆1,803 · Updated last month
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU). ☆798 · Updated this week
- Large-scale LLM inference engine. ☆1,647 · Updated 2 weeks ago
- Fast, flexible LLM inference. ☆6,508 · Updated this week
- One command brings a complete pre-wired LLM stack with hundreds of services to explore. ☆2,406 · Updated this week
- Llama 2 Everywhere (L2E). ☆1,526 · Updated 5 months ago
- WebAssembly binding for llama.cpp, enabling in-browser LLM inference. ☆993 · Updated last month
- The official API server for Exllama. OAI-compatible, lightweight, and fast. ☆1,121 · Updated 2 weeks ago
- A multi-platform desktop application to evaluate and compare LLMs, written in Rust and React. ☆909 · Updated 2 months ago
- Big & Small LLMs working together. ☆1,261 · Updated this week
- Local realtime voice AI. ☆2,425 · Updated 2 months ago
- Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on an RPi Zero 2 (or in 298 MB of RAM) but… ☆2,026 · Updated 3 weeks ago
- WilmerAI is one of the oldest LLM semantic routers. It uses multi-layer prompt routing and complex workflows to allow you to not only cre… ☆802 · Updated last month
- Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference? ☆1,873 · Updated last year
- Text-to-Speech, RAG, and LLMs. All local! ☆1,894 · Updated last year
- VS Code extension for LLM-assisted code/text completion. ☆1,150 · Updated 3 weeks ago
- AlwaysReddy is an LLM voice assistant that is always just a hotkey away. ☆763 · Updated 11 months ago
- NVIDIA Linux open GPU kernel modules with P2P support. ☆1,320 · Updated 8 months ago
- Granite Code Models: A Family of Open Foundation Models for Code Intelligence. ☆1,245 · Updated 7 months ago
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM… ☆615 · Updated 11 months ago
- Optimizing inference proxy for LLMs. ☆3,317 · Updated last week
- Proxy that allows you to use Ollama as a copilot, like GitHub Copilot. ☆825 · Updated this week
- Calculate token/s & GPU memory requirements for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization. ☆1,389 · Updated last year
- RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for in… ☆2,564 · Updated last week