b4rtaz / distributed-llama
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
⭐ 2,786 · Updated last month
Alternatives and similar repositories for distributed-llama
Users interested in distributed-llama are comparing it to the libraries listed below.
- Open-source LLM load balancer and serving platform for self-hosting LLMs at scale ⭐ 1,414 · Updated last week
- Distributed LLM and StableDiffusion inference for mobile, desktop and server. ⭐ 2,905 · Updated last year
- Large-scale LLM inference engine ⭐ 1,613 · Updated last week
- A fast inference library for running LLMs locally on modern consumer-class GPUs ⭐ 4,410 · Updated last month
- Local AI API Platform ⭐ 2,762 · Updated 6 months ago
- Reliable model swapping for any local OpenAI/Anthropic-compatible server (llama.cpp, vLLM, etc.) ⭐ 2,176 · Updated this week
- SCUDA is a GPU-over-IP bridge allowing GPUs on remote machines to be attached to CPU-only machines. ⭐ 1,798 · Updated last week
- Effortlessly run LLM backends, APIs, frontends, and services with one command. ⭐ 2,239 · Updated last week
- Blazingly fast LLM inference. ⭐ 6,340 · Updated last week
- Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference? ⭐ 1,865 · Updated last year
- Optimizing inference proxy for LLMs ⭐ 3,274 · Updated 3 weeks ago
- llama.cpp fork with additional SOTA quants and improved performance ⭐ 1,494 · Updated this week
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ⭐ 763 · Updated this week
- VS Code extension for LLM-assisted code/text completion ⭐ 1,124 · Updated last week
- Replace Copilot with local AI ⭐ 2,081 · Updated last year
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM… ⭐ 609 · Updated 10 months ago
- The official API server for Exllama. OAI-compatible, lightweight, and fast. ⭐ 1,110 · Updated 3 weeks ago
- NVIDIA Linux open GPU with P2P support ⭐ 1,310 · Updated 7 months ago
- WebAssembly binding for llama.cpp, enabling on-browser LLM inference ⭐ 970 · Updated 3 weeks ago
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ⭐ 624 · Updated last year
- A proxy server for multiple Ollama instances with key security ⭐ 561 · Updated 2 months ago
- The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge ⭐ 1,576 · Updated 3 weeks ago
- An awesome repository of local AI tools ⭐ 1,801 · Updated last year
- Local realtime voice AI ⭐ 2,422 · Updated last month
- A more memory-efficient rewrite of the HF Transformers implementation of Llama for use with quantized weights ⭐ 2,905 · Updated 2 years ago
- WilmerAI is one of the oldest LLM semantic routers. It uses multi-layer prompt routing and complex workflows to allow you to not only cre… ⭐ 795 · Updated last week
- Llama 2 Everywhere (L2E) ⭐ 1,526 · Updated 4 months ago
- Compare open-source local LLM inference projects by their metrics to assess popularity and activeness. ⭐ 701 · Updated 2 months ago
- Text-To-Speech, RAG, and LLMs. All local! ⭐ 1,893 · Updated last year
- Simple Go utility to download Hugging Face models and datasets ⭐ 802 · Updated 2 weeks ago