b4rtaz / distributed-llama
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
☆2,713 · Updated last week
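distributed-llama's root node can also be driven like any other local LLM server through its OpenAI-compatible API server (dllama-api), so a standard chat-completions client works against the whole cluster. A minimal Python sketch under that assumption; the host, port, and model id below are hypothetical placeholders, not documented project defaults:

```python
# Minimal sketch: send a chat request to a distributed-llama root node
# via its OpenAI-compatible endpoint. The address and model id are
# illustrative assumptions, not documented defaults.
import json
import urllib.request

URL = "http://192.168.0.10:9990/v1/chat/completions"  # hypothetical root node

payload = {
    "model": "llama3",  # placeholder model id
    "messages": [{"role": "user", "content": "Hello from the cluster!"}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)
    # Assumes the standard chat-completions response shape.
    print(body["choices"][0]["message"]["content"])
```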
Alternatives and similar repositories for distributed-llama
Users interested in distributed-llama are comparing it to the repositories listed below.
- Local AI API Platform ☆2,760 · Updated 3 months ago
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,353 · Updated 2 months ago
- Reliable model swapping for any local OpenAI-compatible server - llama.cpp, vLLM, etc. ☆1,764 · Updated this week
- Distributed LLM and Stable Diffusion inference for mobile, desktop and server. ☆2,887 · Updated last year
- Large-scale LLM inference engine ☆1,579 · Updated this week
- llama.cpp fork with additional SOTA quants and improved performance ☆1,277 · Updated this week
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆727 · Updated last week
- Open-source LLM load balancer and serving platform for self-hosting LLMs at scale 🏓🦙 ☆1,348 · Updated last week
- Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on an RPi Zero 2 (or in 298MB of RAM) but… ☆1,999 · Updated last month
- Blazingly fast LLM inference. ☆6,171 · Updated last week
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,902 · Updated 2 years ago
- NVIDIA Linux open GPU kernel modules with P2P support ☆1,266 · Updated 4 months ago
- Local realtime voice AI ☆2,373 · Updated 8 months ago
- Effortlessly run LLM backends, APIs, frontends, and services with one command. ☆2,122 · Updated this week
- Llama 2 Everywhere (L2E) ☆1,522 · Updated 2 months ago
- The official API server for Exllama. OAI-compatible, lightweight, and fast. ☆1,071 · Updated 2 weeks ago
- VS Code extension for LLM-assisted code/text completion ☆1,028 · Updated this week
- LocalAGI is a powerful, self-hostable AI Agent platform designed for maximum privacy and flexibility. A complete drop-in replacement for … ☆1,268 · Updated last week
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM … ☆602 · Updated 8 months ago
- WebAssembly binding for llama.cpp - enabling in-browser LLM inference ☆924 · Updated 3 weeks ago
- Replace Copilot with local AI ☆2,068 · Updated last year
- A proxy server for multiple Ollama instances with key security ☆515 · Updated 2 weeks ago
- Local voice chatbot for engaging conversations, powered by Ollama, Hugging Face Transformers, and the Coqui TTS toolkit ☆784 · Updated last year
- Chat language model that can use tools and interpret the results ☆1,586 · Updated this week
- Distributed Training Over-The-Internet ☆963 · Updated 2 weeks ago
- The terminal client for Ollama ☆2,223 · Updated 2 weeks ago
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) ☆1,386 · Updated 6 months ago
- The most no-nonsense, locally or API-hosted AI code completion plugin for Visual Studio Code - like GitHub Copilot but 100% free. ☆3,609 · Updated 2 months ago
- Python bindings for the Transformer models implemented in C/C++ using the GGML library. ☆1,875 · Updated last year
- Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference? ☆1,811 · Updated last year