b4rtaz / distributed-llama
Distributed LLM inference: connect home devices into a powerful cluster to accelerate inference. More devices mean faster inference.
☆2,761 · Updated last week
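For context on the tagline above: distributed-llama splits a model's tensors across nodes (tensor parallelism), so each device computes only a slice of every layer. The sketch below is a minimal, hypothetical NumPy illustration of that idea; it is not distributed-llama's code or API, and the helper names (`split_weights`, `parallel_matmul`) are invented for illustration.

```python
# Conceptual sketch of tensor parallelism, the general technique behind
# splitting one forward pass across devices. NOT distributed-llama's
# actual implementation; it only shows why adding devices reduces the
# per-device work: each device multiplies the input against its own
# slice of the weight matrix.
import numpy as np

def split_weights(W: np.ndarray, n_devices: int) -> list[np.ndarray]:
    """Column-slice a weight matrix so each device holds 1/n of it."""
    return np.array_split(W, n_devices, axis=1)

def parallel_matmul(x: np.ndarray, shards: list[np.ndarray]) -> np.ndarray:
    """Each partial product would run on a separate device in parallel;
    a root node concatenates the partial activations."""
    partials = [x @ W_shard for W_shard in shards]  # one product per device
    return np.concatenate(partials, axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 512))       # one token's hidden state
W = rng.standard_normal((512, 2048))    # a single layer's weight matrix
shards = split_weights(W, n_devices=4)  # 4 home devices, 1/4 of W each

# Sharded result matches the single-device result.
assert np.allclose(x @ W, parallel_matmul(x, shards))
```

In a real cluster the partial activations travel over the network between nodes, so per-token latency depends on link speed as well as node count.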
Alternatives and similar repositories for distributed-llama
Users interested in distributed-llama are comparing it to the repositories listed below.
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,379 · Updated 3 months ago
- Large-scale LLM inference engine ☆1,603 · Updated 3 weeks ago
- llama.cpp fork with additional SOTA quants and improved performance ☆1,387 · Updated this week
- Open-source LLM load balancer and serving platform for self-hosting LLMs at scale 🏓🦙 ☆1,388 · Updated last week
- Distributed LLM and Stable Diffusion inference for mobile, desktop and server. ☆2,899 · Updated last year
- Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference? ☆1,848 · Updated last year
- Reliable model swapping for any local OpenAI/Anthropic-compatible server (llama.cpp, vllm, etc.) ☆2,025 · Updated last week
- Local AI API Platform ☆2,764 · Updated 5 months ago
- Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on a RPI Zero 2 (or in 298MB of RAM) but… ☆2,011 · Updated 3 weeks ago
- NVIDIA Linux open GPU with P2P support ☆1,294 · Updated 6 months ago
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆753 · Updated this week
- Blazingly fast LLM inference. ☆6,262 · Updated last week
- SCUDA is a GPU-over-IP bridge allowing GPUs on remote machines to be attached to CPU-only machines. ☆1,785 · Updated 5 months ago
- Effortlessly run LLM backends, APIs, frontends, and services with one command. ☆2,183 · Updated this week
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,904 · Updated 2 years ago
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ☆620 · Updated last year
- VS Code extension for LLM-assisted code/text completion ☆1,082 · Updated 3 weeks ago
- Local realtime voice AI ☆2,386 · Updated 2 weeks ago
- The official API server for Exllama. OAI-compatible, lightweight, and fast. ☆1,097 · Updated this week
- RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for in… ☆2,372 · Updated this week
- WebAssembly binding for llama.cpp - Enabling on-browser LLM inference ☆946 · Updated 2 weeks ago
- Diffusion model (SD, Flux, Wan, Qwen Image, Z-Image, ...) inference in pure C/C++ ☆4,809 · Updated this week
- Text-To-Speech, RAG, and LLMs. All local! ☆1,844 · Updated last year
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM … ☆610 · Updated 9 months ago
- Llama 2 Everywhere (L2E) ☆1,521 · Updated 3 months ago
- ☆3,040 · Updated 3 weeks ago
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆3,103 · Updated 6 months ago
- ☆1,070 · Updated 6 months ago
- High-speed Large Language Model Serving for Local Deployment ☆8,460 · Updated 4 months ago
- Everything about the SmolLM and SmolVLM family of models ☆3,445 · Updated 3 weeks ago