b4rtaz / distributed-llama
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference: the more devices you add, the faster inference runs.
☆2,639 · Updated last week
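The speedup comes from tensor-parallel execution: each layer's weight matrices are sliced across the nodes in the cluster, every node multiplies its slice in parallel, and the root node merges the partial results over the network. The toy sketch below (plain NumPy, not distributed-llama's actual C++ code; `n_workers` here just simulates cluster size) shows why per-device work shrinks as devices are added:

```python
# Toy illustration of column-wise tensor parallelism: W is sliced across
# n_workers "devices", each computes x @ W_slice, and the results are
# concatenated. In a real cluster the partial matmuls run concurrently
# on separate machines, so per-device work drops as n_workers grows.
import numpy as np

def split_matmul(x: np.ndarray, W: np.ndarray, n_workers: int) -> np.ndarray:
    slices = np.array_split(W, n_workers, axis=1)   # each worker owns a column slice
    partials = [x @ W_slice for W_slice in slices]  # concurrent on real hardware
    return np.concatenate(partials, axis=-1)

x = np.random.randn(1, 512)
W = np.random.randn(512, 2048)
assert np.allclose(split_matmul(x, W, 4), x @ W)  # same result as the full matmul
```

In practice the merge step costs network bandwidth, so the speedup is sublinear; that trade-off is what most of the alternatives below sidestep differently (single-node GPU engines, proxies, or browser/WASM runtimes).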
Alternatives and similar repositories for distributed-llama
Users interested in distributed-llama are comparing it to the repositories listed below.
- Local AI API Platform ☆2,762 · Updated 2 months ago
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,315 · Updated last month
- Open-source LLMOps platform for hosting and scaling AI in your own infrastructure 🏓🦙 ☆1,305 · Updated 2 weeks ago
- llama.cpp fork with additional SOTA quants and improved performance ☆1,181 · Updated this week
- Model swapping for llama.cpp (or any local OpenAI API compatible server) ☆1,530 · Updated last week
- Local realtime voice AI ☆2,363 · Updated 6 months ago
- Blazingly fast LLM inference. ☆6,088 · Updated 2 weeks ago
- Large-scale LLM inference engine ☆1,552 · Updated this week
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆704 · Updated last week
- Effortlessly run LLM backends, APIs, frontends, and services with one command. ☆2,066 · Updated last week
- The terminal client for Ollama ☆2,171 · Updated 2 weeks ago
- Distributed LLM and StableDiffusion inference for mobile, desktop and server. ☆2,882 · Updated 11 months ago
- VS Code extension for LLM-assisted code/text completion ☆967 · Updated this week
- SCUDA is a GPU-over-IP bridge that lets GPUs on remote machines be attached to CPU-only machines. ☆1,754 · Updated 3 months ago
- WebAssembly binding for llama.cpp, enabling on-browser LLM inference ☆890 · Updated 3 weeks ago
- A multi-platform desktop application to evaluate and compare LLMs, written in Rust and React. ☆834 · Updated 5 months ago
- The official API server for Exllama: OpenAI-compatible, lightweight, and fast (see the client sketch after this list). ☆1,050 · Updated 3 weeks ago
- Llama 2 Everywhere (L2E) ☆1,523 · Updated 3 weeks ago
- Optimizing inference proxy for LLMs ☆2,901 · Updated last week
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,898 · Updated last year
- The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge ☆1,502 · Updated this week
- Replace Copilot with local AI ☆2,056 · Updated last year
- Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference? ☆1,773 · Updated last year
- An awesome repository of local AI tools ☆1,676 · Updated 10 months ago
- Simple HTML UI for Ollama ☆1,083 · Updated 3 weeks ago
- Proxy that lets you use Ollama as a copilot, like GitHub Copilot ☆761 · Updated last week
- The most no-nonsense, locally or API-hosted AI code completion plugin for Visual Studio Code - like GitHub Copilot but 100% free. ☆3,597 · Updated last month
- A proxy server for multiple Ollama instances with key security ☆489 · Updated 2 weeks ago
- Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on an RPi Zero 2 (or in 298MB of RAM) but… ☆1,985 · Updated 3 weeks ago
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM… ☆589 · Updated 7 months ago
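Several entries above (the Exllama API server, the Ollama proxies, the model-swapping layer) expose an OpenAI-style HTTP API, which is what makes them interchangeable behind the same client code. A minimal sketch, assuming a server is already listening locally with an OpenAI-style `/v1/chat/completions` route; the host, port, and model id are placeholders, not values from any specific project:

```python
# Minimal chat-completions client for a locally hosted, OpenAI-compatible
# server. URL, port, and model id are placeholders; adjust to your setup.
import json
import urllib.request

payload = {
    "model": "local-model",  # placeholder id; depends on what the server loaded
    "messages": [{"role": "user", "content": "Hello from my home cluster!"}],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:  # sent as POST, since a body is attached
    body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```

Pointing the URL at a different server from the list is usually the only change needed to swap backends.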