b4rtaz / distributed-llama
Distributed LLM inference. Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
☆2,688 · Updated last week
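distributed-llama splits a model across a root node and worker nodes on your LAN, and recent versions also ship an OpenAI-compatible API server (dllama-api). Below is a minimal sketch of querying such a cluster from Python once the API server is running; the host, port, and model name here are assumptions for illustration, so check the repository README for the exact binaries and flags.

```python
# Minimal sketch: query a distributed-llama cluster through its
# OpenAI-compatible API server (dllama-api). The address below and the
# "model" value are assumptions for illustration; adjust them to match
# your own deployment.
import json
import urllib.request

API_URL = "http://localhost:9999/v1/chat/completions"  # assumed host/port

payload = {
    "model": "llama3",  # placeholder; the server may ignore this field
    "messages": [{"role": "user", "content": "Hello from the cluster!"}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)

print(reply["choices"][0]["message"]["content"])
```

Because the model itself is split across nodes rather than replicated, adding a device speeds up each token instead of merely serving more requests in parallel, which is what the "more devices means faster inference" claim refers to.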
Alternatives and similar repositories for distributed-llama
Users interested in distributed-llama are comparing it to the repositories listed below.
- Local AI API Platform ☆2,760 · Updated 3 months ago
- Open-source LLM load balancer and serving platform for self-hosting LLMs at scale 🏓🦙 (see the concept sketch after this list) ☆1,323 · Updated this week
- Distributed LLM and Stable Diffusion inference for mobile, desktop and server. ☆2,882 · Updated 11 months ago
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,341 · Updated last month
- llama.cpp fork with additional SOTA quants and improved performance ☆1,246 · Updated last week
- Model swapping for llama.cpp (or any local OpenAI API compatible server) ☆1,655 · Updated last week
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆716 · Updated last week
- Blazingly fast LLM inference. ☆6,141 · Updated this week
- Large-scale LLM inference engine ☆1,562 · Updated this week
- Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference? ☆1,793 · Updated last year
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM … ☆595 · Updated 7 months ago
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆1,685 · Updated this week
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ☆615 · Updated 11 months ago
- Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on an RPi Zero 2 (or in 298MB of RAM) but… ☆1,989 · Updated last week
- Effortlessly run LLM backends, APIs, frontends, and services with one command. ☆2,091 · Updated 2 weeks ago
- SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines. ☆1,754 · Updated 3 months ago
- NVIDIA Linux open GPU with P2P support ☆1,260 · Updated 4 months ago
- WebAssembly binding for llama.cpp - Enabling on-browser LLM inference ☆908 · Updated last week
- Llama 2 Everywhere (L2E) ☆1,523 · Updated last month
- Big & Small LLMs working together ☆1,181 · Updated this week
- VS Code extension for LLM-assisted code/text completion ☆988 · Updated this week
- the terminal client for Ollama ☆2,206 · Updated last week
- Text-To-Speech, RAG, and LLMs. All local! ☆1,830 · Updated 10 months ago
- The official API server for Exllama. OAI compatible, lightweight, and fast. ☆1,061 · Updated this week
- An awesome repository of local AI tools ☆1,689 · Updated 11 months ago
- Llama and other large language models on iOS and macOS, offline, using the GGML library. ☆1,888 · Updated 3 weeks ago
- A vector search SQLite extension that runs anywhere! ☆6,237 · Updated 8 months ago
- Local realtime voice AI ☆2,370 · Updated 7 months ago
- ☆1,054 · Updated 4 months ago
- A proxy server for multiple Ollama instances with key security ☆499 · Updated 2 weeks ago
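Several entries above (the LLM load balancer, the model-swapping proxy, and the Exllama API server) converge on the same OpenAI-compatible /v1/chat/completions surface, which is what lets one client sit in front of any of them. As a concept sketch only, not any listed project's actual implementation, here is the round-robin idea behind such a load balancer; the backend URLs are placeholders.

```python
# Concept sketch: spread OpenAI-compatible chat requests across several
# local inference backends in round-robin order. This illustrates the
# load-balancing idea, not any listed project's implementation; the
# backend addresses are placeholders.
import itertools
import json
import urllib.request

BACKENDS = itertools.cycle([
    "http://10.0.0.1:8080",  # placeholder backend addresses
    "http://10.0.0.2:8080",
])

def chat(prompt: str, max_tokens: int = 64) -> str:
    """Send one chat completion to the next backend in rotation."""
    base = next(BACKENDS)
    req = urllib.request.Request(
        f"{base}/v1/chat/completions",
        data=json.dumps({
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        }).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

print(chat("Why balance load across inference nodes?"))
```

A real balancer like the one listed above additionally tracks per-backend slot availability and health rather than rotating blindly, but the shared API surface is what makes the pattern work at all.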