b4rtaz / distributed-llama
Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
☆2,219 · Updated last week
Alternatives and similar repositories for distributed-llama
Users interested in distributed-llama are comparing it to the repositories listed below.
- Large-scale LLM inference engine ☆1,477 · Updated this week
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,236 · Updated this week
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆661 · Updated this week
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ☆792 · Updated this week
- Local AI API Platform ☆2,765 · Updated 2 weeks ago
- llama.cpp fork with additional SOTA quants and improved performance ☆686 · Updated this week
- Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference? ☆1,702 · Updated last year
- Model swapping for llama.cpp (or any local OpenAI-compatible server) ☆1,048 · Updated this week
- NVIDIA Linux open GPU with P2P support ☆1,188 · Updated last month
- Blazingly fast LLM inference. ☆5,890 · Updated this week
- Distributed LLM and StableDiffusion inference for mobile, desktop and server. ☆2,876 · Updated 8 months ago
- VS Code extension for LLM-assisted code/text completion ☆842 · Updated 2 weeks ago
- The official API server for Exllama. OAI compatible, lightweight, and fast. ☆1,000 · Updated last week
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,887 · Updated last year
- Replace Copilot local AI ☆2,029 · Updated last year
- Distributed Training Over-The-Internet ☆946 · Updated 2 months ago
- WebAssembly binding for llama.cpp - Enabling on-browser LLM inference ☆770 · Updated this week
- Llama 2 Everywhere (L2E) ☆1,519 · Updated 6 months ago
- SCUDA is a GPU over IP bridge allowing GPUs on remote machines to be attached to CPU-only machines. ☆1,744 · Updated last month
- prima.cpp: Speeding up 70B-scale LLM inference on low-resource everyday home clusters ☆975 · Updated this week
- Python bindings for the Transformer models implemented in C/C++ using GGML library. ☆1,868 · Updated last year
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ☆2,169 · Updated 9 months ago
- A proxy server for multiple ollama instances with Key security ☆462 · Updated last week
- AlwaysReddy is an LLM voice assistant that is always just a hotkey away. ☆743 · Updated 4 months ago
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … ☆578 · Updated 5 months ago
- Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization ☆1,329 · Updated 7 months ago
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ☆602 · Updated 8 months ago
- INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model ☆1,532 · Updated 3 months ago
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆1,498 · Updated this week
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆3,309 · Updated last month