b4rtaz / distributed-llama
Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
★1,690 · Updated this week
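The header claim is concrete enough to sketch: tensor parallelism shards each weight matrix across nodes, so every device stores and multiplies only its slice — which is what divides RAM usage and lets the matmuls run concurrently. Below is a minimal NumPy sketch of a column-wise shard. The names (`split_columns`, `tensor_parallel_matmul`) are hypothetical; this illustrates the general idea only and is not distributed-llama's actual C++ implementation.

```python
import numpy as np

def split_columns(W: np.ndarray, n_workers: int) -> list[np.ndarray]:
    """Shard a weight matrix column-wise so each worker holds 1/n of it."""
    return np.split(W, n_workers, axis=1)

def tensor_parallel_matmul(x: np.ndarray, shards: list[np.ndarray]) -> np.ndarray:
    """Each worker multiplies its own shard; a gather step concatenates the partials."""
    partials = [x @ W_i for W_i in shards]    # in a real cluster, one matmul per device
    return np.concatenate(partials, axis=-1)  # the gather over the network

# Toy check: the sharded result matches the single-device matmul.
rng = np.random.default_rng(0)
x = rng.standard_normal((1, 512))
W = rng.standard_normal((512, 2048))
shards = split_columns(W, n_workers=4)  # each node stores 512x512 instead of 512x2048
assert np.allclose(x @ W, tensor_parallel_matmul(x, shards))
```

The payoff is the RAM split (each of the four workers above stores a quarter of `W`); the cost is gathering the partial outputs over the network at every layer.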
Alternatives and similar repositories for distributed-llama:
Users interested in distributed-llama are comparing it to the libraries listed below.
- A fast inference library for running LLMs locally on modern consumer-class GPUs ★3,944 · Updated this week
- Large-scale LLM inference engine ★1,288 · Updated this week
- Stateful load balancer custom-tailored for llama.cpp ★704 · Updated 3 weeks ago
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ★511 · Updated this week
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ★2,355 · Updated this week
- VS Code extension for LLM-assisted code/text completion ★509 · Updated this week
- An OAI-compatible exllamav2 API that's both lightweight and fast ★778 · Updated this week
- Chat language model that can use tools and interpret the results ★1,513 · Updated this week
- Blazingly fast LLM inference. ★4,977 · Updated this week
- Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization ★1,220 · Updated 2 months ago
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ★2,160 · Updated 4 months ago
- Distributed Training Over-The-Internet ★878 · Updated 2 months ago
- Distributed LLM and StableDiffusion inference for mobile, desktop and server. ★2,763 · Updated 3 months ago
- Llama-3 agents that can browse the web by following instructions and talking to you ★1,387 · Updated 2 months ago
- Tools for merging pretrained large language models. ★5,247 · Updated this week
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). Allowing users to chat with LLM … ★527 · Updated 2 months ago
- NVIDIA Linux open GPU with P2P support ★1,017 · Updated last month
- AlwaysReddy is an LLM voice assistant that is always just a hotkey away. ★713 · Updated 2 weeks ago
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ★537 · Updated 3 months ago
- INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model ★1,468 · Updated 3 weeks ago
- A RAG LLM co-pilot for browsing the web, powered by local LLMs ★1,475 · Updated 2 weeks ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ★2,822 · Updated last year
- Python bindings for the Transformer models implemented in C/C++ using the GGML library. ★1,838 · Updated last year
- Official implementation of Half-Quadratic Quantization (HQQ) ★747 · Updated this week
- SCUDA is a GPU-over-IP bridge allowing GPUs on remote machines to be attached to CPU-only machines. ★1,637 · Updated this week
- This repo contains the source code for RULER: What's the Real Context Size of Your Long-Context Language Models? ★918 · Updated 2 weeks ago
- ★801 · Updated 5 months ago
- Simple Python library/structure to ablate features in LLMs which are supported by TransformerLens ★406 · Updated 8 months ago
- The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge ★1,245 · Updated this week
- Local AI API Platform ★2,451 · Updated this week