b4rtaz / distributed-llama
Connect home devices into a powerful cluster to accelerate LLM inference. More devices mean faster inference.
☆1,993 · Updated this week
Alternatives and similar repositories for distributed-llama:
Users interested in distributed-llama are comparing it to the libraries listed below.
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,064 · Updated last week
- Blazingly fast LLM inference. ☆5,297 · Updated this week
- Large-scale LLM inference engine ☆1,355 · Updated last week
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ☆732 · Updated last week
- Everything about the SmolLM2 and SmolVLM family of models ☆2,049 · Updated last week
- VS Code extension for LLM-assisted code/text completion ☆620 · Updated last week
- Local realtime voice AI ☆2,264 · Updated 3 weeks ago
- Local AI API Platform ☆2,579 · Updated this week
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆554 · Updated this week
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆2,842 · Updated last year
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM … ☆547 · Updated last month
- A framework for serving and evaluating LLM routers: save LLM costs without compromising quality ☆3,746 · Updated 7 months ago
- Llama 2 Everywhere (L2E) ☆1,516 · Updated 2 months ago
- Optimizing inference proxy for LLMs ☆2,112 · Updated last week
- The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge ☆1,306 · Updated this week
- Transparent proxy server with on-demand model swapping for llama.cpp (or any local OpenAI-compatible server) ☆475 · Updated this week
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆2,842 · Updated 3 weeks ago
- A text-based terminal client for Ollama ☆1,460 · Updated this week
- Stable Diffusion and Flux in pure C/C++ ☆3,949 · Updated 2 weeks ago
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX. ☆1,057 · Updated last week
- Distributed Training Over-The-Internet ☆891 · Updated 3 months ago
- ⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Pl… ☆2,163 · Updated 5 months ago
- AlwaysReddy is an LLM voice assistant that is always just a hotkey away. ☆729 · Updated 3 weeks ago
- WebAssembly binding for llama.cpp, enabling on-browser LLM inference ☆635 · Updated 2 weeks ago
- Big & Small LLMs working together ☆521 · Updated this week
- Distributed LLM and StableDiffusion inference for mobile, desktop and server. ☆2,820 · Updated 5 months ago
- Suno AI's Bark model in C/C++ for fast text-to-speech generation ☆787 · Updated 4 months ago
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆2,861 · Updated 4 months ago
- Tools for merging pretrained large language models. ☆5,478 · Updated this week