b4rtaz / distributed-llama
Connect home devices into a powerful cluster to accelerate LLM inference. More devices mean faster inference.
☆2,189 · Updated last week
Alternatives and similar repositories for distributed-llama
Users interested in distributed-llama are comparing it to the repositories listed below.
- Large-scale LLM inference engine ☆1,457 · Updated last week
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,216 · Updated 3 weeks ago
- Local AI API Platform ☆2,756 · Updated last week
- Model swapping for llama.cpp (or any local OpenAI-compatible server) ☆969 · Updated this week
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights ☆2,883 · Updated last year
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆634 · Updated this week
- Distributed LLM and StableDiffusion inference for mobile, desktop and server ☆2,866 · Updated 8 months ago
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ☆782 · Updated this week
- llama.cpp fork with additional SOTA quants and improved performance ☆608 · Updated this week
- Blazingly fast LLM inference ☆5,764 · Updated this week
- Chat language model that can use tools and interpret the results ☆1,563 · Updated last week
- Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference? ☆1,670 · Updated last year
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM … ☆572 · Updated 4 months ago
- MLX-VLM is a package for inference and fine-tuning of Vision Language Models (VLMs) on your Mac using MLX ☆1,381 · Updated this week
- The official API server for Exllama. OAI compatible, lightweight, and fast ☆990 · Updated this week
- NVIDIA Linux open GPU with P2P support ☆1,175 · Updated 3 weeks ago
- Local realtime voice AI ☆2,328 · Updated 3 months ago
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses ☆598 · Updated 8 months ago
- Big & Small LLMs working together ☆994 · Updated this week
- A framework for serving and evaluating LLM routers: save LLM costs without compromising quality ☆4,064 · Updated 10 months ago
- Run Llama and other large language models offline on iOS and macOS using the GGML library ☆1,797 · Updated 3 months ago
- VS Code extension for LLM-assisted code/text completion ☆814 · Updated last week
- WebAssembly binding for llama.cpp, enabling on-browser LLM inference ☆755 · Updated 3 weeks ago
- Text-To-Speech, RAG, and LLMs. All local! ☆1,808 · Updated 6 months ago
- Tools for merging pretrained large language models ☆5,853 · Updated last week
- Proxy that allows you to use Ollama as a coding assistant, like GitHub Copilot ☆693 · Updated last month
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) ☆1,303 · Updated 2 months ago
- Python bindings for the Transformer models implemented in C/C++ using the GGML library ☆1,868 · Updated last year
- ☆902 · Updated 9 months ago
- What If Language Models Expertly Routed All Inference? WilmerAI allows prompts to be routed to specialized workflows based on the domain … ☆709 · Updated last week