b4rtaz / distributed-llama
Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
☆1,350
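The project's core idea is tensor parallelism: each device in the cluster holds only a shard of a layer's weights, so per-device RAM shrinks and the shards can be multiplied concurrently. A minimal, hypothetical sketch of column-wise tensor parallelism for one linear layer (not distributed-llama's actual implementation; the function name and shapes are illustrative):

```python
import numpy as np

def column_parallel_matmul(x, weight, n_devices):
    """Split `weight` column-wise across n_devices, multiply each shard
    independently (as each device would), then concatenate the partial
    outputs to reconstruct the full result."""
    shards = np.array_split(weight, n_devices, axis=1)  # one shard per device
    partials = [x @ w for w in shards]                  # computed in parallel
    return np.concatenate(partials, axis=-1)            # all-gather step

# Sanity check: the sharded computation matches a single-device matmul.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))
w = rng.standard_normal((8, 16))
assert np.allclose(column_parallel_matmul(x, w, 4), x @ w)
```

In a real cluster the concatenate step becomes network communication between devices, which is why inference speed depends on link bandwidth as well as per-device compute.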
Related projects:
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality! ☆2,884
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆3,484
- Chat language model that can use tools and interpret the results ☆1,358
- Multiple NVIDIA GPUs or Apple Silicon for Large Language Model Inference? ☆823
- Distributed LLM and StableDiffusion inference for mobile, desktop and server. ☆2,453
- Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models ☆2,531
- Large-scale LLM inference engine ☆934
- Blazingly fast LLM inference. ☆3,406
- Convert Compute And Books Into Instruct-Tuning Datasets (or classifiers)! ☆816
- Run and customize Local LLMs. ☆1,909
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ☆459
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM … ☆467
- Replace Copilot with local AI ☆1,656
- Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs ☆2,080
- g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains ☆869
- The Open Source Memory Layer For Autonomous Agents ☆1,390
- Tools for merging pretrained large language models. ☆4,501
- Llama 2 Everywhere (L2E) ☆1,510
- Llama-3 agents that can browse the web by following instructions and talking to you ☆1,317
- A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations ☆646
- [ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling ☆1,378
- SGLang is a fast serving framework for large language models and vision language models. ☆5,121
- Run Mixtral-8x7B models in Colab or consumer desktops ☆2,288
- Text-To-Speech, RAG, and LLMs. All local! ☆1,534
- A project-structure-aware autonomous software engineer aiming for autonomous program improvement. Resolved 30.67% of tasks (pass@1) in SWE-b… ☆2,637
- WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI. ☆1,509
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024). ☆929
- DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence ☆1,933