b4rtaz / distributed-llama
Connect home devices into a powerful cluster to accelerate LLM inference. More devices means faster inference.
☆ 2,059 · Updated 2 weeks ago
Alternatives and similar repositories for distributed-llama
Users interested in distributed-llama are comparing it to the repositories listed below.
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆ 4,171 · Updated this week
- Optimizing inference proxy for LLMs ☆ 2,220 · Updated this week
- Llama 2 Everywhere (L2E) ☆ 1,517 · Updated 4 months ago
- Chat language model that can use tools and interpret the results ☆ 1,556 · Updated last week
- Stateful load balancer custom-tailored for llama.cpp 🏓🦙 ☆ 755 · Updated 2 weeks ago
- A framework for serving and evaluating LLM routers - save LLM costs without compromising quality ☆ 3,935 · Updated 9 months ago
- ☆ 2,939 · Updated 8 months ago
- Local AI API Platform ☆ 2,655 · Updated last week
- Blazingly fast LLM inference. ☆ 5,601 · Updated this week
- Large-scale LLM inference engine ☆ 1,419 · Updated this week
- Go ahead and axolotl questions ☆ 9,336 · Updated this week
- Model swapping for llama.cpp (or any local OpenAI-compatible server) ☆ 745 · Updated this week
- Pure C++ implementation of several models for real-time chatting on your computer (CPU & GPU) ☆ 588 · Updated this week
- The easiest & fastest way to run customized and fine-tuned LLMs locally or on the edge ☆ 1,384 · Updated this week
- Lightweight inference library for ONNX files, written in C++. It can run Stable Diffusion XL 1.0 on a RPI Zero 2 (or in 298MB of RAM) but… ☆ 1,940 · Updated 2 weeks ago
- The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs), allowing users to chat with LLM … ☆ 562 · Updated 3 months ago
- LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve spee… ☆ 2,918 · Updated 3 weeks ago
- Compare open-source local LLM inference projects by their metrics to assess popularity and activeness. ☆ 562 · Updated this week
- The terminal client for Ollama ☆ 1,832 · Updated this week
- Tools for merging pretrained large language models. ☆ 5,646 · Updated last week
- An application for running LLMs locally on your device, with your documents, facilitating detailed citations in generated responses. ☆ 589 · Updated 6 months ago
- WikiChat is an improved RAG system that stops the hallucination of large language models by retrieving data from a corpus. ☆ 1,443 · Updated 2 weeks ago
- Ingest files for retrieval augmented generation (RAG) with open-source Large Language Models (LLMs), all without 3rd parties or sensitive… ☆ 644 · Updated 9 months ago
- A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. ☆ 2,873 · Updated last year
- PyTorch native post-training library ☆ 5,171 · Updated this week
- Local realtime voice AI ☆ 2,290 · Updated 2 months ago
- VS Code extension for LLM-assisted code/text completion ☆ 734 · Updated this week
- llama.cpp fork with additional SOTA quants and improved performance ☆ 473 · Updated this week
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) ☆ 1,295 · Updated 3 weeks ago
- Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization ☆ 1,304 · Updated 5 months ago