fairydreaming / distributed-llama
Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
☆14 · Updated 4 months ago
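The tagline's core idea, tensor parallelism, splits each layer's weight matrix across devices so every device stores and computes only a slice, which is how RAM usage is divided and inference sped up. The sketch below illustrates the concept with column-wise splitting of a single linear layer; it is a conceptual toy (the function names `split_columns` and `parallel_linear` are illustrative and not from the distributed-llama codebase):

```python
# Conceptual sketch of column-wise tensor parallelism for one linear layer.
# Each "device" holds a column slice of the weight matrix, computes a partial
# output, and the slices are concatenated (an all-gather) at the end.
# Names here are illustrative, not from distributed-llama.

def matmul(x, w):
    # x: (m, k), w: (k, n), both as nested lists of floats
    m, k, n = len(x), len(w), len(w[0])
    return [[sum(x[i][t] * w[t][j] for t in range(k)) for j in range(n)]
            for i in range(m)]

def split_columns(w, parts):
    # Divide the weight columns across `parts` workers:
    # each worker stores only 1/parts of the matrix.
    n = len(w[0])
    step = n // parts
    return [[row[p * step:(p + 1) * step] for row in w] for p in range(parts)]

def parallel_linear(x, w, parts):
    shards = split_columns(w, parts)
    # Each partial matmul would run on a different device in a real cluster.
    partials = [matmul(x, shard) for shard in shards]
    # All-gather: concatenate partial outputs along the column axis.
    return [sum((p[i] for p in partials), []) for i in range(len(x))]

x = [[1.0, 2.0]]
w = [[1.0, 0.0, 2.0, 1.0],
     [0.0, 1.0, 1.0, 3.0]]
# Splitting across 2 "devices" reproduces the single-device result.
assert parallel_linear(x, w, 2) == matmul(x, w)
```

In a real deployment the partial matmuls run on separate machines and the concatenation is a network all-gather, so the per-device weight memory shrinks roughly by the number of devices.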
Alternatives and similar repositories for distributed-llama:
Users interested in distributed-llama are comparing it to the libraries listed below.
- Fast parallel LLM inference for MLX ☆177 · Updated 8 months ago
- Distributed inference for MLX LLMs ☆87 · Updated 8 months ago
- 1.58-bit LLaMa model ☆81 · Updated last year
- Train your own SOTA deductive reasoning model ☆81 · Updated 3 weeks ago
- ☆126 · Updated 7 months ago
- Simple examples using Argilla tools to build AI ☆52 · Updated 4 months ago
- 🍲Agent Chef🥘 is my robust tool for dataset refinement, structuring, and generation. By leveraging procedural and synthetic dataset gene… ☆19 · Updated 2 months ago
- LLM inference in C/C++ ☆67 · Updated last week
- look how they massacred my boy ☆63 · Updated 5 months ago
- Moxin is a family of fully open-source and reproducible LLMs ☆85 · Updated 2 weeks ago
- Testing LLM reasoning abilities with family relationship quizzes ☆62 · Updated 2 months ago
- For inferring and serving local LLMs using the MLX framework ☆99 · Updated last year
- Automatically quantize GGUF models ☆164 · Updated this week
- ☆66 · Updated 10 months ago
- An efficient implementation of the method proposed in "The Era of 1-bit LLMs" ☆155 · Updated 5 months ago
- This is the Mixture-of-Agents (MoA) concept, adapted from the original work by TogetherAI. My version is tailored for local model usage a… ☆12 · Updated 9 months ago
- Cohere Toolkit is a collection of prebuilt components enabling users to quickly build and deploy RAG applications ☆28 · Updated 2 months ago
- Entropy Based Sampling and Parallel CoT Decoding ☆17 · Updated 5 months ago
- Implementation of nougat that focuses on processing PDFs locally ☆81 · Updated 2 months ago
- Function Calling Benchmark & Testing ☆85 · Updated 8 months ago
- Easy-to-use, high-performance knowledge distillation for LLMs ☆55 · Updated this week
- ☆125 · Updated last week
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆31 · Updated 10 months ago
- ☆17 · Updated 3 months ago
- Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025) ☆91 · Updated 2 months ago
- Run Ollama & GGUF models easily with a single command ☆50 · Updated 10 months ago
- Gradio-based tool to run open-source LLM models directly from Hugging Face ☆91 · Updated 9 months ago
- Run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of VRAM/other resources by exposing them on differe… ☆55 · Updated last month
- Docker image for NVIDIA GH200 machines, optimized for vLLM serving and HF Trainer finetuning ☆38 · Updated last month
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖 ☆63 · Updated 3 months ago