fairydreaming / distributed-llamaLinks

Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.

☆17

Alternatives and similar repositories for distributed-llama

Users that are interested in distributed-llama are comparing it to the libraries listed below

Sorting:

rafacelente / bllama
1.58-bit LLaMa model
☆81Updated last year
cognitivecomputations / spectrum
☆121Updated 2 months ago
fairydreaming / farel-bench
Testing LLM reasoning abilities with family relationship quizzes.
☆62Updated 4 months ago
cognitivecomputations / kraken
☆66Updated last year
facebookresearch / matrix
Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…
☆60Updated last week
samchaineau / llm_slerp_generation
Repo hosting codes and materials related to speeding LLMs' inference using token merging.
☆36Updated last year
mzbac / mlx_sharding
Distributed Inference for mlx LLm
☆92Updated 10 months ago
OpenPipe / deductive-reasoning
Train your own SOTA deductive reasoning model
☆93Updated 3 months ago
Pleias / Various-Finetuning
Set of scripts to finetune LLMs
☆37Updated last year
argilla-io / argilla-cookbook
Simple examples using Argilla tools to build AI
☆53Updated 6 months ago
astramind-ai / BitMat
An efficent implementation of the method proposed in "The Era of 1-bit LLMs"
☆154Updated 7 months ago
LLM360 / amber-data-prep
Data preparation code for Amber 7B LLM
☆91Updated last year
VITA-Group / Q-GaLore
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients.
☆199Updated 10 months ago
oll4m404rc / Agent_Chef
🍲Agent Chef🥘 is my robust tool for dataset refinement, structuring, and generation. By leveraging procedural and synthetic dataset gene…
☆1Updated 4 months ago
facebookresearch / ExploreToM
Code for ExploreTom
☆83Updated 5 months ago
cognitivecomputations / dolphin-utils
☆13Updated last month
serp-ai / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks
☆31Updated last year
HishamAlyahya / semantic_backprop
Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖
☆68Updated 6 months ago
Mihaiii / backtrack_sampler
An easy-to-understand framework for LLM samplers that rewind and revise generated tokens
☆140Updated 3 months ago
Xalp / ECHO
Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)
☆90Updated 4 months ago
brendanhogan / picoDeepResearch
☆59Updated 2 weeks ago
willccbb / mlx_parallm
Fast parallel LLM inference for MLX
☆189Updated 11 months ago
xjdr-alt / llmri
look how they massacred my boy
☆63Updated 7 months ago
jerber / arc_agi
☆54Updated 4 months ago
N8python / mlx-pretrain
A simple MLX implementation for pretraining LLMs on Apple Silicon.
☆76Updated last month
teknium1 / ShareGPT-Builder
☆114Updated 5 months ago
Zyphra / Zamba2
PyTorch implementation of models from the Zamba2 series.
☆182Updated 4 months ago
eqimp / hogwild_llm
Official PyTorch implementation for Hogwild! Inference: Parallel LLM Generation with a Concurrent Attention Cache
☆105Updated last month
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆172Updated 4 months ago
teknium1 / LLM-Benchmark-Logs
Just a bunch of benchmark logs for different LLMs
☆119Updated 10 months ago