Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
☆18 · Nov 11, 2024 · Updated last year
Alternatives and similar repositories for distributed-llama
Users that are interested in distributed-llama are comparing it to the libraries listed below.
- an unofficial Georgia Tech theme for JupyterLab ☆10 · Jun 29, 2021 · Updated 4 years ago
- MiniLM (BERT) embeddings from scratch ☆20 · Aug 14, 2025 · Updated 7 months ago
- 🎥🎯 Tracking dart coordinates with fastai v2 ☆11 · Jan 14, 2024 · Updated 2 years ago
- ☆14 · Sep 4, 2024 · Updated last year
- Python tools for Anki ☆23 · Feb 28, 2026 · Updated 3 weeks ago
- ☆20 · May 30, 2025 · Updated 9 months ago
- Tries to UI development. Clone of https://www.perplexity.ai/ ☆11 · Sep 30, 2023 · Updated 2 years ago
- Chat GPT Things by Taylor Newsome ☆12 · Mar 19, 2024 · Updated 2 years ago
- Ubuntu 24.04.3 OS image builder for various RK3506 SBC ☆26 · Mar 14, 2026 · Updated last week
- Play with OpenAI APIs using your own API Key. Your API Key is stored and used only from your browser. ☆15 · Dec 20, 2025 · Updated 3 months ago
- Bloom filter alternative (C++) ☆18 · Nov 8, 2018 · Updated 7 years ago
- Unofficial Claude Code SDKs for Typescript and Python ☆15 · May 20, 2025 · Updated 10 months ago
- ☆10 · Jun 30, 2022 · Updated 3 years ago
- This repository is intended as a comprehensive guide to prepare for interviews focused on generative AI. It serves as a one-stop resource… ☆11 · Dec 13, 2024 · Updated last year
- TEE-hosted binaries for verifiable server-side computation. ☆21 · Updated this week
- Lsglang is a special extension of sglang that fully utilizes CPU and GPU computing resources with an efficient GPU parallel + NUMA parall… ☆43 · Mar 12, 2026 · Updated last week
- Wrapper is an open-source proxy style universal library that can help developers to set up API calls to language models of multiple provi… ☆26 · Nov 10, 2025 · Updated 4 months ago
- Flask based Web application for predicting the income of a person ☆13 · Dec 23, 2018 · Updated 7 years ago
- Source code for YouTube tutorial series on chest X-ray auto diagnosis ☆13 · Sep 26, 2020 · Updated 5 years ago
- Deploy fastai models with Docker ☆19 · Sep 27, 2020 · Updated 5 years ago
- CPU/GPU Implicit & Explicit Finite Element Solver for Large Strains ☆22 · Feb 20, 2026 · Updated last month
- ☆26 · Nov 5, 2024 · Updated last year
- Unified management and routing for llama.cpp, MLX and vLLM models with web dashboard. ☆93 · Updated this week
- Codebase for VidHal: Benchmarking Hallucinations in Vision LLMs ☆14 · Apr 19, 2025 · Updated 11 months ago
- Additional functionality for use with fastai’s medical imaging module ☆15 · Jul 20, 2022 · Updated 3 years ago
- ☆18 · Sep 7, 2025 · Updated 6 months ago
- My stack ☆25 · Jul 10, 2025 · Updated 8 months ago
- User data script to deploy scrapyd on Amazon EC2 ☆22 · Mar 27, 2013 · Updated 12 years ago
- Low-latency ASR using SpeechBrain StreamingASR and torchaudio StreamReader. ☆18 · Apr 19, 2025 · Updated 11 months ago
- ☆19 · Jul 24, 2025 · Updated 7 months ago
- Using llama.cpp and onnxruntime to accelerate inference of GOT-OCR2.0 ☆15 · Mar 6, 2025 · Updated last year
- A minimal home grid world environment to evaluate language understanding in interactive agents. ☆24 · Sep 6, 2023 · Updated 2 years ago
- Implementation of Qwen3-ASR-0.6B in GGML ☆64 · Feb 10, 2026 · Updated last month
- Professional visualizations of COVID-19, emulating NYT, The Guardian, Washington Post, The Economist & others, using only Python & Altair… ☆24 · Oct 20, 2022 · Updated 3 years ago
- The ntentional blog - a machine learning journey ☆23 · Oct 20, 2022 · Updated 3 years ago
- Examples of Using DBTunnel ☆11 · Apr 24, 2024 · Updated last year
- LLM inference in C/C++ ☆21 · Mar 16, 2026 · Updated last week
- ☆13 · Oct 16, 2024 · Updated last year
- a distributed end-to-end image classification system using kubernetes ☆14 · Dec 31, 2024 · Updated last year