BlackSamorez/tensor_parallel

BlackSamorez / tensor_parallel

Automatically split your PyTorch models on multiple GPUs for training & inference

☆655

Alternatives and similar repositories for tensor_parallel

Users interested in tensor_parallel also compare it to the libraries listed below.

pytorch / PiPPy
Pipeline Parallelism for PyTorch
☆786 · Aug 21, 2024 · Updated last year
zhuzilin / ring-flash-attention
Ring attention implementation with flash attention
☆1,037 · Sep 10, 2025 · Updated 10 months ago
tunib-ai / parallelformers
Parallelformers: An Efficient Model Parallelization Toolkit for Deployment
☆787 · Apr 24, 2023 · Updated 3 years ago
jquesnelle / yarn
YaRN: Efficient Context Window Extension of Large Language Models
☆1,737 · Apr 17, 2024 · Updated 2 years ago
HuangLK / transpeeder
train llama on a single A100 80G node using 🤗 transformers and 🚀 Deepspeed Pipeline Parallelism
☆224 · Nov 21, 2023 · Updated 2 years ago
EleutherAI / oslo
OSLO: Open Source for Large-scale Optimization
☆175 · Sep 9, 2023 · Updated 2 years ago
NVIDIA / TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on H…
☆3,435 · Updated this week
NVIDIA / Megatron-LM
Ongoing research training transformer models at scale
☆17,125 · Updated this week
goodevening13 / aquakv
☆21 · Apr 27, 2026 · Updated 2 months ago
NVIDIA / FasterTransformer
Transformer related optimization, including BERT, GPT
☆6,442 · Mar 27, 2024 · Updated 2 years ago
deepspeedai / DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
☆2,108 · Jun 30, 2025 · Updated last year
facebookresearch / fairscale
PyTorch extensions for high performance and large scale training.
☆3,411 · Apr 26, 2025 · Updated last year
bitsandbytes-foundation / bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
☆8,333 · Updated this week
bigscience-workshop / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆1,448 · Mar 20, 2024 · Updated 2 years ago
jzhang38 / EasyContext
Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware.
☆759 · Sep 27, 2024 · Updated last year
Dao-AILab / flash-attention
Fast and memory-efficient exact attention
☆24,497 · Updated this week
CarperAI / trlx
A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)
☆4,753 · Jan 8, 2024 · Updated 2 years ago
feifeibear / long-context-attention
USP: Unified (a.k.a. Hybrid, 2D) Sequence Parallel Attention for Long Context Transformers Model Training and Inference
☆681 · May 21, 2026 · Updated 2 months ago
alibaba / Megatron-LLaMA
Best practice for training LLaMA models in Megatron-LM
☆666 · Jan 2, 2024 · Updated 2 years ago
RulinShao / LightSeq
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
☆223 · Aug 19, 2024 · Updated last year
learning-at-home / go-libp2p-daemon
a libp2p-backed daemon wrapping the functionalities of go-libp2p for use in other languages
☆11 · Feb 9, 2025 · Updated last year
DachengLi1 / LongChat
Official repository for LongChat and LongEval
☆536 · May 24, 2024 · Updated 2 years ago
huggingface / nanotron
Minimalistic large language model 3D-parallelism training
☆2,755 · May 26, 2026 · Updated last month
huggingface / peft
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
☆21,415 · Jul 14, 2026 · Updated last week
jason9693 / polyglot-finetuning-oslo
☆19 · Sep 20, 2022 · Updated 3 years ago
FMInference / FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
☆9,359 · Oct 28, 2024 · Updated last year
FranxYao / chain-of-thought-hub
Benchmarking large language models' complex reasoning ability with chain-of-thought prompting
☆2,777 · Aug 4, 2024 · Updated last year
deepspeedai / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆2,257 · Aug 14, 2025 · Updated 11 months ago
mit-han-lab / llm-awq
[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
☆3,592 · Jul 17, 2025 · Updated last year
huggingface / text-generation-inference
Large Language Model Text Generation Inference
☆10,876 · Mar 21, 2026 · Updated 4 months ago
IST-DASLab / gptq
Code for the ICLR 2023 paper "GPTQ: Accurate Post-training Quantization of Generative Pretrained Transformers".
☆2,334 · Mar 27, 2024 · Updated 2 years ago
pytorch / torchtitan
A PyTorch native platform for training generative AI models
☆5,545 · Updated this week
hao-ai-lab / LookaheadDecoding
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
☆1,340 · Mar 6, 2025 · Updated last year
GanjinZero / RRHF
[NIPS2023] RRHF & Wombat
☆806 · Sep 22, 2023 · Updated 2 years ago
ELS-RD / kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackab…
☆1,585 · Jan 28, 2026 · Updated 5 months ago
mit-han-lab / streaming-llm
[ICLR 2024] Efficient Streaming Language Models with Attention Sinks
☆7,248 · Jul 11, 2024 · Updated 2 years ago
haoliuhl / ringattention
Large Context Attention
☆773 · Oct 13, 2025 · Updated 9 months ago
mit-han-lab / duo-attention
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
☆539 · Feb 10, 2025 · Updated last year
qwopqwop200 / GPTQ-for-LLaMa
4 bits quantization of LLaMA using GPTQ
☆3,071 · Jul 13, 2024 · Updated 2 years ago