Lightning-AI / forked-pdbLinks

Python pdb for multiple processes

☆62

Alternatives and similar repositories for forked-pdb

Users that are interested in forked-pdb are comparing it to the libraries listed below

Sorting:

feifeibear / Odysseus-Transformer
Odysseus: Playground of LLM Sequence Parallelism
☆78Updated last year
teelinsan / parallel-decoding
Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"
☆121Updated last year
alexzhang13 / flashattention2-custom-mask
Triton implementation of FlashAttention2 that adds Custom Masks.
☆151Updated last year
BBuf / flash-rwkv
☆32Updated last year
cli99 / flops-profiler
pytorch-profiler
☆51Updated 2 years ago
mgmalek / efficient_cross_entropy
☆121Updated last year
pytorch / torchdistx
Torch Distributed Experimental
☆117Updated last year
facebookresearch / fairring
Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large …
☆65Updated 3 years ago
NVIDIA / Megatron-Energon
Megatron's multi-modal data loader
☆278Updated last week
jundaf2 / INT8-Flash-Attention-FMHA-Quantization
☆159Updated 2 years ago
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆206Updated 5 months ago
ppwwyyxx / RAM-multiprocess-dataloader
Demystify RAM Usage in Multi-Process Data Loaders
☆204Updated 2 years ago
Azure / MS-AMP-Examples
Examples for MS-AMP package.
☆30Updated 4 months ago
Aleph-Alpha-Research / NeurIPS-WANT-submission-efficient-parallelization-layouts
☆22Updated last year
epfml / dynamic-sparse-flash-attention
☆150Updated 2 years ago
NVIDIA-NeMo / Megatron-Bridge
Training library for Megatron-based models
☆209Updated last week
ahennequ / pytorch-custom-mma
☆29Updated 3 years ago
fla-org / flame
🔥 A minimal training framework for scaling FLA models
☆311Updated 2 weeks ago
yanring / Megatron-MoE-ModelZoo
Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core.
☆128Updated 2 weeks ago
lucidrains / triton-transformer
Implementation of a Transformer, but completely in Triton
☆277Updated 3 years ago
stanford-futuredata / stk
☆113Updated last year
berlino / gated_linear_attention
☆106Updated last year
Dao-AILab / grouped-latent-attention
☆132Updated 6 months ago
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆226Updated last year
RulinShao / LightSeq
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
☆218Updated last year
mit-han-lab / patch_conv
Patch convolution to avoid large GPU memory usage of Conv2D
☆93Updated 10 months ago
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆217Updated last week
NVlabs / COAT
[ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training
☆250Updated 3 months ago
kssteven418 / LTP
[KDD'22] Learned Token Pruning for Transformers
☆101Updated 2 years ago
meta-pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆161Updated 2 months ago