Lightning-AI / forked-pdb
Python pdb for multiple processes
☆45 · Updated last week
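The one-line description above hides a small but useful trick: a plain `pdb.set_trace()` misbehaves inside forked workers because every child process shares the parent's stdin. The usual fix, and the pattern tools like this are typically built around, is to reopen the controlling terminal inside the child before starting the interactive session. A minimal sketch of that pattern follows (assuming a POSIX system where `/dev/stdin` is available; forked-pdb's actual class may differ in details):

```python
import pdb
import sys


class ForkedPdb(pdb.Pdb):
    """A pdb usable from inside a forked child process.

    The child inherits the parent's stdin, which is unusable once
    several processes share it, so the debugger is reattached to the
    controlling terminal for the duration of the interactive session.
    """

    def interaction(self, *args, **kwargs):
        saved_stdin = sys.stdin
        try:
            # Reattach to the controlling terminal (POSIX only).
            sys.stdin = open("/dev/stdin")
            super().interaction(*args, **kwargs)
        finally:
            sys.stdin = saved_stdin


# Drop this into a worker process instead of pdb.set_trace():
# ForkedPdb().set_trace()
```

Because only stdin is swapped, the rest of the pdb machinery (breakpoints, stack inspection) works unchanged; the main caveat is that two workers hitting the trace simultaneously will still fight over the terminal.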
Alternatives and similar repositories for forked-pdb
Users interested in forked-pdb are comparing it to the libraries listed below.
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆116 · Updated last year
- Sparse Backpropagation for Mixture-of-Expert Training ☆29 · Updated 11 months ago
- ☆29 · Updated 2 years ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 · Updated 2 years ago
- See details in https://github.com/pytorch/xla/blob/r1.12/torch_xla/distributed/fsdp/README.md ☆24 · Updated 2 years ago
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆85 · Updated 2 years ago
- LL3M: Large Language and Multi-Modal Model in Jax ☆72 · Updated last year
- ☆37 · Updated 2 years ago
- Code for the paper "Query-Key Normalization for Transformers" ☆41 · Updated 4 years ago
- [ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845 ☆120 · Updated 3 years ago
- Fast Discounted Cumulative Sums in PyTorch ☆96 · Updated 3 years ago
- ☆54 · Updated 10 months ago
- (Batched) advanced indexing for PyTorch ☆53 · Updated 5 months ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆16 · Updated last year
- This package implements THOR: Transformer with Stochastic Experts ☆63 · Updated 3 years ago
- ☆33 · Updated 4 years ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers ☆48 · Updated last year
- Stick-breaking attention ☆56 · Updated 2 months ago
- ☆20 · Updated last year
- ☆51 · Updated 11 months ago
- ☆31 · Updated last year
- ☆24 · Updated 3 months ago
- Best practices for testing advanced Mixtral, DeepSeek, and Qwen series MoE models using Megatron Core MoE ☆17 · Updated this week
- A minimal PyTorch Lightning OpenAI GPT with DeepSpeed training! ☆111 · Updated 2 years ago
- ☆104 · Updated last year
- ☆22 · Updated last year
- This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron … ☆32 · Updated last year
- Triton implementation of FlashAttention-2 that adds custom masks ☆117 · Updated 9 months ago
- Implementation of Kronecker Attention in PyTorch ☆19 · Updated 4 years ago
- pytorch-profiler ☆51 · Updated 2 years ago