Lightning-AI / forked-pdb
Python pdb for multiple processes
☆ 40 · Updated 2 years ago
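forked-pdb wraps the standard pdb so breakpoints work inside forked worker processes, where the child does not own the terminal's stdin. A minimal sketch of the underlying idea (reopening /dev/stdin in the child so the debugger gets an interactive terminal) on a POSIX system; the class name here is illustrative and may not match the package's exact API:

```python
import pdb
import sys

class ForkedPdb(pdb.Pdb):
    """A pdb subclass usable from a forked multiprocessing child.

    The parent process owns the original stdin, so the child
    reopens /dev/stdin to read debugger commands (POSIX only).
    """

    def interaction(self, *args, **kwargs):
        _stdin = sys.stdin
        try:
            sys.stdin = open("/dev/stdin")
            super().interaction(*args, **kwargs)
        finally:
            sys.stdin = _stdin

# Usage inside a worker process:
#   ForkedPdb().set_trace()
```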
Alternatives and similar repositories for forked-pdb
Users interested in forked-pdb are comparing it to the libraries listed below.
- See details in https://github.com/pytorch/xla/blob/r1.12/torch_xla/distributed/fsdp/README.md ☆ 24 · Updated 2 years ago
- (Batched) advanced indexing for PyTorch. ☆ 53 · Updated 4 months ago
- Implementation of Kronecker Attention in PyTorch ☆ 19 · Updated 4 years ago
- ☆ 37 · Updated 2 years ago
- ☆ 29 · Updated 2 years ago
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆ 85 · Updated 2 years ago
- Code for the paper "Query-Key Normalization for Transformers" ☆ 41 · Updated 4 years ago
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆ 116 · Updated last year
- This package implements THOR: Transformer with Stochastic Experts. ☆ 62 · Updated 3 years ago
- Code for the paper "On the Expressivity Role of LayerNorm in Transformers' Attention" (Findings of ACL 2023) ☆ 55 · Updated 7 months ago
- Using FlexAttention to compute attention with different masking patterns (see the sketch after this list) ☆ 43 · Updated 7 months ago
- ImageNet-12k subset of ImageNet-21k (fall11) ☆ 21 · Updated last year
- ☆ 40 · Updated 3 years ago
- [ICML 2020] Code for "PowerNorm: Rethinking Batch Normalization in Transformers", https://arxiv.org/abs/2003.07845 ☆ 120 · Updated 3 years ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns …" ☆ 16 · Updated last year
- Sparse Backpropagation for Mixture-of-Expert Training ☆ 29 · Updated 10 months ago
- Supplementary code for Editable Neural Networks, an ICLR 2020 submission. ☆ 46 · Updated 5 years ago
- [ICLR 2022] "As-ViT: Auto-scaling Vision Transformers without Training" by Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wa… ☆ 76 · Updated 3 years ago
- ☆ 30 · Updated 11 months ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers. ☆ 48 · Updated last year
- An adaptive training algorithm for residual networks ☆ 15 · Updated 4 years ago
- ☆ 41 · Updated 2 years ago
- ☆ 54 · Updated 10 months ago
- ☆ 21 · Updated 2 years ago
- CUDA kernels for generalized matrix multiplication in PyTorch ☆ 79 · Updated 3 years ago
- ☆ 17 · Updated last year
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021) ☆ 60 · Updated 3 years ago
- [EMNLP 2022] Official implementation of Transnormer from our paper "The Devil in Linear Transformer" ☆ 60 · Updated last year
- LL3M: Large Language and Multi-Modal Model in JAX ☆ 72 · Updated last year
- ☆ 51 · Updated 11 months ago
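For the FlexAttention entry above, a minimal sketch of computing attention under a masking pattern with the upstream torch.nn.attention.flex_attention API, assuming a recent PyTorch (≥ 2.5; CPU execution may require a newer release). This illustrates the general mechanism, not that repository's own helpers:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

# A mask_mod returns True where attention is allowed; here, causal masking.
def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

B, H, S, D = 2, 4, 128, 64

# B=None / H=None broadcast the mask across batch and heads; the BlockMask
# lets the kernel skip fully masked blocks entirely.
block_mask = create_block_mask(causal, B=None, H=None, Q_LEN=S, KV_LEN=S, device="cpu")

q, k, v = (torch.randn(B, H, S, D) for _ in range(3))
out = flex_attention(q, k, v, block_mask=block_mask)  # shape (B, H, S, D)
```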