Lightning-AI / forked-pdb
Python pdb for multiple processes
☆39 · Updated 2 years ago
Alternatives and similar repositories for forked-pdb:
Users who are interested in forked-pdb are comparing it to the libraries listed below.
- Code for the paper "Query-Key Normalization for Transformers" · ☆39 · Updated 4 years ago
- LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021) · ☆18 · Updated last year
- An adaptive training algorithm for residual networks · ☆15 · Updated 4 years ago
- Implementation of Kronecker Attention in PyTorch · ☆18 · Updated 4 years ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … · ☆16 · Updated last year
- See details in https://github.com/pytorch/xla/blob/r1.12/torch_xla/distributed/fsdp/README.md · ☆24 · Updated 2 years ago
- CUDA implementation of autoregressive linear attention, with all the latest research findings · ☆44 · Updated last year
- Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi… · ☆50 · Updated 2 years ago
- A small repository demonstrating the use of Webdataset and Imagenet · ☆16 · Updated last year
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling · ☆84 · Updated 2 years ago
- (Batched) advanced indexing for PyTorch · ☆53 · Updated 4 months ago
- Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization · ☆16 · Updated 6 years ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers · ☆47 · Updated last year
- Paper List for In-context Learning 🌷 · ☆20 · Updated 2 years ago
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" · ☆116 · Updated last year
- Official repository of paper "RNNs Are Not Transformers (Yet): The Key Bottleneck on In-context Retrieval" · ☆26 · Updated last year
- Code for the paper PermuteFormer · ☆42 · Updated 3 years ago
- LL3M: Large Language and Multi-Modal Model in Jax · ☆72 · Updated last year
- [ICLR 2025] Code for the paper "Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning" · ☆47 · Updated 2 months ago