Lightning-AI / forked-pdb
Python pdb for multiple processes
☆36 · Updated 2 years ago
Alternatives and similar repositories for forked-pdb:
Users who are interested in forked-pdb are comparing it to the repositories listed below.
- ☆29 · Updated 2 years ago
- ☆37 · Updated last year
- A small repository demonstrating the use of WebDataset and ImageNet ☆15 · Updated last year
- Code for the paper "On the Expressivity Role of LayerNorm in Transformers' Attention" (Findings of ACL'2023) ☆45 · Updated 4 months ago
- ImageNet-12k subset of ImageNet-21k (fall11) ☆21 · Updated last year
- CUDA implementation of autoregressive linear attention, with all the latest research findings ☆44 · Updated last year
- ☆32 · Updated 3 years ago
- ☆51 · Updated 7 months ago
- Code for the paper "Query-Key Normalization for Transformers" ☆36 · Updated 3 years ago
- ☆55 · Updated 3 weeks ago
- ☆31 · Updated 8 months ago
- [ICLR2024] (EvALign-ICL Benchmark) Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context … ☆21 · Updated 10 months ago
- LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021) ☆18 · Updated last year
- differentiable top-k operator ☆21 · Updated last month
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers ☆44 · Updated last year
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆80 · Updated last year
- Paper List for In-context Learning 🌷 ☆20 · Updated 2 years ago
- pytorch-profiler ☆50 · Updated last year
- Implementation of "compositional attention" from MILA, a multi-head attention variant that is reframed as a two-step attention process wi… ☆50 · Updated 2 years ago
- An adaptive training algorithm for residual networks ☆15 · Updated 4 years ago
- See details in https://github.com/pytorch/xla/blob/r1.12/torch_xla/distributed/fsdp/README.md ☆23 · Updated 2 years ago
- ☆40 · Updated last year
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" ☆114 · Updated 10 months ago
- Odysseus: Playground of LLM Sequence Parallelism ☆64 · Updated 7 months ago
- ☆22 · Updated last month
- Sparse Backpropagation for Mixture-of-Expert Training ☆27 · Updated 6 months ago
- Implementation of Kronecker Attention in PyTorch ☆18 · Updated 4 years ago
- Stick-breaking attention ☆41 · Updated 2 weeks ago
- ☆32 · Updated last year
- This is a PyTorch implementation of the paper "ViP: A Differentially Private Foundation Model for Computer Vision" ☆36 · Updated last year