The-AI-Summer / pytorch-ddp
Code for the DDP (DistributedDataParallel) tutorial.
☆31 · Updated 2 years ago
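The repo above is tutorial code for PyTorch's `DistributedDataParallel` (DDP). As a rough illustration of the API it covers — this sketch is not taken from the repo itself, and the model, shapes, and gloo/single-process setup are illustrative assumptions — a minimal CPU-only DDP run looks like:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process setup for illustration; real training launches one
# process per GPU (e.g. via torchrun) with matching rank/world_size.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = torch.nn.Linear(10, 2)   # toy model, an assumption for the sketch
ddp_model = DDP(model)           # wraps the model; syncs gradients across ranks

x = torch.randn(4, 10)
out = ddp_model(x)
out.sum().backward()             # backward triggers the gradient all-reduce

dist.destroy_process_group()
```

With more than one process, DDP averages gradients across ranks during `backward()`, so each replica applies the same optimizer step.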
Related projects:
- Implementation of TableFormer, Robust Transformer Modeling for Table-Text Encoding, in PyTorch ☆35 · Updated 2 years ago
- PyTorch implementation of Soft MoE by Google Brain in "From Sparse to Soft Mixtures of Experts" (https://arxiv.org/pdf/2308.00951.pdf) ☆62 · Updated 11 months ago
- PyTorch implementation of "From Sparse to Soft Mixtures of Experts" ☆38 · Updated last year
- PyTorch / PyTorch Lightning framework for trying knowledge distillation in image classification problems ☆30 · Updated last month
- Skyformer: Remodel Self-Attention with Gaussian Kernel and Nyström Method (NeurIPS 2021) ☆58 · Updated 2 years ago
- Stochastic Weight Averaging tutorials using PyTorch ☆33 · Updated 3 years ago
- ☆37 · Updated last year
- Code for the PAPA paper ☆27 · Updated last year
- Code for the paper "Query-Key Normalization for Transformers" ☆33 · Updated 3 years ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆42 · Updated last year
- ☆20 · Updated last year
- Several types of attention modules written in PyTorch ☆37 · Updated 4 months ago
- An implementation of Transformer with Expire-Span, a circuit for learning which memories to retain ☆33 · Updated 3 years ago
- ☆33 · Updated 5 months ago
- Implementation of Online Label Smoothing in PyTorch ☆94 · Updated last year
- PyTorch reimplementation of the Smooth ReLU activation function proposed in the paper "Real World Large Scale Recommendation Systems Repr… ☆21 · Updated 2 years ago
- An educational step-by-step implementation of SimCLR that accompanies the blog post ☆31 · Updated 2 years ago
- The official implementation of You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Natu… ☆48 · Updated 3 years ago
- PyTorch implementation of MoE (mixture of experts) ☆32 · Updated 3 years ago
- Adversarial examples for the new ConvNeXt architecture ☆20 · Updated 2 years ago
- Implementation of Long-Short Transformer, combining local and global inductive biases for attention over long sequences, in PyTorch ☆116 · Updated 3 years ago
- AutoMoE: Neural Architecture Search for Efficient Sparsely Activated Transformers ☆40 · Updated last year
- An adaptive training algorithm for residual networks ☆14 · Updated 4 years ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆15 · Updated 10 months ago
- Axial Positional Embedding for PyTorch ☆61 · Updated 3 years ago
- A simple implementation of a deep linear PyTorch module ☆18 · Updated 3 years ago
- ☆18 · Updated 3 years ago
- Implementation of a Transformer using ReLA (Rectified Linear Attention) from https://arxiv.org/abs/2104.07012 ☆49 · Updated 2 years ago
- MODALS: Modality-agnostic Automated Data Augmentation in the Latent Space ☆40 · Updated 3 years ago
- FlatNCE: A Novel Contrastive Representation Learning Objective ☆83 · Updated 2 years ago