LIJUNYI95 / SuperAdam
Official PyTorch implementation for the paper 'SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients'
☆17 · Updated 2 years ago
Related projects:
- Implementation of NATv2. ☆23 · Updated 3 years ago
- Code for the paper 'Minimizing FLOPs to Learn Efficient Sparse Representations', published at ICLR 2020 ☆21 · Updated 4 years ago
- Predicting sets of classes from images. ☆22 · Updated 3 years ago
- (ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models. ☆21 · Updated 2 years ago
- Code for DATA: Differentiable ArchiTecture Approximation. ☆11 · Updated 3 years ago
- LV-BERT: Exploiting Layer Variety for BERT (Findings of ACL 2021) ☆18 · Updated last year
- Directed masked autoencoders ☆13 · Updated last year
- Self-Distillation with weighted ground-truth targets; ResNet and Kernel Ridge Regression ☆17 · Updated 2 years ago
- [AAAI'21] Modeling Deep Learning Based Privacy Attacks on Physical Mail ☆12 · Updated 3 years ago
- An adaptive training algorithm for residual networks ☆14 · Updated 4 years ago
- Implementation of Kronecker Attention in PyTorch ☆17 · Updated 4 years ago
- Interpolation between Residual and Non-Residual Networks, ICML 2020. https://arxiv.org/abs/2006.05749 ☆26 · Updated 4 years ago
- Code for the paper "Query-Key Normalization for Transformers" ☆33 · Updated 3 years ago
- This repository contains the code for the paper in Findings of EMNLP 2021: "EfficientBERT: Progressively Searching Multilayer Perceptron … ☆32 · Updated last year
- Implementation of the LOSSGRAD optimization algorithm ☆15 · Updated 5 years ago
- [ICLR2024] (EvALign-ICL Benchmark) Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context … ☆20 · Updated 6 months ago
- A PyTorch Dataset that caches samples in shared memory, accessible globally to all processes ☆20 · Updated 2 years ago
- The official repository for our paper "The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns … ☆15 · Updated 10 months ago
- Implementation of the multi-branch attentive Transformer (MAT). ☆33 · Updated 4 years ago
- Code for "Visualizing and Understanding Object Detector" ☆20 · Updated 3 years ago
- We investigated corruption robustness across different architectures including Convolutional Neural Networks, Vision Transformers, and th… ☆15 · Updated 2 years ago
- ICML 2019 accepted paper: Overcoming Multi-Model Forgetting ☆13 · Updated 5 years ago
- Code for Category-aware Generative Adversarial Networks (AAAI 2020) ☆18 · Updated 4 years ago