BAI-Yeqi / Statistical-Properties-of-Dot-Product
☆15 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for Statistical-Properties-of-Dot-Product
- Crawl & visualize ICLR papers and reviews. ☆18 · Updated 2 years ago
- ICLR 2022 Paper submission trend analysis from https://openreview.net/group?id=ICLR.cc/2022/Conference ☆85 · Updated 2 years ago
- Source code for our AAAI'22 paper "From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression" ☆23 · Updated 2 years ago
- Code for EMNLP 2021 main conference paper "Dynamic Knowledge Distillation for Pre-trained Language Models" ☆40 · Updated 2 years ago
- This package implements THOR: Transformer with Stochastic Experts. ☆61 · Updated 3 years ago
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 ☆36 · Updated 5 months ago
- Code for the paper "Pretrained Models for Multilingual Federated Learning" at NAACL 2022 ☆10 · Updated 2 years ago
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆79 · Updated last year
- Representation Surgery for Multi-Task Model Merging. ICML, 2024. ☆28 · Updated last month
- Source code for the TMLR paper "Black-Box Prompt Learning for Pre-trained Language Models" ☆55 · Updated last year
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆44 · Updated last year
- A Tight-fisted Optimizer ☆47 · Updated last year
- 😎 A simple and easy-to-use toolkit for GPU scheduling. ☆42 · Updated 3 years ago
- Code for the ACL-2022 paper "StableMoE: Stable Routing Strategy for Mixture of Experts" ☆42 · Updated 2 years ago
- PyTorch implementation of the paper "Discovering and Explaining the Representation Bottleneck of DNNs" (ICLR 2022) ☆37 · Updated 3 weeks ago
- Code associated with the paper **SkipBERT: Efficient Inference with Shallow Layer Skipping**, at ACL 2022 ☆15 · Updated 2 years ago
- ☆32 · Updated 3 years ago
- Code for the paper “What Data Benefits My Classifier?” Enhancing Model Performance and Interpretability through Influence-Based Data Selecti… ☆22 · Updated 6 months ago
- ☆19 · Updated 3 years ago
- A pre-trained model with multi-exit transformer architecture. ☆53 · Updated last year
- Mixture of Attention Heads ☆39 · Updated 2 years ago
- [ICLR 2024] EMO: Earth Mover Distance Optimization for Auto-Regressive Language Modeling (https://arxiv.org/abs/2310.04691) ☆114 · Updated 8 months ago
- This PyTorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022). ☆41 · Updated 2 years ago
- A Transformer model based on the Gated Attention Unit (preview version) ☆97 · Updated last year
- (ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models. ☆21 · Updated 2 years ago
- [ICLR 2022] Code for the paper "Exploring Extreme Parameter Compression for Pre-trained Language Models" (https://arxiv.org/abs/2205.10036) ☆19 · Updated last year
- ☆23 · Updated 3 years ago
- Less is More: Task-aware Layer-wise Distillation for Language Model Compression (ICML 2023) ☆29 · Updated last year
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So… ☆15 · Updated 5 months ago