BAI-Yeqi / Statistical-Properties-of-Dot-Product
☆16 · Updated 3 years ago
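The repository's topic rests on a standard fact: for independent d-dimensional vectors with i.i.d. zero-mean, unit-variance entries, the dot product has mean 0 and variance d, which motivates the 1/√d scaling in scaled dot-product attention. A minimal sketch to check this empirically (NumPy-based; the dimension and sample count are illustrative choices, not taken from the repository):

```python
# Minimal sketch (not from the repository): empirically verify that the
# dot product of two independent standard-normal vectors in R^d has
# mean ~0 and variance ~d, the fact behind 1/sqrt(d) attention scaling.
import numpy as np

rng = np.random.default_rng(0)
d, n_samples = 512, 100_000  # illustrative sizes, chosen for this sketch

x = rng.standard_normal((n_samples, d))
y = rng.standard_normal((n_samples, d))
dots = np.einsum("nd,nd->n", x, y)  # one dot product per row pair

print(f"mean            = {dots.mean():8.3f}  (expected ~0)")
print(f"variance        = {dots.var():8.1f}  (expected ~{d})")
print(f"scaled variance = {(dots / np.sqrt(d)).var():8.3f}  (expected ~1)")
```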
Alternatives and similar repositories for Statistical-Properties-of-Dot-Product:
Users interested in Statistical-Properties-of-Dot-Product are comparing it to the repositories listed below.
- [Findings of ACL 2023] Communication Efficient Federated Learning for Multilingual Machine Translation with Adapter ☆12 · Updated last year
- Crawl & visualize ICLR papers and reviews. ☆18 · Updated 2 years ago
- Transformer model based on Gated Attention Unit (preview version) ☆97 · Updated 2 years ago
- ☆33 · Updated 3 years ago
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆82 · Updated 2 years ago
- This PyTorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022). ☆43 · Updated 2 years ago
- ☆14 · Updated last year
- Code for the ACL-2022 paper "StableMoE: Stable Routing Strategy for Mixture of Experts" ☆45 · Updated 2 years ago
- Source code for our EMNLP'21 paper "Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning" ☆59 · Updated 3 years ago
- A simple trial of Ladder Side-Tuning on CLUE ☆19 · Updated 2 years ago
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 ☆35 · Updated 9 months ago
- [KDD'22] Learned Token Pruning for Transformers ☆96 · Updated 2 years ago
- Mixture of Attention Heads ☆43 · Updated 2 years ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆48 · Updated 2 years ago
- Source code for the TMLR paper "Black-Box Prompt Learning for Pre-trained Language Models" ☆55 · Updated last year
- A Tight-fisted Optimizer ☆47 · Updated 2 years ago
- 😎 A simple and easy-to-use toolkit for GPU scheduling. ☆42 · Updated 3 years ago
- Source code for our AAAI'22 paper "From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression" ☆23 · Updated 3 years ago
- (ACL-IJCNLP 2021) Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models. ☆21 · Updated 2 years ago
- This package implements THOR: Transformer with Stochastic Experts. ☆62 · Updated 3 years ago
- Code for paper: "What Data Benefits My Classifier?" Enhancing Model Performance and Interpretability through Influence-Based Data Selecti… ☆22 · Updated 10 months ago
- Code for EMNLP 2021 main conference paper "Dynamic Knowledge Distillation for Pre-trained Language Models" ☆40 · Updated 2 years ago
- [NeurIPS 2024] Code and Data Repo for Paper "Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning" ☆24 · Updated 10 months ago
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). ☆103 · Updated 2 years ago
- Some examples of drawing illustration plots for papers using the seaborn package ☆14 · Updated 5 years ago
- Shuffling files of hundreds of GB in Python ☆33 · Updated 3 years ago
- Code for CascadeBERT, Findings of EMNLP 2021 ☆12 · Updated 2 years ago
- A single-model, multi-scale VAE based on Transformer ☆55 · Updated 3 years ago
- Must-read papers on improving efficiency for pre-trained language models. ☆103 · Updated 2 years ago
- ☆56 · Updated 2 years ago