BAI-Yeqi / Statistical-Properties-of-Dot-Product
☆16 · Updated 3 years ago
Alternatives and similar repositories for Statistical-Properties-of-Dot-Product
Users interested in Statistical-Properties-of-Dot-Product are comparing it to the libraries listed below.
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆86 · Updated 2 years ago
- Crawl & visualize ICLR papers and reviews. ☆18 · Updated 2 years ago
- This package implements THOR: Transformer with Stochastic Experts. ☆65 · Updated 3 years ago
- Source code for our AAAI'22 paper "From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression" ☆24 · Updated 3 years ago
- A Transformer model based on the Gated Attention Unit (preview version) ☆98 · Updated 2 years ago
- Code for the EMNLP 2021 main conference paper "Dynamic Knowledge Distillation for Pre-trained Language Models" ☆41 · Updated 2 years ago
- [Findings of ACL 2023] Communication Efficient Federated Learning for Multilingual Machine Translation with Adapter ☆12 · Updated last year
- ☆14 · Updated last year
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… ☆52 · Updated 2 years ago
- Official implementation of Privacy Implications of Retrieval-Based Language Models (EMNLP 2023). https://arxiv.org/abs/2305.14888 ☆35 · Updated last year
- This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022). ☆107 · Updated 3 years ago
- This PyTorch package implements PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance (ICML 2022). ☆46 · Updated 2 years ago
- Mixture of Attention Heads ☆47 · Updated 2 years ago
- A simple experiment with Ladder Side-Tuning on the CLUE benchmark ☆21 · Updated 3 years ago
- Code for CascadeBERT, Findings of EMNLP 2021 ☆12 · Updated 3 years ago
- Source code for our EMNLP'21 paper "Raise a Child in Large Language Model: Towards Effective and Generalizable Fine-tuning" ☆60 · Updated 3 years ago
- [ACL 2023] Code for the paper "Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation" (https://arxiv.org/abs/2305.…) ☆38 · Updated 2 years ago
- ☆32 · Updated 3 years ago
- Code for PromptCSE, EMNLP 2022 ☆11 · Updated 2 years ago
- Source code for the TMLR paper "Black-Box Prompt Learning for Pre-trained Language Models" ☆55 · Updated last year
- FlatNCE: A Novel Contrastive Representation Learning Objective ☆90 · Updated 3 years ago
- ☆46 · Updated last month
- ☆56 · Updated 2 years ago
- 😎 A simple and easy-to-use toolkit for GPU scheduling. ☆43 · Updated last month
- Must-read papers on improving efficiency for pre-trained language models. ☆104 · Updated 2 years ago
- [ICML 2023] "Data Efficient Neural Scaling Law via Model Reusing" by Peihao Wang, Rameswar Panda, Zhangyang Wang ☆14 · Updated last year
- Code for the EMNLP 2022 paper "Distilled Dual-Encoder Model for Vision-Language Understanding" ☆30 · Updated 2 years ago
- [NeurIPS 2024] Code and data repo for the paper "Embedding Trajectory for Out-of-Distribution Detection in Mathematical Reasoning" ☆26 · Updated last year
- [ACL-IJCNLP 2021] "EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets" by Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, … ☆18 · Updated 3 years ago
- Code for the paper "SkipBERT: Efficient Inference with Shallow Layer Skipping", ACL 2022 ☆16 · Updated 3 years ago