🚀 First survey on Attention Sink in Transformers — 180+ papers on utilization, interpretation, and mitigation.
☆49 · Apr 16, 2026 · Updated this week
Alternatives and similar repositories for Awesome-Attention-Sink
Users who are interested in Awesome-Attention-Sink are comparing it to the libraries listed below.
- Personal settings for Linux tools, including zsh, vim, tmux, and pip. ☆11 · Dec 2, 2019 · Updated 6 years ago
- Stanford CoreNLP annotator implementing jMWE for detecting Multi-Word Expressions / collocations ☆15 · Jan 6, 2017 · Updated 9 years ago
- Code for the paper "Multitasking Framework for Unsupervised Simple Definition Generation" at ACL 2022. ☆17 · Aug 17, 2022 · Updated 3 years ago
- ☆21 · Mar 17, 2025 · Updated last year
- ☆12 · Feb 11, 2026 · Updated 2 months ago
- SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving ☆61 · Feb 28, 2026 · Updated last month
- ☆75 · Mar 26, 2026 · Updated 3 weeks ago
- [ECAI 2023 Oral] Official Implementation of High Dynamic Range Image Reconstruction via Deep Explicit Polynomial Curve Estimation ☆20 · Nov 3, 2024 · Updated last year
- Code to reproduce the paper "Do causal predictors generalize better to new domains?" ☆15 · Feb 7, 2025 · Updated last year
- ☆55 · May 22, 2025 · Updated 10 months ago
- Codebase for character-centric story understanding ☆14 · Jan 20, 2022 · Updated 4 years ago
- SummScreen: A Dataset for Abstractive Screenplay Summarization (ACL 2022) ☆41 · May 22, 2022 · Updated 3 years ago
- ☆12 · Jan 1, 2024 · Updated 2 years ago
- ☆13 · Jun 25, 2025 · Updated 9 months ago
- ☆11 · May 1, 2022 · Updated 3 years ago
- ☆14 · Jan 3, 2025 · Updated last year
- Source code for our paper: "LoGU: Long-form Generation with Uncertainty Expressions". ☆17 · May 27, 2025 · Updated 10 months ago
- An implementation of torchngp + semantic-nerf ☆13 · Sep 10, 2023 · Updated 2 years ago
- The official implementation of the paper "Rethinking Pruning for Vision-Language Models: Strategies for Effective Sparsity". ☆16 · Jul 2, 2024 · Updated last year
- [CIKM-21] PyTorch implementation of LiteGT: Efficient and Lightweight Graph Transformers ☆12 · Nov 16, 2021 · Updated 4 years ago
- Utilities to parse type information and JSDoc annotations from TypeScript source files, and render Markdown documentation ☆12 · Jun 24, 2023 · Updated 2 years ago
- Official code for the NAACL 2022 paper "Persona-Guided Planning for Controlling the Protagonist's Persona in Story Generation" ☆16 · Sep 1, 2022 · Updated 3 years ago
- Self-Distribution BNN ☆10 · Mar 8, 2022 · Updated 4 years ago
- ☆19 · Jan 5, 2023 · Updated 3 years ago
- A Python wrapper for Stanford CoreNLP, simple and customizable. ☆13 · Oct 26, 2021 · Updated 4 years ago
- Code and data for the EMNLP 2021 paper "Self- and Pseudo-self-supervised Prediction of Speaker and Key-utterance for Multi-party Dialogue Re… ☆16 · Oct 15, 2022 · Updated 3 years ago
- Fastened CROWN: Tightened Neural Network Robustness Certificates ☆10 · Feb 10, 2020 · Updated 6 years ago
- ☆13 · Sep 28, 2022 · Updated 3 years ago
- Code for Dissecting Generation Modes for Abstractive Summarization Models via Ablation and Attribution (ACL 2021) ☆13 · Jun 2, 2021 · Updated 4 years ago
- ☆21 · Feb 10, 2025 · Updated last year
- Source code for the ACL 2023 Main Conference paper "PAD-Net: An Efficient Framework for Dynamic Networks". ☆12 · Feb 28, 2026 · Updated last month
- Paper list for the paper "Authorship Attribution in the Era of Large Language Models: Problems, Methodologies, and Challenges (SIGKDD Exp… ☆19 · Apr 5, 2026 · Updated last week
- ☆15 · Mar 17, 2021 · Updated 5 years ago
- [ICLR'25] ARB-LLM: Alternating Refined Binarizations for Large Language Models ☆28 · Aug 5, 2025 · Updated 8 months ago
- Turn GitHub into an RSS reader ☆25 · Jan 1, 2024 · Updated 2 years ago
- Paper submission ☆21 · Aug 7, 2023 · Updated 2 years ago
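- Natural-language features implemented on the rasa_ framework: entity recognition, text classification, coreference resolution, relation extraction, etc. ☆17 · May 22, 2023 · Updated 2 years ago
- ☆11 · Mar 19, 2023 · Updated 3 years ago
- 🌼🌼🌼 Summary of learning Generative Adversarial Networks: practice demos of GANs, including autoencoders, variational autoencoders, DCGAN, CycleGAN, and other generative adversarial network models. For reference and learning only. Reference: "PyTorch Deep Learning Intro… ☆14 · Dec 24, 2019 · Updated 6 years ago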