Say-Hello2y / Transformer-attention
Compares the analytically derived gradient of Transformer attention with the gradient computed by PyTorch autograd.
☆15 · Updated last year
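For context, the kind of check this repository describes can be sketched in a few lines. The snippet below is not the repo's code, only a minimal illustration under common assumptions: single-head scaled dot-product attention, the loss taken as the sum of the attention output, and all tensor names chosen for the example.

```python
import torch

torch.manual_seed(0)
n, d = 4, 8
Q = torch.randn(n, d, requires_grad=True)
K = torch.randn(n, d)
V = torch.randn(n, d)

# Forward pass: O = softmax(Q K^T / sqrt(d)) V, with loss L = O.sum()
S = Q @ K.T / d ** 0.5
A = torch.softmax(S, dim=-1)
O = A @ V
O.sum().backward()          # PyTorch autograd gradient ends up in Q.grad

# Closed-form ("theory") gradient of the same loss w.r.t. Q
with torch.no_grad():
    dO = torch.ones_like(O)                              # dL/dO for L = O.sum()
    dA = dO @ V.T                                        # dL/dA
    dS = A * (dA - (dA * A).sum(dim=-1, keepdim=True))   # row-wise softmax backward
    dQ = dS @ K / d ** 0.5                               # through S = Q K^T / sqrt(d)

print(torch.allclose(dQ, Q.grad, atol=1e-6))             # expected: True
```

The only nontrivial step is the softmax backward: for each row, dL/dS = A ⊙ (dL/dA − ⟨dL/dA, A⟩), which is the Jacobian-vector product of the row-wise softmax.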
Alternatives and similar repositories for Transformer-attention
Users interested in Transformer-attention are comparing it to the repositories listed below.
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129) ☆91 · Updated 2 years ago
- [ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers ☆105 · Updated last year
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning ☆135 · Updated last year
- (CVPR 2022) Automated Progressive Learning for Efficient Training of Vision Transformers ☆25 · Updated 11 months ago
- [ICLR 2024] Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching ☆105 · Updated last year
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024) ☆20 · Updated last year
- [CVPR 2024 highlight] Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching (G-VBSM) ☆28 · Updated last year
- [ICCV 2023] An approach to enhance the efficiency of Vision Transformers (ViT) by concurrently employing token pruning and token merging techniques ☆104 · Updated 2 years ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆152 · Updated 6 months ago
- Official repo for PAC-Bayes Information Bottleneck (ICLR 2022) ☆49 · Updated 3 years ago
- [NeurIPS'22] Official implementation of "Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning" ☆192 · Updated 2 years ago
- [ECCV 2022] Learning to Weight Samples for Dynamic Early-exiting Networks ☆37 · Updated 2 years ago
- [ICML 2024] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark" ☆124 · Updated 6 months ago
- 😎 Awesome papers on token redundancy reduction ☆11 · Updated 10 months ago
- CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for task-aware parameter-efficient fine-tuning (NeurIPS 2024) ☆53 · Updated last year
- ☆120 · Updated last year
- [NeurIPS 2022] Latency-aware Spatial-wise Dynamic Networks ☆25 · Updated 2 years ago
- [NeurIPS 2022] Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach -- Official Implementation ☆47 · Updated 2 years ago
- Code for the paper "Sparse Structure Search for Delta Tuning" ☆11 · Updated 3 years ago
- PyTorch codes for "LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning" ☆242 · Updated 3 years ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning ☆414 · Updated 6 months ago
- [IEEE TPAMI] Latency-aware Unified Dynamic Networks for Efficient Image Recognition ☆53 · Updated 10 months ago
- [TPAMI] Searching prompt modules for parameter-efficient transfer learning ☆238 · Updated 2 years ago
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model ☆37 · Updated last year
- [Findings of EMNLP 2024] AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models ☆20 · Updated last year
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs ☆80 · Updated 3 months ago
- [ACM Multimedia 2025] Official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual… ☆82 · Updated 11 months ago
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆162 · Updated 4 months ago
- Preview code of ECCV'24 paper "Distill Gold from Massive Ores" (BiLP) ☆25 · Updated last year
- ☆93 · Updated 2 years ago