Say-Hello2y / Transformer-attention
Compares the hand-derived (theoretical) attention gradient with the gradient computed by PyTorch's autograd.
☆14 · Updated last year
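As a minimal sketch of the kind of check the description implies (the sizes, variable names, and hand-derived backward pass below are illustrative assumptions, not the repo's own code), the closed-form gradients of single-head scaled dot-product attention can be verified against torch.autograd:

```python
import torch

torch.manual_seed(0)
n, d = 5, 4  # sequence length and head dimension (illustrative sizes)

Q = torch.randn(n, d, requires_grad=True)
K = torch.randn(n, d, requires_grad=True)
V = torch.randn(n, d, requires_grad=True)

# Forward pass: single-head scaled dot-product attention.
S = Q @ K.T / d ** 0.5         # attention scores
P = torch.softmax(S, dim=-1)   # attention weights
O = P @ V                      # output

# Upstream gradient dL/dO; all-ones corresponds to the loss L = O.sum().
dO = torch.ones_like(O)
O.backward(dO)

# Hand-derived ("theory") gradients via the chain rule.
with torch.no_grad():
    dV = P.T @ dO                                        # (n, d)
    dP = dO @ V.T                                        # (n, n)
    dS = P * (dP - (dP * P).sum(dim=-1, keepdim=True))   # row-wise softmax backward
    dQ = dS @ K / d ** 0.5
    dK = dS.T @ Q / d ** 0.5

print(torch.allclose(dQ, Q.grad, atol=1e-5))  # expected: True
print(torch.allclose(dK, K.grad, atol=1e-5))  # expected: True
print(torch.allclose(dV, V.grad, atol=1e-5))  # expected: True
```

The softmax backward line is the standard Jacobian-vector product P ⊙ (dP − rowsum(dP ⊙ P)); the same per-row identities carry over to batched and multi-head attention.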
Alternatives and similar repositories for Transformer-attention
Users interested in Transformer-attention are comparing it to the libraries listed below.
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129) ☆90 · Updated 2 years ago
- Official code of the paper "Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL" ☆23 · Updated last year
- [NeurIPS 2022] Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach -- Official Implementation ☆44 · Updated last year
- [ICML 2024] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark". ☆104 · Updated 11 months ago
- [ICCV 23] An approach to enhance the efficiency of Vision Transformer (ViT) by concurrently employing token pruning and token merging techniques ☆96 · Updated last year
- Official repo for Debiasing Large Visual Language Models, including a post-hoc debiasing method and a Visual Debias Decoding strategy ☆78 · Updated 3 months ago
- Preprint: Asymmetry in Low-Rank Adapters of Foundation Models ☆36 · Updated last year
- [arXiv] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs ☆22 · Updated 2 weeks ago
- ☆38 · Updated last year
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning ☆112 · Updated 8 months ago
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation" (ICML 2023) ☆33 · Updated last year
- [CVPR 2024 highlight] Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching (G-VBSM) ☆28 · Updated 7 months ago
- Collection of papers and resources for data augmentation (DA) in visual reinforcement learning (RL) ☆77 · Updated last year
- [EVA ICLR'23; LARA ICML'22] Efficient attention mechanisms via control variates, random features, and importance sampling ☆85 · Updated 2 years ago
- Code for the NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs" ☆34 · Updated 3 months ago
- (CVPR 2022) Automated Progressive Learning for Efficient Training of Vision Transformers ☆25 · Updated 3 months ago
- (NeurIPS 2023 spotlight) Large-scale Dataset Distillation/Condensation; 50 IPC (Images Per Class) achieves the highest 60.8% on original … ☆128 · Updated 6 months ago
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts ☆24 · Updated last year
- ☆15 · Updated 9 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆97 · Updated 4 months ago
- [ICLR'24] "DeepZero: Scaling up Zeroth-Order Optimization for Deep Model Training" by Aochuan Chen*, Yimeng Zhang*, Jinghan Jia, James Di… ☆57 · Updated 7 months ago
- [NeurIPS 2022] Latency-aware Spatial-wise Dynamic Networks ☆24 · Updated last year
- Repository of "Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning" (NeurIPS 2023 Spotlight) ☆37 · Updated last year
- Representation Surgery for Multi-Task Model Merging (ICML 2024) ☆45 · Updated 7 months ago
- ☆117 · Updated last year
- ☆83 · Updated last month
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model ☆31 · Updated 4 months ago
- ☆129 · Updated 10 months ago
- [ICLR 2024] Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching ☆101 · Updated last year
- ☆24 · Updated last year