Say-Hello2y / Transformer-attention
Compares the analytically derived attention gradient with the gradient computed by PyTorch autograd
☆15 · Updated last year
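The repository's core idea, checking a hand-derived attention gradient against an automatically computed one, can be sketched without PyTorch as well. The minimal NumPy example below (an illustration, not the repo's actual code) derives the closed-form gradient of `L = attention(Q, K, V).sum()` with respect to `Q` and verifies it against central finite differences, which stand in here for autograd:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Single-head scaled dot-product attention: softmax(QK^T / sqrt(d)) V
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

def analytic_grad_Q(Q, K, V):
    """Closed-form dL/dQ for the scalar loss L = attention(Q, K, V).sum()."""
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d))          # attention weights, shape (n, n)
    G = np.ones((Q.shape[0], V.shape[1])) @ V.T  # dL/dA = dL/dO @ V^T
    # Row-wise softmax backward: dL/dS = A * (G - rowsum(A * G))
    dS = A * (G - (A * G).sum(axis=-1, keepdims=True))
    return dS @ K / np.sqrt(d)                  # chain rule through S = QK^T / sqrt(d)

def numeric_grad_Q(Q, K, V, eps=1e-6):
    # Central finite differences, used in place of autograd for verification.
    g = np.zeros_like(Q)
    for idx in np.ndindex(Q.shape):
        Qp, Qm = Q.copy(), Q.copy()
        Qp[idx] += eps
        Qm[idx] -= eps
        g[idx] = (attention(Qp, K, V).sum() - attention(Qm, K, V).sum()) / (2 * eps)
    return g

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
assert np.allclose(analytic_grad_Q(Q, K, V), numeric_grad_Q(Q, K, V), atol=1e-5)
```

In PyTorch, the finite-difference check would be replaced by `torch.autograd.grad` on the same scalar loss, which is what the repository compares against its theoretical derivation.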
Alternatives and similar repositories for Transformer-attention
Users interested in Transformer-attention are comparing it to the repositories listed below.
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129) ☆90 · Updated 2 years ago
- [ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers ☆105 · Updated 10 months ago
- [ICML'24] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark" ☆115 · Updated 4 months ago
- [ACL'24, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! ☆38 · Updated last year
- Official repo for PAC-Bayes Information Bottleneck (ICLR 2022) ☆50 · Updated 3 years ago
- [ICLR 2024] Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching ☆102 · Updated last year
- [NeurIPS 2022] Latency-aware Spatial-wise Dynamic Networks ☆24 · Updated 2 years ago
- VLM Evaluation: a benchmark for VLMs spanning text-generation tasks from VQA to captioning ☆129 · Updated last year
- [ECCV 2022] Learning to Weight Samples for Dynamic Early-exiting Networks ☆36 · Updated 2 years ago
- [NeurIPS 2022] Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach (official implementation) ☆45 · Updated 2 years ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆144 · Updated 4 months ago
- [ICCV 2023] An approach to enhance the efficiency of Vision Transformers (ViT) by concurrently employing token pruning and token merging tech… ☆101 · Updated 2 years ago
- ☆37 · Updated 3 years ago
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024) ☆20 · Updated last year
- Official code for the paper "Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL" ☆23 · Updated 2 years ago
- Code for the NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs" ☆42 · Updated 9 months ago
- [NeurIPS 2022] "Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation", Ziyu Jiang*, Xuxi Chen*, Xueqin Huan… ☆20 · Updated 2 years ago
- Implementation of the paper "MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models" ☆24 · Updated 5 months ago
- Source code for "Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy" (ICLR20… ☆43 · Updated last year
- ☆123 · Updated last year
- [CVPR 2022] Automated Progressive Learning for Efficient Training of Vision Transformers ☆25 · Updated 8 months ago
- Representation Surgery for Multi-Task Model Merging (ICML 2024) ☆46 · Updated last year
- Official code for the CVPR 2022 paper "Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space" ☆250 · Updated 2 months ago
- Prioritize Alignment in Dataset Distillation ☆20 · Updated 11 months ago
- [NeurIPS 2023, Spotlight] Large-scale Dataset Distillation/Condensation; 50 IPC (images per class) achieves the highest 60.8% on original … ☆131 · Updated last year
- [ICML 2025 Oral] Official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchma… ☆69 · Updated 4 months ago
- [NeurIPS 2022] Official implementation of "Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning" ☆187 · Updated 2 years ago
- [CVPR 2024, Highlight] Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching (G-VBSM) ☆28 · Updated last year
- Official PyTorch code for the ICLR 2025 paper "Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models" ☆23 · Updated 8 months ago
- [ACM Multimedia 2025] Official repo for Debiasing Large Visual Language Models, including a post-hoc debiasing method and Visual… ☆82 · Updated 8 months ago