Say-Hello2y / Transformer-attentionLinks
compare the theory attention gradient with PyTorch attention gradient
☆15Updated last year
Alternatives and similar repositories for Transformer-attention
Users that are interested in Transformer-attention are comparing it to the libraries listed below
Sorting:
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129)☆91Updated 2 years ago
- [ICML‘24] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark ".☆120Updated 5 months ago
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning☆132Updated last year
- ICLR 2024, Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching☆105Updated last year
- 😎 Awesome papers on token redundancy reduction☆11Updated 9 months ago
- [ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers☆105Updated 11 months ago
- This is the source code for Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy (ICLR20…☆44Updated last year
- Official repo for PAC-Bayes Information Bottleneck. ICLR 2022.☆49Updated 3 years ago
- [TPAMI] Searching prompt modules for parameter-efficient transfer learning.☆238Updated 2 years ago
- [CVPR2024 highlight] Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching (G-VBSM)☆28Updated last year
- ☆30Updated 2 years ago
- ☆120Updated last year
- Dataset pruning for ImageNet and LAION-2B.☆79Updated last year
- code for paper Sparse Structure Search for Delta Tuning☆11Updated 3 years ago
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024)☆20Updated last year
- CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for task-aware parameter-efficient fine-tuning(NeurIPS 2024)☆53Updated 11 months ago
- Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality (NeurIPS 2023, Spotlight)☆90Updated last year
- [NeurIPS'24] Official implementation of paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models".☆38Updated last year
- A curated list of awesome papers on dataset reduction, including dataset distillation (dataset condensation) and dataset pruning (coreset…☆59Updated 11 months ago
- [NeurIPS 2022] Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach -- Official Implementation☆46Updated 2 years ago
- The official implementation of paper "Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning" (CVPR …☆22Updated last year
- [NeurIPS'22] This is an official implementation for "Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning".☆190Updated 2 years ago
- Official code for our paper "Model Composition for Multimodal Large Language Models" (ACL 2024)☆31Updated 11 months ago
- [ICLR2025 Spotlight] Advantage-Guided Distillation for Preference Alignment in Small Language Models☆24Updated 10 months ago
- [ACL'24, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!☆38Updated last year
- (NeurIPS 2023 spotlight) Large-scale Dataset Distillation/Condensation, 50 IPC (Images Per Class) achieves the highest 60.8% on original …☆134Updated last year
- Official PyTorch implementation of "Dataset Condensation via Efficient Synthetic-Data Parameterization" (ICML'22)☆115Updated 2 years ago
- The Continual Learning in Multimodality Benchmark☆68Updated 2 years ago
- [ICML 2025 Oral] The official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchma…☆69Updated 5 months ago
- [TMM 2025] This is the official Pytorch code for our paper "Visual Position Prompt for MLLM based Visual Grounding".☆27Updated 5 months ago