Say-Hello2y / Transformer-attention
Compares the theoretically derived attention gradient with the gradient computed by PyTorch autograd (see the sketch below).
⭐15 · Updated last year
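Since the project's stated purpose is checking a hand-derived attention gradient against the one PyTorch computes, here is a minimal sketch of what such a comparison can look like. This is not the repository's code; the shapes, variable names, and the choice of loss are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the repository's code): verify a
# hand-derived gradient of softmax attention against PyTorch autograd.
import torch

torch.manual_seed(0)
n, d = 4, 8                      # sequence length, head dimension (assumed)
Q = torch.randn(n, d, requires_grad=True)
K = torch.randn(n, d)
V = torch.randn(n, d)
scale = d ** -0.5

# Forward: O = softmax(Q K^T / sqrt(d)) V
S = Q @ K.T * scale
P = torch.softmax(S, dim=-1)
O = P @ V

# Autograd gradient of the scalar loss L = sum(O) w.r.t. Q
O.sum().backward()
autograd_dQ = Q.grad.clone()

# Theory: dL/dO = 1, dL/dP = dL/dO V^T,
# dL/dS = P * (dL/dP - rowsum(dL/dP * P))   (softmax Jacobian-vector product)
# dL/dQ = dL/dS K / sqrt(d)
dO = torch.ones_like(O)
dP = dO @ V.T
dS = P * (dP - (dP * P).sum(dim=-1, keepdim=True))
theory_dQ = dS @ K * scale

print(torch.allclose(autograd_dQ, theory_dQ, atol=1e-6))  # expect: True
```

The same pattern extends to the other inputs: dL/dV = P^T dL/dO, and dL/dK = (dL/dS)^T Q / sqrt(d).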
Alternatives and similar repositories for Transformer-attention
Users interested in Transformer-attention are comparing it to the repositories listed below.
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129) ⭐91 · Updated 2 years ago
- Awesome papers on token redundancy reduction ⭐11 · Updated 10 months ago
- [ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers ⭐105 · Updated last year
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning ⭐135 · Updated last year
- Official code for the paper "Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL" ⭐24 · Updated 2 years ago
- [ICML 2025 Oral] The official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchma…" ⭐68 · Updated 6 months ago
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning ⭐419 · Updated 7 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ⭐153 · Updated 7 months ago
- ⭐16 · Updated last year
- Code for NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs" ⭐46 · Updated 11 months ago
- ⭐120 · Updated last year
- Official repo for PAC-Bayes Information Bottleneck (ICLR 2022) ⭐49 · Updated 3 years ago
- CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for task-aware parameter-efficient fine-tuning (NeurIPS 2024) ⭐53 · Updated last year
- [Findings of EMNLP 2024] AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models ⭐20 · Updated last year
- [ICCV 23] An approach to enhance the efficiency of Vision Transformer (ViT) by concurrently employing token pruning and token merging tech… ⭐105 · Updated 2 years ago
- The official implementation of "Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation" (NeurIPS 2024) ⭐52 · Updated last year
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024) ⭐20 · Updated last year
- This repository contains the implementation of the paper "MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models" ⭐24 · Updated 8 months ago
- [ICML'24] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark" ⭐123 · Updated 7 months ago
- [NeurIPS'24] Official implementation of the paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models" ⭐38 · Updated last year
- [ICLR 2024] Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching ⭐105 · Updated last year
- ⭐125 · Updated last year
- [ICLR 2026] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs ⭐41 · Updated 8 months ago
- Code for the paper "Sparse Structure Search for Delta Tuning" ⭐11 · Updated 3 years ago
- (CVPR 2022) Automated Progressive Learning for Efficient Training of Vision Transformers ⭐25 · Updated 11 months ago
- [ACL'24, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! ⭐39 · Updated last year
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key ⭐104 · Updated last month
- A curated list of awesome papers on dataset reduction, including dataset distillation (dataset condensation) and dataset pruning (coreset… ⭐60 · Updated last year
- Implementation of "Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models" [NeurIPS 2025] ⭐73 · Updated last month
- [NeurIPS 2022] Latency-aware Spatial-wise Dynamic Networks ⭐25 · Updated 2 years ago