Say-Hello2y / Transformer-attention
Compares the analytically derived attention gradient with the gradient computed by PyTorch autograd.
☆15 · Updated last year
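The comparison this repository describes can be sketched in a few lines. The following is a minimal example (not the repository's own code) that assumes standard single-head scaled dot-product attention: it derives the gradient of the attention output with respect to the queries by hand and checks it against PyTorch autograd.

```python
import torch

torch.manual_seed(0)
d = 8
Q = torch.randn(4, d, requires_grad=True)
K = torch.randn(4, d)
V = torch.randn(4, d)

# Forward pass: scaled dot-product attention, then autograd gradient of sum(O) w.r.t. Q
S = Q @ K.T / d**0.5            # scores
A = torch.softmax(S, dim=-1)    # attention weights
O = A @ V                       # output
O.sum().backward()
autograd_dQ = Q.grad.clone()

# "Theory" gradient, derived by hand with the chain rule
dO = torch.ones_like(O)                                  # dL/dO for L = sum(O)
dA = dO @ V.T                                            # dL/dA
dS = A * (dA - (dA * A).sum(dim=-1, keepdim=True))       # softmax backward, row-wise
theory_dQ = dS @ K / d**0.5                              # dL/dQ

print(torch.allclose(autograd_dQ, theory_dQ, atol=1e-6))  # prints: True
```

The same pattern extends to the gradients with respect to K and V; `torch.autograd.gradcheck` can be used for a stricter finite-difference check in double precision.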
Alternatives and similar repositories for Transformer-attention
Users interested in Transformer-attention are comparing it to the libraries listed below.
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129) ☆91 · Updated 2 years ago
- Code for the paper Sparse Structure Search for Delta Tuning ☆11 · Updated 3 years ago
- VLM Evaluation: benchmark for VLMs, spanning text generation tasks from VQA to captioning ☆135 · Updated last year
- [ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers ☆105 · Updated last year
- [ICLR 2024] Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching ☆105 · Updated last year
- [ICML'24] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark" ☆123 · Updated 7 months ago
- [ACL'24, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! ☆39 · Updated last year
- Source code for Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy (ICLR20… ☆45 · Updated last year
- Implementation of the paper "MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models" ☆24 · Updated 8 months ago
- Code for the NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs" ☆46 · Updated 11 months ago
- Paper list of inference/test-time scaling/computing ☆344 · Updated 5 months ago
- Code for the paper "Merging Multi-Task Models via Weight-Ensembling Mixture of Experts" ☆30 · Updated last year
- 😎 Awesome papers on token redundancy reduction ☆11 · Updated 11 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆153 · Updated 7 months ago
- Official repo for PAC-Bayes Information Bottleneck (ICLR 2022) ☆49 · Updated 3 years ago
- [NeurIPS 2025] Implementation of "Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models" ☆73 · Updated last month
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning ☆36 · Updated last year
- Code accompanying the paper "Massive Activations in Large Language Models" ☆195 · Updated last year
- [ICLR 2026] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs ☆41 · Updated 8 months ago
- ☆31 · Updated 2 years ago
- [NeurIPS 2022] Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach (official implementation) ☆48 · Updated 2 years ago
- [ICCV 2023] Dataset Quantization ☆263 · Updated 2 years ago
- [Findings of EMNLP 2024] AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models ☆20 · Updated last year
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024) ☆20 · Updated last year
- [ICCV 23] An approach to enhance the efficiency of Vision Transformers (ViT) by concurrently employing token pruning and token merging tech… ☆105 · Updated 2 years ago
- [NeurIPS 2022] Latency-aware Spatial-wise Dynamic Networks ☆25 · Updated 2 years ago
- CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for task-aware parameter-efficient fine-tuning (NeurIPS 2024) ☆53 · Updated last year
- A curated list of awesome papers on dataset reduction, including dataset distillation (dataset condensation) and dataset pruning (coreset… ☆60 · Updated last year
- [NeurIPS'22] Official implementation of "Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning" ☆193 · Updated 2 years ago
- [CVPR 2024 Highlight] Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching (G-VBSM) ☆28 · Updated last year