Say-Hello2y / Transformer-attention
Compares the analytically derived attention gradient with the gradient computed by PyTorch autograd.
☆15 · Updated last year
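The comparison this repository describes can be sketched in a few lines. The following is a minimal example (not the repository's own code) that assumes standard single-head scaled dot-product attention: it derives the gradient of the attention output with respect to the queries by hand and checks it against PyTorch autograd.

```python
import torch

torch.manual_seed(0)
d = 8
Q = torch.randn(4, d, requires_grad=True)
K = torch.randn(4, d)
V = torch.randn(4, d)

# Forward pass: scaled dot-product attention, then autograd gradient of sum(O) w.r.t. Q
S = Q @ K.T / d**0.5            # scores
A = torch.softmax(S, dim=-1)    # attention weights
O = A @ V                       # output
O.sum().backward()
autograd_dQ = Q.grad.clone()

# "Theory" gradient, derived by hand with the chain rule
dO = torch.ones_like(O)                                  # dL/dO for L = sum(O)
dA = dO @ V.T                                            # dL/dA
dS = A * (dA - (dA * A).sum(dim=-1, keepdim=True))       # softmax backward, row-wise
theory_dQ = dS @ K / d**0.5                              # dL/dQ

print(torch.allclose(autograd_dQ, theory_dQ, atol=1e-6))  # prints: True
```

The same pattern extends to the gradients with respect to K and V; `torch.autograd.gradcheck` can be used for a stricter finite-difference check in double precision.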
Alternatives and similar repositories for Transformer-attention
Users interested in Transformer-attention are comparing it to the libraries listed below.
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129) ☆91 · Updated 2 years ago
- Code for the paper Sparse Structure Search for Delta Tuning ☆11 · Updated 3 years ago
- VLM Evaluation: benchmark for VLMs, spanning text generation tasks from VQA to captioning ☆135 · Updated last year
- [ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers ☆105 · Updated last year
- [ICLR 2024] Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching ☆105 · Updated last year
- [ICML'24] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark" ☆123 · Updated 7 months ago
- [ACL'24, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! ☆39 · Updated last year
- Source code for Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy (ICLR20… ☆45 · Updated last year
- Implementation of the paper "MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models" ☆24 · Updated 8 months ago
- Code for the NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs" ☆46 · Updated 11 months ago
- Paper list of inference/test-time scaling/computing ☆344 · Updated 5 months ago
- Code for the paper "Merging Multi-Task Models via Weight-Ensembling Mixture of Experts" ☆30 · Updated last year
- 😎 Awesome papers on token redundancy reduction ☆11 · Updated 11 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ☆153 · Updated 7 months ago
- Official repo for PAC-Bayes Information Bottleneck (ICLR 2022) ☆49 · Updated 3 years ago
- [NeurIPS 2025] Implementation of "Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models" ☆73 · Updated last month
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning ☆36 · Updated last year
- Code accompanying the paper "Massive Activations in Large Language Models" ☆195 · Updated last year
- [ICLR 2026] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs ☆41 · Updated 8 months ago
- ☆31 · Updated 2 years ago
- [NeurIPS 2022] Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach (official implementation) ☆48 · Updated 2 years ago
- [ICCV 2023] Dataset Quantization ☆263 · Updated 2 years ago
- [Findings of EMNLP 2024] AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models ☆20 · Updated last year
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024) ☆20 · Updated last year
- [ICCV 23] An approach to enhance the efficiency of Vision Transformers (ViT) by concurrently employing token pruning and token merging tech… ☆105 · Updated 2 years ago
- [NeurIPS 2022] Latency-aware Spatial-wise Dynamic Networks ☆25 · Updated 2 years ago
- CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for task-aware parameter-efficient fine-tuning (NeurIPS 2024) ☆53 · Updated last year
- A curated list of awesome papers on dataset reduction, including dataset distillation (dataset condensation) and dataset pruning (coreset… ☆60 · Updated last year
- [NeurIPS'22] Official implementation of "Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning" ☆193 · Updated 2 years ago
- [CVPR 2024 Highlight] Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching (G-VBSM) ☆28 · Updated last year