Say-Hello2y / Transformer-attentionLinks
compare the theory attention gradient with PyTorch attention gradient
☆14Updated last year
Alternatives and similar repositories for Transformer-attention
Users that are interested in Transformer-attention are comparing it to the libraries listed below
Sorting:
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129)☆90Updated 2 years ago
- code for paper Sparse Structure Search for Delta Tuning☆11Updated 2 years ago
- ICLR 2024, Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching☆101Updated last year
- [arXiv] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs☆32Updated last month
- [ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.☆103Updated 6 months ago
- [ICML 2024] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark ".☆105Updated last week
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models☆116Updated last week
- Paper List of Inference/Test Time Scaling/Computing☆280Updated 2 weeks ago
- Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More"☆60Updated 2 months ago
- [NeurIPS'22] This is an official implementation for "Scaling & Shifting Your Features: A New Baseline for Efficient Model Tuning".☆183Updated last year
- [ICCV2023] Dataset Quantization☆259Updated last year
- Jittor implementation of Vision Transformer with Deformable Attention☆31Updated 3 years ago
- [CVPR2024 highlight] Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching (G-VBSM)☆28Updated 9 months ago
- [NeurIPS 2022] Latency-aware Spatial-wise Dynamic Networks☆24Updated last year
- [NeurIPS 2022] Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach -- Official Implementation☆45Updated 2 years ago
- ☆24Updated last year
- [ICML 2025 Oral] The official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchma…☆59Updated this week
- The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.☆251Updated last week
- ☆118Updated last year
- PyTorch codes for "LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning"☆237Updated 2 years ago
- ☆37Updated 3 years ago
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning☆117Updated 10 months ago
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…☆126Updated last week
- Code for NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs"☆37Updated 4 months ago
- Official repo for PAC-Bayes Information Bottleneck. ICLR 2022.☆49Updated 3 years ago
- Official code of paper Understanding, Predicting and Better Resolving Q-Value Divergence in Offline-RL☆23Updated last year
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024)☆19Updated last year
- Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent w…☆39Updated 3 weeks ago
- [ICCV 23]An approach to enhance the efficiency of Vision Transformer (ViT) by concurrently employing token pruning and token merging tech…☆99Updated 2 years ago
- ☆108Updated last year