Say-Hello2y / Transformer-attentionLinks
compare the theory attention gradient with PyTorch attention gradient
☆15Updated last year
Alternatives and similar repositories for Transformer-attention
Users that are interested in Transformer-attention are comparing it to the libraries listed below
Sorting:
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129)☆90Updated 2 years ago
- [AI4MATH@ICML2025] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs☆38Updated 3 months ago
- [ICML 2024] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark ".☆111Updated 2 months ago
- [ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers.☆105Updated 8 months ago
- [NeurIPS 2022] Latency-aware Spatial-wise Dynamic Networks☆24Updated 2 years ago
- [ACL'24, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!☆38Updated last year
- Official PyTorch implementation of the paper "dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching" (dLLM-Cache…☆147Updated last month
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models☆126Updated 2 months ago
- LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning☆35Updated last year
- ICLR 2024, Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching☆101Updated last year
- CorDA: Context-Oriented Decomposition Adaptation of Large Language Models for task-aware parameter-efficient fine-tuning(NeurIPS 2024)☆49Updated 7 months ago
- Code for "ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models" (ICLR 2024)☆20Updated last year
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training☆87Updated 9 months ago
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key☆75Updated 3 months ago
- Dataset pruning for ImageNet and LAION-2B.☆79Updated last year
- ☆115Updated last year
- [CVPR2024 highlight] Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching (G-VBSM)☆29Updated 11 months ago
- This repository contains the implementation of the paper "MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models".☆21Updated 3 months ago
- [NeurIPS 2022] Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach -- Official Implementation☆45Updated 2 years ago
- [ICCV 23]An approach to enhance the efficiency of Vision Transformer (ViT) by concurrently employing token pruning and token merging tech…☆99Updated 2 years ago
- Official github repo for SafeDialBench, a comprehensive multi-turn dialogue benchmark to evaluate LLMs' safety.☆34Updated 3 months ago
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning☆124Updated 11 months ago
- Paper List of Inference/Test Time Scaling/Computing☆301Updated last week
- The official implementation of "2024NeurIPS Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation"☆47Updated 8 months ago
- code for paper Sparse Structure Search for Delta Tuning☆11Updated 2 years ago
- ☆130Updated last year
- Preprint: Asymmetry in Low-Rank Adapters of Foundation Models☆36Updated last year
- This is the source code for Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy (ICLR20…☆41Updated last year
- [NeurIPS'24] Official implementation of paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models".☆37Updated 10 months ago
- Reference implementation for Token-level Direct Preference Optimization(TDPO)☆147Updated 6 months ago