Say-Hello2y / Transformer-attention
Compares the theoretical attention gradient with the PyTorch attention gradient.
☆15 Updated last year
Alternatives and similar repositories for Transformer-attention
Users who are interested in Transformer-attention are comparing it to the repositories listed below.
- Official repo for PAC-Bayes Information Bottleneck. ICLR 2022. ☆49 Updated 3 years ago
- Code release for Deep Incubation (https://arxiv.org/abs/2212.04129) ☆90 Updated 2 years ago
- [ACL'24, Outstanding Paper] Emulated Disalignment: Safety Alignment for Large Language Models May Backfire! ☆38 Updated last year
- [CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key ☆69 Updated 2 months ago
- ICLR 2024, Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching ☆102 Updated last year
- [NeurIPS'24] Official implementation of paper "Unveiling the Tapestry of Consistency in Large Vision-Language Models". ☆36 Updated 9 months ago
- VLM Evaluation: Benchmark for VLMs, spanning text generation tasks from VQA to Captioning ☆120 Updated 10 months ago
- Large Language Diffusion with Ordered Unmasking ☆44 Updated 2 weeks ago
- [ICML 2024] Official code for the paper "Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark". ☆109 Updated last month
- Preview code of ECCV'24 paper "Distill Gold from Massive Ores" (BiLP) ☆24 Updated last year
- [CVPR2024 highlight] Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching (G-VBSM) ☆28 Updated 10 months ago
- Code for NeurIPS 2024 paper "Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs" ☆38 Updated 5 months ago
- [NeurIPS 2022] Latency-aware Spatial-wise Dynamic Networks ☆24 Updated last year
- Reference implementation for Token-level Direct Preference Optimization (TDPO) ☆146 Updated 5 months ago
- ☆29 Updated 2 years ago
- Official code for paper Probing the Decision Boundaries of In-context Learning in Large Language Models. https://arxiv.org/abs/2406.11233… ☆19 Updated 2 weeks ago
- [NeurIPS 2022] Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach -- Official Implementation ☆45 Updated 2 years ago
- Paper List of Inference/Test Time Scaling/Computing ☆289 Updated last month
- This is the source code for Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy (ICLR20… ☆41 Updated 11 months ago
- LaTeX template for the PhD midterm assessment at the Institute of Automation, Chinese Academy of Sciences ☆9 Updated 4 years ago
- Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts ☆25 Updated last year
- [ACL'24] Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization ☆85 Updated 11 months ago
- [ICML 2023] UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers. ☆105 Updated 7 months ago
- [arXiv] Do Not Let Low-Probability Tokens Over-Dominate in RL for LLMs ☆36 Updated 2 months ago
- [ICML 2025 Oral] The official repository for the paper "Can MLLMs Reason in Multimodality? EMMA: An Enhanced MultiModal ReAsoning Benchma… ☆61 Updated 3 weeks ago
- ☆131 Updated last year
- An Easy-to-use, Scalable and High-performance RLHF Framework designed for Multimodal Models. ☆139 Updated 4 months ago
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training ☆86 Updated 8 months ago
- [ACM Multimedia 2025] This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual… ☆81 Updated 5 months ago
- Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?" ☆239 Updated 2 months ago