BIT-DA / ABS
[ICML 2025] Official code for "From Local Details to Global Context: Advancing Vision-Language Models with Attention-Based Selection"
☆22 Updated 2 months ago
Alternatives and similar repositories for ABS
Users who are interested in ABS are comparing it to the repositories listed below:
- [ICML 2025] This is the official PyTorch implementation of "HarmoniCa: Harmonizing Training and Inference for Better Feature Caching i… ☆42 Updated last month
- This repository contains the code for our ICML 2025 paper "LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection" ☆25 Updated 3 months ago
- VeriThinker: Learning to Verify Makes Reasoning Model Efficient ☆52 Updated last month
- ☆25 Updated 6 months ago
- Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models ☆29 Updated 2 months ago
- [ICLR 2025] See What You Are Told: Visual Attention Sink in Large Multimodal Models ☆45 Updated 6 months ago
- [ICCV 2025] Official code for paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs ☆25 Updated 2 months ago
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation ☆34 Updated 11 months ago
- Dimple, the first Discrete Diffusion Multimodal Large Language Model ☆95 Updated last month
- CLoG: Benchmarking Continual Learning of Image Generation Models ☆20 Updated last year
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆80 Updated last month
- Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology" ☆60 Updated last month
- ☆54 Updated 3 months ago
- ☆105 Updated 5 months ago
- [ICML'25] Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference". ☆148 Updated 2 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆145 Updated 3 weeks ago
- [CVPR 2025] VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification ☆33 Updated 5 months ago
- [ECCV 2024] Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation ☆32 Updated 5 months ago
- [ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models ☆101 Updated 10 months ago
- [NeurIPS 2024] ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis ☆24 Updated 9 months ago
- Multi-Stage Vision Token Dropping: Towards Efficient Multimodal Large Language Model ☆34 Updated 7 months ago
- [ECCV 2024] Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators ☆45 Updated 11 months ago
- Official repository of the paper "A Glimpse to Compress: Dynamic Visual Token Pruning for Large Vision-Language Models" ☆54 Updated this week
- [CVPR 2025] Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation ☆58 Updated this week
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆46 Updated 2 months ago
- Official implementation for the paper "Towards Understanding How Knowledge Evolves in Large Vision-Language Models" ☆17 Updated 4 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆190 Updated last week
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆120 Updated 5 months ago
- ☆30 Updated last year
- Autoregressive Semantic Visual Reconstruction Helps VLMs Understand Better ☆37 Updated 2 months ago