gimpong / AAAI25-S5VH
The code for the paper "Efficient Self-Supervised Video Hashing with Selective State Spaces" (AAAI'25).
☆19 · Updated last month
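The repository implements self-supervised video hashing, i.e. mapping a video clip to a compact binary code for retrieval. As a rough orientation only, the toy sketch below shows the general shape of such a model (per-frame features → temporal encoder → k-bit code with a straight-through sign). The module names, the GRU stand-in for the paper's selective state-space (Mamba-style) encoder, and all hyperparameters are hypothetical and are not the repository's actual API.

```python
# Minimal, hypothetical sketch of a video-hashing forward pass:
# frame features -> temporal encoder -> k-bit binary code.
# NOT the repository's actual code; the paper's selective state-space
# encoder is approximated here by a plain GRU to keep the snippet runnable.
import torch
import torch.nn as nn

class ToyVideoHasher(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, code_bits=64):
        super().__init__()
        # Stand-in temporal encoder over pre-extracted per-frame features.
        self.temporal = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        # Projection from the pooled video representation to hash logits.
        self.to_code = nn.Linear(hidden_dim, code_bits)

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim)
        out, _ = self.temporal(frame_feats)
        video_repr = out.mean(dim=1)        # temporal average pooling
        logits = self.to_code(video_repr)
        # Straight-through sign: {-1, +1} codes in the forward pass,
        # tanh gradients in the backward pass.
        soft = torch.tanh(logits)
        hard = torch.sign(soft)
        return hard + (soft - soft.detach())

if __name__ == "__main__":
    hasher = ToyVideoHasher()
    codes = hasher(torch.randn(2, 16, 512))  # 2 clips, 16 frames each
    print(codes.shape)                       # torch.Size([2, 64])
```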
Alternatives and similar repositories for AAAI25-S5VH
Users interested in AAAI25-S5VH are comparing it to the repositories listed below.
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models (☆46, updated 2 months ago)
- The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding" … (☆57, updated 9 months ago)
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" (☆42, updated 9 months ago)
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025) (☆40, updated last week)
- Official repository of "Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach" (ACL 2024 Oral) (☆32, updated 5 months ago)
- [NeurIPS'24] I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing (☆25, updated last month)
- CLIP-MoE: Mixture of Experts for CLIP (☆45, updated 10 months ago)
- A curated list of zero-shot captioning papers (☆23, updated 2 years ago)
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" (☆38, updated 2 months ago)
- [TMM] MINT-IQA: Quality Assessment for AI-Generated Images with Instruction Tuning (☆18, updated 3 weeks ago)
- [WIP@Oct 13] 质衡 benchmark (Q-Bench in Chinese): includes Chinese-language versions of the low-level visual question answering and low-level visual description datasets, as well as image quality assessment with Chinese prompts. We will release Q-Bench in more languages in the future… (☆22, updated last year)
- Official Repository of Personalized Visual Instruct Tuning (☆32, updated 5 months ago)
- [CVPR 2025] RAP: Retrieval-Augmented Personalization (☆68, updated last month)
- [ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models (☆32, updated 10 months ago)
- Official repo for ColorBench (☆20, updated last month)
- Adapting LLaMA Decoder to Vision Transformer (☆30, updated last year)
- [CVPR 2024] The official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding" (☆45, updated 2 months ago)
- [ICLR 2025] γ-MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models (☆38, updated 6 months ago)
- (CVPR 2024) "Unsegment Anything by Simulating Deformation" (☆28, updated last year)
- [IEEE TCSVT'24] Study of Subjective and Objective Naturalness Assessment of AI-Generated Images (☆36, updated 2 months ago)
- Code for the CVPR 2023 paper "SViTT: Temporal Learning of Sparse Video-Text Transformers" (☆20, updated 2 years ago)
- Dimple, the first Discrete Diffusion Multimodal Large Language Model (☆95, updated last month)
- Official code for "Q-Ground: Image Quality Grounding with Large Multi-modality Models", ACM MM 2024 (Oral) (☆42, updated 10 months ago)
- [CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation (☆64, updated last month)
- [ECCV 2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models (☆17, updated last year)
- 🔥 Official PyTorch implementation for "LM4LV: A Frozen Large Language Model for Low-level Vision Tasks" (☆52, updated last year)
- [CVPR 2024] VkD: Improving Knowledge Distillation using Orthogonal Projections (☆55, updated 10 months ago)
- [NeurIPS 2023] Implementation of "Foundation Model is Efficient Multimodal Multitask Model Selector" (☆37, updated last year)
- [CVPR 2025] CoDe: Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient (☆105, updated 5 months ago)