GeWu-Lab / Patch-MattersLinks
[CVPR2025] Code Release of Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
☆17Updated 5 months ago
Alternatives and similar repositories for Patch-Matters
Users that are interested in Patch-Matters are comparing it to the libraries listed below
Sorting:
- A curated list of publications on image and video segmentation leveraging Multimodal Large Language Models (MLLMs), highlighting state-of…☆162Updated last week
- [ICCV 2025] Official PyTorch Code for "Advancing Textual Prompt Learning with Anchored Attributes"☆106Updated this week
- Easy wrapper for inserting LoRA layers in CLIP.☆40Updated last year
- Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval [CVPR 2025 Highlight]☆60Updated 4 months ago
- [ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.☆55Updated last week
- A list of referring video object segmentation papers☆55Updated 5 months ago
- [NeurIPS 2023] The official implementation of SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation☆33Updated last year
- The official implementation of A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation☆23Updated 3 months ago
- Code for paper "LLMs Can Evolve Continually on Modality for X-Modal Reasoning" NeurIPS2024☆40Updated 11 months ago
- Composed Video Retrieval☆61Updated last year
- [ToMM2023] - AMC: Adaptive Multi-expert Collaborative Network for Text-guided Image Retrieval☆20Updated last year
- Official pytorch repository for "TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection" (AAAI 2024 Pape…☆52Updated 9 months ago
- Official implement of ICML2024 Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation☆54Updated last year
- code for FineLIP☆35Updated 2 months ago
- [ICCV-2023] The official code of Bridging Vision and Language Encoders: Parameter-Efficient Tuning for Referring Image Segmentation☆138Updated 4 months ago
- The official code for "TextRefiner: Internal Visual Feature as Efficient Refiner for Vision-Language Models Prompt Tuning" | [AAAI2025]☆46Updated 8 months ago
- cliptrase☆47Updated last year
- ☆73Updated 11 months ago
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024☆47Updated last month
- [NeurIPS 2024] A Large-Scale Human-Centric Benchmark for Referring Expression Comprehension in the LMM Era☆11Updated last year
- 【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval☆90Updated last year
- Codes of the Fine-grained Textual Inversion network for Zero-Shot Composed Image Retrieval☆25Updated 7 months ago
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"☆67Updated last year
- Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models (AAAI 2024)☆73Updated 9 months ago
- [ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning☆48Updated 7 months ago
- [CVPR 2024 Accepted] TaskWeave: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection☆27Updated last year
- The official PyTorch implementation of the CVPR 2023 paper "Contrastive Grouping with Transformer for Referring Image Segmentation".☆51Updated last year
- 【AAAI2025】DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification☆66Updated 8 months ago
- [AAAI'25, CVPRW 2024] Official repository of paper titled "Learning to Prompt with Text Only Supervision for Vision-Language Models".☆114Updated 11 months ago
- ☆59Updated last year