CongpeiQiu / CLIPRefiner
[ICLR2025] Code Release of Refining CLIP's Spatial Awareness: A Visual-centric Perspective
☆19Updated 5 months ago
Alternatives and similar repositories for CLIPRefiner
Users who are interested in CLIPRefiner are comparing it to the repositories listed below.
- paper list on Video Moment Retrieval (VMR), or Temporal Video Grounding (TVG), Video Grounding (VG), or Temporal Sentence Grounding in Vi…☆23Updated last month
- A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP.☆674Updated 2 weeks ago
- Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"☆516Updated last month
- A curated publication list on open vocabulary semantic segmentation and related areas (e.g. zero-shot semantic segmentation).☆723Updated this week
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model☆190Updated last year
- [CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga☆121Updated 3 weeks ago
- [AAAI 2025] AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video…☆88Updated 9 months ago
- Collection of Composed Image Retrieval (CIR) papers.☆265Updated last month
- ☆15Updated 5 months ago
- ☆16Updated 8 months ago
- [CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"☆327Updated 3 weeks ago
- Official repository for VisionZip (CVPR 2025)☆351Updated 2 months ago
- Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future☆196Updated 5 months ago
- Code for Scaling Language-Free Visual Representation Learning (WebSSL)☆245Updated 5 months ago
- 🔥CVPR 2025 Multimodal Large Language Models Paper List☆153Updated 6 months ago
- [Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …☆374Updated 9 months ago
- Collections of Papers and Projects for Multimodal Reasoning.☆105Updated 5 months ago
- A curated list of publications on image and video segmentation leveraging Multimodal Large Language Models (MLLMs), highlighting state-of…☆130Updated 2 weeks ago
- 📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.☆693Updated last month
- Awesome papers & datasets specifically focused on long-term videos.☆314Updated last month
- (TPAMI 2024) A Survey on Open Vocabulary Learning☆952Updated 6 months ago
- CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms☆24Updated 3 months ago
- The official implementation of A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation☆23Updated last month
- Universal Video Temporal Grounding with Generative Multi-modal Large Language Models☆27Updated 2 months ago
- [TPAMI 2025] Towards Visual Grounding: A Survey☆235Updated last month
- [CVPR 2025] Official implementation of paper "MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders".☆40Updated 3 months ago
- [ICCV 2025] The official pytorch implement of "LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs".☆16Updated 2 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.☆297Updated this week
- [NeurIPS2025 Spotlight 🔥 ] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Langu…☆218Updated last week
- [CVPR 2025] Official repository of the paper "Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation"☆109Updated 2 months ago