CongpeiQiu / CLIPRefiner
[ICLR 2025] Code release of "Refining CLIP's Spatial Awareness: A Visual-Centric Perspective"
☆19 · Updated 6 months ago
Alternatives and similar repositories for CLIPRefiner
Users interested in CLIPRefiner are comparing it to the repositories listed below.
- Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement" ☆527 · Updated 2 months ago
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model ☆192 · Updated last year
- A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP. ☆683 · Updated last month
- Paper list on Video Moment Retrieval (VMR), Temporal Video Grounding (TVG), Video Grounding (VG), or Temporal Sentence Grounding in Vi… ☆22 · Updated 2 months ago
- [AAAI 2025] AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video… ☆88 · Updated 9 months ago
- [CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga ☆123 · Updated last week
- ☆16 · Updated 5 months ago
- Code for Scaling Language-Free Visual Representation Learning (WebSSL) ☆245 · Updated 5 months ago
- A curated publication list on open-vocabulary semantic segmentation and related areas (e.g., zero-shot semantic segmentation). ☆737 · Updated last week
- [CVPR 2025] PyTorch implementation of T-CORE, introduced in "When the Future Becomes the Past: Taming Temporal Correspondence for Self-su… ☆16 · Updated 6 months ago
- 🔥 CVPR 2025 Multimodal Large Language Models Paper List ☆155 · Updated 7 months ago
- The official implementation of our paper "IteRPrimE: Zero-shot Referring Image Segmentation with Iterative Grad-CAM Refinement and Prima… ☆14 · Updated 6 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models. ☆313 · Updated last week
- Collections of Papers and Projects for Multimodal Reasoning. ☆105 · Updated 5 months ago
- [IJCV] Progressive Visual Prompt Learning with Contrastive Feature Re-formation ☆14 · Updated last year
- Official repository for VisionZip (CVPR 2025) ☆358 · Updated 2 months ago
- ☆16 · Updated 8 months ago
- Universal Video Temporal Grounding with Generative Multi-modal Large Language Models ☆29 · Updated 3 months ago
- A curated list of papers and resources related to Described Object Detection, Open-Vocabulary/Open-World Object Detection and Referring E… ☆316 · Updated 2 months ago
- Awesome papers & datasets specifically focused on long-term videos. ☆319 · Updated last week
- ☆354 · Updated last year
- 📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models. ☆717 · Updated last week
- [TPAMI 2025] Towards Visual Grounding: A Survey ☆241 · Updated 2 months ago
- [CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models ☆149 · Updated last year
- Awesome OVD-OVS - A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future ☆198 · Updated 6 months ago
- Survey: https://arxiv.org/pdf/2507.20198 ☆172 · Updated this week
- (TPAMI 2024) A Survey on Open Vocabulary Learning ☆954 · Updated 6 months ago
- Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play? ☆19 · Updated this week
- ☆75 · Updated 2 weeks ago
- [ICCV2023] CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection ☆17 · Updated 5 months ago