CongpeiQiu / CLIPRefiner
[ICLR2025] Code Release of Refining CLIP's Spatial Awareness: A Visual-centric Perspective
☆20Updated 8 months ago
Alternatives and similar repositories for CLIPRefiner
Users who are interested in CLIPRefiner are comparing it to the repositories listed below.
- ☆13Updated 8 months ago
- paper list on Video Moment Retrieval (VMR), or Temporal Video Grounding (TVG), Video Grounding (VG), or Temporal Sentence Grounding in Vi…☆31Updated last week
- ☆16Updated 11 months ago
- Project Page For "Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement"☆580Updated 4 months ago
- [CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga☆135Updated 2 months ago
- A curated list of awesome prompt/adapter learning methods for vision-language models like CLIP.☆734Updated 3 weeks ago
- [CVPR 2025 Highlight] Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding☆53Updated 3 months ago
- [NeurIPS 2025] Deep Memory Backtracking for Long Video Understanding☆61Updated 2 months ago
- [AAAI 2025] AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video…☆91Updated last year
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model☆199Updated last year
- [NeurIPS 2025] The official PyTorch implementation of the "Vision Function Layer in MLLM".☆21Updated last week
- ☆17Updated 8 months ago
- code for the paper "CoReS: Orchestrating the Dance of Reasoning and Segmentation"☆20Updated last month
- Universal Video Temporal Grounding with Generative Multi-modal Large Language Models☆42Updated last month
- Survey: https://arxiv.org/pdf/2507.20198☆257Updated this week
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models.☆339Updated 2 months ago
- Official repository for VisionZip (CVPR 2025)☆392Updated 5 months ago
- [NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning☆252Updated 2 months ago
- 🔥CVPR 2025 Multimodal Large Language Models Paper List☆153Updated 9 months ago
- [Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …☆414Updated last year
- Official PyTorch Code of ReKV (ICLR'25)☆78Updated last month
- [NeurIPS2025 Spotlight 🔥 ] Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Langu…☆261Updated last month
- Collection of Composed Image Retrieval (CIR) papers.☆289Updated last month
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding☆150Updated last week
- [CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"☆346Updated last week
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos☆101Updated 2 weeks ago
- Awesome papers & datasets specifically focused on long-term videos.☆335Updated 2 months ago
- [CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models☆155Updated last year
- [CVPR 2025] Official implementation of paper "MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders".☆47Updated 6 months ago
- [CVPR'2024 Highlight] Official PyTorch implementation of the paper "VTimeLLM: Empower LLM to Grasp Video Moments".☆293Updated last year