(ICCV 2025) ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
☆129Nov 14, 2025Updated 3 months ago
Alternatives and similar repositories for ReferDINO
Users interested in ReferDINO are comparing it to the libraries listed below.
- (CVPR 2026) Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation☆27Updated this week
- [ICCV2025] Referring any person or objects given a natural language description. Code base for RexSeek and HumanRef Benchmark☆177Oct 15, 2025Updated 4 months ago
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model☆19Jul 20, 2024Updated last year
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta…☆66Jun 23, 2025Updated 8 months ago
- [ICCV 2025] MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation☆20Sep 5, 2025Updated 6 months ago
- (NeurIPS 2024) Official repository of paper "Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models"☆35Mar 22, 2025Updated 11 months ago
- Pruned CoTracker architecture for tracking the myocardium in 2D echo images.☆19May 6, 2025Updated 10 months ago
- ☆25Dec 23, 2024Updated last year
- [CVPR-2024] Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation☆85Jul 24, 2024Updated last year
- ☆26Oct 15, 2024Updated last year
- [NeurIPS 2025] PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer☆28Oct 2, 2025Updated 5 months ago
- Official code for CAVIS: Context-Aware Video Instance Segmentation☆97Sep 17, 2025Updated 5 months ago
- [AAAI 2026] Segment Anything Across Shots: A Method and Benchmark☆27Nov 16, 2025Updated 3 months ago
- ☆44Feb 5, 2025Updated last year
- A Curated List of Vision-Language-Action (VLA) Research☆61Updated this week
- This is the official implementation of ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos☆43Nov 5, 2025Updated 4 months ago
- [ICCVW 2025] Find First, Track Next: Decoupling Identification and Propagation in Referring Video Object Segmentation☆80Oct 22, 2025Updated 4 months ago
- Official code of DynOMo: Online Point Tracking by Dynamic Online Monocular Gaussian Reconstruction (3DV 2025)☆173Jan 29, 2025Updated last year
- (CVPR 2026) Official repository of paper "WeDetect: Fast Open-Vocabulary Object Detection as Retrieval"☆137Feb 21, 2026Updated last week
- Official Implementation of VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Jo…☆23Jun 27, 2025Updated 8 months ago
- [ICCV 2025] RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping☆37Nov 21, 2025Updated 3 months ago
- [CVPR 2025] Official implementation of the paper "SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction"☆47Dec 11, 2025Updated 2 months ago
- [CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation☆13Jun 17, 2024Updated last year
- DisTime: Distribution-based Time Representation for Video Large Language Models.☆19Jul 10, 2025Updated 7 months ago
- [CVPR 2025 Highlight] Official repository for the paper: "SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation"☆367Sep 25, 2025Updated 5 months ago
- [NeurIPS2024] - SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion☆100Oct 29, 2025Updated 4 months ago
- Estimate dataset difficulty and detect label mistakes using reconstruction error ratios!☆28Jan 10, 2025Updated last year
- ☆19May 28, 2025Updated 9 months ago
- LiteGPT: A 124M Small Language Model (SLM) pre-trained on FineWeb and fine-tuned on Alpaca.☆34Dec 16, 2025Updated 2 months ago
- ☆32Sep 25, 2025Updated 5 months ago
- Empowering Small VLMs to Think with Dynamic Memorization and Exploration☆15Nov 18, 2025Updated 3 months ago
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition"☆12Feb 27, 2024Updated 2 years ago
- ☆46Apr 26, 2024Updated last year
- 🔥 Latest advances in Video Object Segmentation (VOS) – papers, datasets, and projects.☆468Feb 18, 2026Updated 2 weeks ago
- Large-Vocabulary Video Instance Segmentation dataset☆96Jul 5, 2024Updated last year
- [NeurIPS 2024] OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling.☆30Nov 13, 2025Updated 3 months ago
- A list of referring video object segmentation papers☆57Jun 6, 2025Updated 9 months ago
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding☆50Jan 14, 2025Updated last year
- This repository contains the implementation of our NeurIPS'24 paper "Temporal Sentence Grounding with Relevance Feedback in Videos"☆13Aug 22, 2025Updated 6 months ago