☆37Jun 20, 2025Updated 8 months ago
Alternatives and similar repositories for FineCaption
Users that are interested in FineCaption are comparing it to the libraries listed below
- ☆17Jun 20, 2025Updated 8 months ago
- ☆26Jan 4, 2025Updated last year
- [AAAI 2025] Empowering LLMs with Pseudo-Untrimmed Videos for Audio-Visual Temporal Understanding☆34Mar 21, 2025Updated 11 months ago
- UGround: Towards Unified Visual Grounding with Unrolled Transformers☆21Feb 15, 2026Updated 2 weeks ago
- ☆12Jan 17, 2024Updated 2 years ago
- [🏆 IJCV 2025 & ACCV 2024 Best Paper Honorable Mention] Official pytorch implementation of the paper "High-Quality Visually-Guided Sound …☆28Nov 1, 2025Updated 4 months ago
- ☆18Apr 20, 2025Updated 10 months ago
- Code For Our Work: DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries [ECCV-2024]☆14Jul 11, 2024Updated last year
- [AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P…☆64Jan 27, 2026Updated last month
- LEO: A powerful Hybrid Multimodal LLM☆19Jan 18, 2025Updated last year
- WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning (CVPR 2026)☆55Dec 30, 2025Updated 2 months ago
- (ICCV 2023) Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation☆48Jul 18, 2024Updated last year
- This is the official implementation of RGNet: A Unified Retrieval and Grounding Network for Long Videos☆19Mar 3, 2025Updated last year
- This is the official implementation of "GvSeg: General and Task-Oriented Video Segmentation" (Accepted at ECCV 2024).☆18Jul 15, 2024Updated last year
- ☆20May 11, 2025Updated 9 months ago
- Code release for the CVPR'23 paper titled "PartDistillation Learning part from Instance Segmentation"☆60Dec 17, 2023Updated 2 years ago
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…☆62Nov 7, 2024Updated last year
- [ICCV 2025] GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding☆73Jun 26, 2025Updated 8 months ago
- HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model☆87Jul 17, 2025Updated 7 months ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"☆25Feb 2, 2025Updated last year
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning☆41Aug 4, 2025Updated 7 months ago
- Code for the VOST dataset☆26Oct 1, 2023Updated 2 years ago
- 【NeurIPS 2024】Dense Connector for MLLMs☆181Oct 14, 2024Updated last year
- Official implementation of the paper DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction☆29May 26, 2024Updated last year
- Codes for ICLR 2025 Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM☆79Apr 19, 2025Updated 10 months ago
- Code for the paper "Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundatio…☆28Nov 8, 2023Updated 2 years ago
- [ICCV 2023] BoxSnake official repository.☆65May 28, 2024Updated last year
- [TCSVT] state-of-the-art open vocabulary detector on COCO/LVIS/V3Det☆32Jun 3, 2025Updated 9 months ago
- ☆33Sep 27, 2024Updated last year
- [ICCV 2023] HiLo: Exploiting High Low Frequency Relations for Unbiased Panoptic Scene Graph Generation☆37Jan 25, 2024Updated 2 years ago
- Code for ChatRex: Taming Multimodal LLM for Joint Perception and Understanding☆211Oct 15, 2025Updated 4 months ago
- [ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding☆79Oct 6, 2023Updated 2 years ago
- This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati…☆72Jun 3, 2024Updated last year
- The official code of Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval (AAAI2024)☆32Mar 29, 2024Updated last year
- [AAAI2025 selected as oral] - Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints☆44Jul 2, 2025Updated 8 months ago
- Associate Everything Detected: Facilitating Tracking-by-Detection to the Unknown☆41Feb 22, 2026Updated last week
- The repository of VG-Refiner paper☆17Dec 9, 2025Updated 2 months ago
- This module includes functions that can be used to simulate mechanochemical phenomena.☆11Nov 16, 2021Updated 4 years ago
- All things manipulating, quantifying, and visualizing geochemical data☆13Jan 19, 2024Updated 2 years ago