hanghuacs / FineCaption
★37 · Updated 4 months ago
Alternatives and similar repositories for FineCaption
Users who are interested in FineCaption are comparing it to the repositories listed below.
- [CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos ★87 · Updated 6 months ago
- Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal Prompting ★56 · Updated 3 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ★138 · Updated 10 months ago
- Code for the paper "Exploring Pre-trained Text-to-Video Diffusion Models for Referring Video Object Segmentation", ECCV 2024 ★43 · Updated last year
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ★79 · Updated last year
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning" ★51 · Updated 2 weeks ago
- ★32 · Updated last year
- [ICCV 2025] Official implementation of "InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models" ★48 · Updated 8 months ago
- ICML2025 ★59 · Updated 2 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ★122 · Updated 6 months ago
- Transactions on Multimedia (TMM25) ★16 · Updated 6 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ★227 · Updated 2 months ago
- [CVPR 2025] LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding ★73 · Updated 3 months ago
- ★40 · Updated 3 months ago
- [ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM