THUNLP-MT / MUSEG
Repo for paper "MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding".
☆37Updated 6 months ago
Alternatives and similar repositories for MUSEG
Users who are interested in MUSEG are comparing it to the libraries listed below.
- [NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding☆68Updated 2 weeks ago
- Universal Video Temporal Grounding with Generative Multi-modal Large Language Models☆42Updated last month
- TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs☆61Updated last week
- The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"☆73Updated 2 months ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking"☆33Updated 6 months ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025☆93Updated 9 months ago
- [CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering☆44Updated 6 months ago
- Official repository of NeurIPS D&B Track 2024 paper "VERIFIED: A Video Corpus Moment Retrieval Benchmark for Fine-Grained Video Understan…☆39Updated 11 months ago
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models☆49Updated 11 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta…☆65Updated 6 months ago
- SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability☆15Updated 7 months ago
- ☆37Updated 6 months ago
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding☆59Updated 5 months ago
- Official code for MotionBench (CVPR 2025)☆61Updated 9 months ago
- [CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection☆131Updated 5 months ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆88Updated last year
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆133Updated 4 months ago
- This is a PyTorch implementation of 3DRefTR proposed by our paper "A Unified Framework for 3D Point Cloud Visual Grounding"☆24Updated 2 years ago
- TStar is a unified temporal search framework for long-form video question answering☆80Updated 3 months ago
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency☆59Updated 6 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos☆143Updated last year
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?☆113Updated 5 months ago
- The repository of VG-Refiner paper☆16Updated 3 weeks ago
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u…☆24Updated 6 months ago
- [NeurIPS 2024 Oral] RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation☆18Updated last year
- [AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P…☆63Updated 2 months ago
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning☆37Updated 4 months ago
- Official PyTorch Code of ReKV (ICLR'25)☆83Updated last month
- [ICCV'25] Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness☆63Updated 5 months ago
- SpaceR: The first MLLM empowered by SG-RLVR for video spatial reasoning☆101Updated 5 months ago