baopj / Vid-Morp
☆10Updated 7 months ago
Alternatives and similar repositories for Vid-Morp
Users that are interested in Vid-Morp are comparing it to the libraries listed below
- [ECCV 2024] The first zero-shot setting for spatio-temporal video grounding.☆10Updated last year
- Official Implementation of ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25)☆20Updated 5 months ago
- [ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning☆36Updated 3 months ago
- MR. Video: MapReduce is the Principle for Long Video Understanding☆21Updated 2 months ago
- (NeurIPS 2023) Open-set visual object query search & localization in long-form videos☆24Updated last year
- [ICLR 2025] This repo is the official implementation of our paper "Learning Fine-Grained Representations through Textual Token Disentangl…☆12Updated 3 months ago
- Official PyTorch Code of ReKV (ICLR'25)☆35Updated 4 months ago
- This is the official implementation of ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos☆27Updated 3 weeks ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval☆38Updated 3 months ago
- [CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering☆34Updated last month
- [CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation".☆34Updated last month
- [CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo…☆48Updated 10 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos☆42Updated last year
- CVPR2025: Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning☆33Updated 3 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆77Updated last year
- 🔎Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".☆39Updated 4 months ago
- Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models☆30Updated this week
- The official repository for ACL2025 paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models".☆49Updated 2 months ago
- ☆19Updated 2 months ago
- [WACV 2025] Official PyTorch code for "Background-aware Moment Detection for Video Moment Retrieval"☆15Updated 4 months ago
- This is the official repository for the paper "Cross-modal Information Flow in Multimodal Large Language Models"☆20Updated last month
- [CVPR 2025] COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training☆26Updated 3 months ago
- Official PyTorch code of GroundVQA (CVPR'24)☆61Updated 10 months ago
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos☆60Updated this week
- ☆32Updated 10 months ago
- ☆17Updated last month
- Awesome Vision-Language Compositionality, a comprehensive curation of research papers in literature.☆25Updated 5 months ago
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models☆47Updated 6 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding☆80Updated 2 months ago
- This is the official implementation of RGNet: A Unified Retrieval and Grounding Network for Long Videos☆17Updated 4 months ago