FrankYang-17 / Mavors
☆14 · Updated 4 months ago
Alternatives and similar repositories for Mavors
Users who are interested in Mavors are comparing it to the repositories listed below.
- 🔥🔥🔥 Latest Papers, Codes and Datasets on Video-LMM Post-Training ☆52 · Updated this week
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆73 · Updated 2 months ago
- ☆60 · Updated last month
- ICML2025 ☆58 · Updated last month
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆57 · Updated 3 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆122 · Updated last month
- [ICML 2025] Streamline Without Sacrifice - Squeeze out Computation Redundancy in LMM ☆19 · Updated 4 months ago
- On Path to Multimodal Generalist: General-Level and General-Bench ☆19 · Updated 2 months ago
- ☆26 · Updated 3 months ago
- Official implementation of MIA-DPO ☆66 · Updated 8 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark ☆128 · Updated 4 months ago
- ☆14 · Updated 7 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆30 · Updated last week
- [NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun… ☆37 · Updated 7 months ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ☆76 · Updated 6 months ago
- [ICCV 2025] Dynamic-VLM ☆25 · Updated 9 months ago
- Text-Only Data Synthesis for Vision Language Model Training ☆22 · Updated 4 months ago
- TStar is a unified temporal search framework for long-form video question answering ☆68 · Updated last month
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆79 · Updated 2 months ago
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u… ☆22 · Updated 4 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆39 · Updated 7 months ago
- ☆45 · Updated last week
- ☆129 · Updated 3 months ago
- Test-time Scaling for VAR models ☆24 · Updated 3 weeks ago
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? ☆88 · Updated 2 months ago
- A Comprehensive Dataset for Advanced Image Generation and Editing ☆17 · Updated last week
- Official InfiniBench: A Benchmark for Large Multi-Modal Models in Long-Form Movies and TV Shows ☆17 · Updated last month
- Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆146 · Updated 2 weeks ago
- Official repository of the video reasoning benchmark MMR-V. Can Your MLLMs "Think with Video"? ☆36 · Updated 3 months ago
- Quick Long Video Understanding ☆64 · Updated 3 months ago