UMass-Embodied-AGI / Mirage
Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)
☆58 Updated this week
Alternatives and similar repositories for Mirage
Users who are interested in Mirage are comparing it to the repositories listed below
- ☆44 Updated 5 months ago
- ☆37 Updated last month
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆64 Updated 3 weeks ago
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos" ☆28 Updated 9 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆70 Updated last week
- ☆84 Updated this week
- [NeurIPS 2024] Official Repository of Multi-Object Hallucination in Vision-Language Models ☆29 Updated 7 months ago
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ☆15 Updated 2 months ago
- ☆38 Updated last week
- ☆21 Updated 7 months ago
- Multimodal RewardBench ☆41 Updated 4 months ago
- [CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning. ☆30 Updated 2 months ago
- [ICML 2025] VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models ☆27 Updated 2 weeks ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆52 Updated 3 weeks ago
- ☆49 Updated 2 months ago
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning ☆64 Updated last month
- [CVPR2025] A benchmark for evaluating video generative models in generating short stories ☆15 Updated last month
- ☆37 Updated 2 weeks ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆37 Updated 5 months ago
- ☆38 Updated this week
- ☆35 Updated 2 weeks ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆61 Updated 2 weeks ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆68 Updated last year
- 🔥 Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resamplin… ☆35 Updated last week
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆46 Updated 5 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆29 Updated 3 months ago
- [NeurIPS-2024] The official Implementation of "Instruction-Guided Visual Masking" ☆35 Updated 7 months ago
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆22 Updated 10 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆64 Updated 11 months ago
- ☆32 Updated 5 months ago