HKU-MMLab / Math-VR-CodePlot-CoT
Math-VR Benchmark & CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
☆41Updated 3 weeks ago
Alternatives and similar repositories for Math-VR-CodePlot-CoT
Users who are interested in Math-VR-CodePlot-CoT are comparing it to the libraries listed below
- [ICCV2025]Code Release of Harmonizing Visual Representations for Unified Multimodal Understanding and Generation☆177Updated 6 months ago
- Official Implementation of VideoDPO☆147Updated 5 months ago
- ☆60Updated 3 months ago
- [ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark☆132Updated 5 months ago
- ICCV2023-Diffusion-Papers☆108Updated 2 years ago
- ☆162Updated 5 months ago
- Official Implementation of Paper Transfer between Modalities with MetaQueries☆271Updated last month
- Curated list of recent visual autoregressive (VAR) modeling works☆31Updated 8 months ago
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025)☆84Updated 9 months ago
- Official implementation of LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment.☆84Updated 6 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning☆127Updated 7 months ago
- [ICML2025] The code and data of Paper: Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation☆138Updated last year
- ☆132Updated last month
- [CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?☆103Updated 4 months ago
- Structured Video Comprehension of Real-World Shorts☆218Updated 2 months ago
- Official repository for ReasonGen-R1☆73Updated 5 months ago
- Unified layout planning and image generation, ICCV2025☆34Updated 7 months ago
- Empowering Unified MLLM with Multi-granular Visual Generation☆131Updated 10 months ago
- Official Implementation of ICLR'24: Kosmos-G: Generating Images in Context with Multimodal Large Language Models☆73Updated last year
- [CVPR 25] A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning.☆49Updated 8 months ago
- WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation☆163Updated 3 weeks ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning☆130Updated 3 months ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model☆36Updated last year
- This is an early exploration to introduce Interleaving Reasoning to Text-to-image Generation field and achieve the SoTA benchmark perform…☆76Updated 2 months ago
- A survey for visual generation alignment☆98Updated 3 weeks ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆94Updated 8 months ago
- [NeurIPS 2025] Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations☆185Updated 2 months ago
- Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World"☆116Updated last month
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions☆244Updated last year
- [CVPR 2025] T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation☆99Updated last month