Imagine While Reasoning in Space: Multimodal Visualization-of-Thought (ICML 2025)
☆72Apr 12, 2025Updated 11 months ago
Alternatives and similar repositories for MVoT
Users who are interested in MVoT are comparing it to the repositories listed below.
- [NeurIPS 2025] Scaling Language-centric Omnimodal Representation Learning☆38Feb 6, 2026Updated last month
- ☆66Feb 1, 2026Updated last month
- ☆12Jan 10, 2025Updated last year
- ☆21May 28, 2025Updated 10 months ago
- ☆14Apr 20, 2025Updated 11 months ago
- A Self-Training Framework for Vision-Language Reasoning☆89Jan 23, 2025Updated last year
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023.☆33Jul 21, 2023Updated 2 years ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" [NeurIPS25]☆186Jun 5, 2025Updated 9 months ago
- Official repo for [AAAI 2026 Oral] "S5: Scalable Semi-Supervised Semantic Segmentation in Remote Sensing"☆33Dec 4, 2025Updated 3 months ago
- Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision☆19Apr 1, 2025Updated 11 months ago
- [Blog 1] Recording a bug of grpo_trainer in some R1 projects☆23Feb 23, 2025Updated last year
- This repository contains the pinyin input method assignment I completed for an artificial intelligence course.☆11Feb 16, 2022Updated 4 years ago
- ☆24May 23, 2025Updated 10 months ago
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501☆62Jul 26, 2024Updated last year
- G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning☆101May 20, 2025Updated 10 months ago
- ☆59Jun 20, 2024Updated last year
- Official repo for [IEEE TGRS'26] "SPEX: A Vision-Language Model for Land Cover Extraction on Spectral Remote Sensing Images"☆22Mar 16, 2026Updated 2 weeks ago
- MLR-Bench: Evaluating AI Agents on Open-Ended Machine Learning Research☆24Sep 23, 2025Updated 6 months ago
- Code for ACL 2023 Oral Paper: ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning☆12Aug 23, 2025Updated 7 months ago
- Test Demo for “HDP-Net: Haze Density Prediction Network for Nighttime Dehazing” PCM 2018☆12Sep 24, 2018Updated 7 years ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment☆35Jul 1, 2024Updated last year
- ☆12Jun 19, 2024Updated last year
- [ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un…☆21Oct 24, 2024Updated last year
- This repo is the official implementation of "Euclid’s Gift: Enhancing Spatial Perception and Reasoning in Vision‑Language Models via Geom…☆27Mar 15, 2026Updated 2 weeks ago
- CoCoFL: Communication- and Computation-Aware Federated Learning via Partial NN Freezing and Quantization☆13Aug 3, 2024Updated last year
- MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models☆32Jan 22, 2025Updated last year
- ☆15Sep 17, 2024Updated last year
- o1 Chain of Thought Examples☆33Oct 4, 2024Updated last year
- A fork to add multimodal model training to open-r1☆1,514Feb 8, 2025Updated last year
- Extend OpenRLHF to support LMM RL training for reproduction of DeepSeek-R1 on multimodal tasks.☆846May 14, 2025Updated 10 months ago
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence☆91Updated this week
- ☆35Feb 15, 2026Updated last month
- Open-Pandora: On-the-fly Control Video Generation☆35Nov 28, 2024Updated last year
- ✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning☆281May 9, 2025Updated 10 months ago
- Reproducing R1 for Code with Reliable Rewards☆12Apr 9, 2025Updated 11 months ago
- Official repo for "TiMo: Spatiotemporal Foundation Model for Satellite Image Time Series"☆28May 14, 2025Updated 10 months ago
- Image Super-Resolution Using Very Deep Residual Channel Attention Networks☆15Nov 29, 2021Updated 4 years ago
- [CVPR 2026] Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens☆256Aug 2, 2025Updated 7 months ago
- [arXiv: 2505.12307] LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?☆35Dec 1, 2025Updated 3 months ago