Alpha-Innovator / TrustGeoGen
☆19 Updated 2 months ago
Alternatives and similar repositories for TrustGeoGen
Users who are interested in TrustGeoGen are comparing it to the libraries listed below.
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ☆47 Updated 3 months ago
- The proposed simulated dataset consisting of 9,536 charts and associated data annotations in CSV format. ☆25 Updated last year
- [ICCV 2025] Dynamic-VLM ☆20 Updated 6 months ago
- Official Repository: A Comprehensive Benchmark for Logical Reasoning in MLLMs ☆38 Updated last week
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆41 Updated 2 weeks ago
- ☆62 Updated last month
- ☆37 Updated last month
- (ACL-2025 main conference) Dolphin: Moving Towards Closed-loop Auto-research through Thinking, Practice, and Feedback ☆18 Updated 3 weeks ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks ☆35 Updated last year
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆37 Updated 5 months ago
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di… ☆55 Updated 7 months ago
- [ArXiv] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding ☆50 Updated 6 months ago
- ☆80 Updated 5 months ago
- ☆115 Updated 10 months ago
- ☆42 Updated 7 months ago
- On Path to Multimodal Generalist: General-Level and General-Bench ☆15 Updated last month
- Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer" ☆45 Updated this week
- Official implementation of MIA-DPO ☆58 Updated 5 months ago
- The official code of "VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning" ☆119 Updated 3 weeks ago
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ☆15 Updated 2 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆92 Updated 2 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆80 Updated 5 months ago
- Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆62 Updated 2 weeks ago
- ☆18 Updated last month
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning ☆31 Updated 2 months ago
- ☆44 Updated 5 months ago
- (ArXiv25) Vision Matters: Simple Visual Perturbations Can Boost Multimodal Math Reasoning ☆37 Updated 2 weeks ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models" ☆28 Updated 2 months ago
- Official code of *Virgo: A Preliminary Exploration on Reproducing o1-like MLLM* ☆104 Updated last month
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆66 Updated 11 months ago