ning-mz / SCA-GPS
Code of ACM MM 2023 Paper: A Symbolic Characters Aware Model for Solving Geometry Problems
☆16 · Updated 2 years ago
Alternatives and similar repositories for SCA-GPS
Users interested in SCA-GPS are comparing it to the libraries listed below.
- ☆88 · Updated last year
- ☆40 · Updated last year
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain ☆106 · Updated last year
- [arXiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning ☆91 · Updated last year
- [AAAI 2025] Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning ☆42 · Updated 9 months ago
- Official Repository of "AtomThink: Multimodal Slow Thinking with Atomic Step Reasoning" ☆62 · Updated 2 months ago
- GitHub repository for "Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas" (ICML 2025) ☆68 · Updated 9 months ago
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆92 · Updated last year
- ☆133 · Updated last year
- Official implementation of "Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data" (ICLR 2024) ☆34 · Updated last year
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024) ☆57 · Updated last year
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models ☆155 · Updated last year
- MAT: Multi-modal Agent Tuning 🔥 ICLR 2025 (Spotlight) ☆84 · Updated last month
- MMICL, a state-of-the-art VLM with in-context learning (ICL) ability, from PKU ☆50 · Updated 6 months ago
- [NeurIPS 2025] Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models ☆53 · Updated 4 months ago
- Official Code of IdealGPT ☆35 · Updated 2 years ago
- A benchmark for evaluating the capabilities of large vision-language models (LVLMs) ☆46 · Updated 2 years ago
- Visualizing the attention of vision-language models ☆279 · Updated 11 months ago
- Code for our paper "All in an Aggregated Image for In-Image Learning" ☆29 · Updated last year
- [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities ☆128 · Updated 8 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆88 · Updated last year
- MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models ☆32 · Updated last year
- MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts ☆354 · Updated 4 months ago
- An RLHF Infrastructure for Vision-Language Models ☆196 · Updated last year
- Code and data for the ACL 2024 paper "Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space" ☆18 · Updated last year
- [ICLR 2025] ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation ☆131 · Updated last month
- ☆101 · Updated 2 years ago
- ☆51 · Updated last year
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model ☆47 · Updated last year
- [NeurIPS 2025] The official implementation of SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning ☆25 · Updated 4 months ago