[ICLR 2025] Mathematical Visual Instruction Tuning for Multi-modal Large Language Models
☆153 · Dec 5, 2024 · Updated last year
Alternatives and similar repositories for MAVIS
Users that are interested in MAVIS are comparing it to the libraries listed below.
- [ECCV 2024] Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? ☆177 · Apr 28, 2025 · Updated 10 months ago
- MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models ☆32 · Jan 22, 2025 · Updated last year
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models ☆92 · Jun 28, 2024 · Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆159 · Dec 6, 2024 · Updated last year
- Official repository for "TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving" ☆23 · Sep 1, 2025 · Updated 6 months ago
- [MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models ☆54 · Oct 20, 2024 · Updated last year
- CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms ☆25 · Dec 21, 2025 · Updated 3 months ago
- Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models ☆185 · Nov 4, 2024 · Updated last year
- Official code implementation of Slow Perception: Let's Perceive Geometric Figures Step-by-step ☆159 · Jul 28, 2025 · Updated 7 months ago
- The Most Faithful Implementation of Segment Anything (SAM) in 3D ☆357 · Sep 11, 2024 · Updated last year
- Paper collections of multi-modal LLM for Math/STEM/Code. ☆137 · Nov 17, 2025 · Updated 4 months ago
- ☆18 · May 14, 2024 · Updated last year
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of… ☆124 · Nov 25, 2024 · Updated last year
- Official github repo of G-LLaVA ☆148 · Feb 20, 2025 · Updated last year
- ☆157 · Oct 31, 2024 · Updated last year
- [NeurIPS 2025] MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning ☆102 · Sep 19, 2025 · Updated 6 months ago
- Mavlink based attacker for GPS,Actuator or other sensors. SITL Environment is based on PX4,Gazebo,ROS And QGC. ☆19 · Aug 16, 2025 · Updated 7 months ago
- A Self-Training Framework for Vision-Language Reasoning ☆88 · Jan 23, 2025 · Updated last year
- MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts ☆355 · Sep 29, 2025 · Updated 5 months ago
- ☆17 · Jan 9, 2025 · Updated last year
- One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks ☆3,916 · Mar 15, 2026 · Updated last week
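- This project contains all the source code for the 唯杰地图 VJMAP3D examples. VJMAP3D is a 3D visualization engine framework built on threejs. With the rich features VJMAP3D provides, you can create stunning 3D visualization applications in the browser. The framework can serve as a standalone 3D engine for data visualization, product showcases, digital… ☆44 · Mar 11, 2026 · Updated last week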
- Deep Reinforcement Learning Algorithms for solving Atari 2600 Games ☆143 · Mar 23, 2023 · Updated 3 years ago
- [ICLR'25] Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training ☆47 · Jan 25, 2025 · Updated last year
- Collaborative caching for HTTP video streaming ☆38 · Aug 13, 2023 · Updated 2 years ago
- 「ECCV 2024」 PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation ☆21 · Jul 2, 2024 · Updated last year
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆82 · Jun 17, 2024 · Updated last year
- [ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models ☆21 · May 28, 2024 · Updated last year
- [Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ☆439 · Dec 22, 2024 · Updated last year
- [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities. ☆131 · May 16, 2025 · Updated 10 months ago
- [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization ☆584 · Jun 7, 2024 · Updated last year
- ☆43 · Dec 21, 2023 · Updated 2 years ago
- Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks ☆3,920 · Updated this week
- The implement of geometric solver PGPSNet ☆30 · Jan 30, 2025 · Updated last year
- OGtwelve's util pack: contains many different util might used in real life develop situation ☆111 · Dec 30, 2023 · Updated 2 years ago
- Cambrian-1 is a family of multimodal LLMs with a vision-centric design. ☆1,995 · Nov 7, 2025 · Updated 4 months ago
- ☆153 · Jul 28, 2022 · Updated 3 years ago
- A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models ☆28 · Nov 25, 2024 · Updated last year
- Welcome to the 'Open-Alteryx-Macro' project. This project is aimed at providing an open-source solution for managing and updating Alteryx… ☆156 · May 25, 2024 · Updated last year