[ICLR 2025] Mathematical Visual Instruction Tuning for Multi-modal Large Language Models
☆152Dec 5, 2024Updated last year
Alternatives and similar repositories for MAVIS
Users that are interested in MAVIS are comparing it to the libraries listed below
Sorting:
- [ECCV 2024] Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?☆176Apr 28, 2025Updated 10 months ago
- MultiMath: Bridging Visual and Mathematical Reasoning for Large Language Models☆32Jan 22, 2025Updated last year
- Code for Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models☆92Jun 28, 2024Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆159Dec 6, 2024Updated last year
- [NeurIPS 2024] Needle In A Multimodal Haystack (MM-NIAH): A comprehensive benchmark designed to systematically evaluate the capability of…☆123Nov 25, 2024Updated last year
- [MM 2025] CMM-Math: A Chinese Multimodal Math Dataset To Evaluate and Enhance the Mathematics Reasoning of Large Multimodal Models☆53Oct 20, 2024Updated last year
- Mavlink based attacker for GPS,Actuator or other sensors. SITL Environment is based on PX4,Gazebo,ROS And QGC.☆19Aug 16, 2025Updated 6 months ago
- ☆155Oct 31, 2024Updated last year
- Official code implementation of Slow Perception:Let's Perceive Geometric Figures Step-by-step☆159Jul 28, 2025Updated 7 months ago
- Official github repo of G-LLaVA☆148Feb 20, 2025Updated last year
- [NAACL 2024] MMC: Advancing Multimodal Chart Understanding with LLM Instruction Tuning☆95Jan 7, 2025Updated last year
- Official repository for "TrustGeoGen: Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving"☆23Sep 1, 2025Updated 6 months ago
- ☆17Jan 9, 2025Updated last year
- [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization☆583Jun 7, 2024Updated last year
- [NeurIPS 2024] MATH-Vision dataset and code to measure multimodal mathematical reasoning capabilities.☆129May 16, 2025Updated 9 months ago
- Reverse Chain-of-Thought Problem Generation for Geometric Reasoning in Large Multimodal Models☆185Nov 4, 2024Updated last year
- ☆43Sep 12, 2024Updated last year
- [EMNLP 2024] mDPO: Conditional Preference Optimization for Multimodal Large Language Models.☆86Nov 10, 2024Updated last year
- One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks☆3,750Updated this week
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆80Jun 17, 2024Updated last year
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o…☆28Jul 9, 2025Updated 7 months ago
- Paper collections of multi-modal LLM for Math/STEM/Code.☆136Nov 17, 2025Updated 3 months ago
- ☆18May 14, 2024Updated last year
- MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts☆355Sep 29, 2025Updated 5 months ago
- Deep Reinforcement Learning Algorithms for solving Atari 2600 Games☆143Mar 23, 2023Updated 2 years ago
- Collaborative caching for HTTP video streaming☆38Aug 13, 2023Updated 2 years ago
- A Self-Training Framework for Vision-Language Reasoning☆88Jan 23, 2025Updated last year
- The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025☆276May 26, 2025Updated 9 months ago
- CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms☆25Dec 21, 2025Updated 2 months ago
- ☆47Nov 8, 2024Updated last year
- [Neurips'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought …☆426Dec 22, 2024Updated last year
- Official Repo for the paper: VCR: Visual Caption Restoration. Check arxiv.org/pdf/2406.06462 for details.☆32Feb 26, 2025Updated last year
- ☆42Dec 21, 2023Updated 2 years ago
- [ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models☆21May 28, 2024Updated last year
- The Most Faithful Implementation of Segment Anything (SAM) in 3D☆353Sep 11, 2024Updated last year
- HACAN: Hybrid Attention-Driven Cross-Layer Alignment Network for Image-Text Retrieval☆79Apr 30, 2025Updated 10 months ago
- When do we not need larger vision models?☆413Feb 8, 2025Updated last year
- Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks☆3,845Updated this week
- NRF905 full-featured driver library for general-purpose MCU and Linux.☆76Jan 6, 2026Updated last month